Parallel deconvolution problem fixed (bde27029) · Commits · Claudio Gheller / HPC_Imaging

README.md

+80 −20

Original line number	Diff line number	Diff line
		@@ -87,18 +87,54 @@ real-part peak equals 1.0 (Hogbom convention) and replicated on every MPI rank
		for the distributed CLEAN subtraction. It is also written to
		`psf_real.bin` / `psf_imag.bin` as a diagnostic product.

		### Restored image (NOT YET IMPLEMENTED — future work)
		### Image Restoration & Restoring Beam

		The photometrically correct restored image is
		RICK implements WSClean-style image restoration: the clean-component map is
		convolved with an elliptical Gaussian restoring beam fitted to the PSF main
		lobe, then the residual is added.

		$$I_{\text{restored}}(l,m) = \sum_k \gamma r^{(k)}_{\max} \delta(l - l_k, m - m_k) \ast G_{\text{clean}} + r^{(N_{\text{iter}})}(l,m)$$
		2D elliptical Gaussian restoring beam:

		where $G_{\text{clean}}$ is the restoring Gaussian beam (fitted to the PSF
		main lobe). This convolution step is not yet implemented in RICK: the
		current `restored_.bin` output is `component + residual` without* the
		$G_{\text{clean}}$ convolution and is a diagnostic product only. The
		restoring-beam fit and convolution **must be added before any photometric
		measurement** is made on the restored image (see Section 5.1).
		$$G(x, y) = \exp\!\left(-\frac{1}{2}\left[\frac{(x\cos\phi + y\sin\phi)^2}{\sigma_{\rm maj}^2} + \frac{(-x\sin\phi + y\cos\phi)^2}{\sigma_{\rm min}^2}\right]\right)$$

		where $\sigma = \theta_{\rm FWHM}/(2\sqrt{2\ln 2})$ and $\phi$ is the beam
		position angle (east of north).

		Restored image (WSClean convention):

		$$I_{\rm restored}(l,m) = \underbrace{\sum_k \gamma \, r^{(k)}_{\rm max}\,\delta(l-l_k,\,m-m_k)}_{\rm clean\ components} \circledast\, G_{\rm clean} \;+\; r^{(N_{\rm iter})}(l,m)$$

		Beam fitting (log-parabola method). For an elliptical Gaussian,
		$\ln{\rm PSF}$ is an exact quadratic form, so the main lobe (pixels above
		10% of the unit peak) is fitted by linear least squares to

		$$\ln {\rm PSF}(\Delta x, \Delta y) \approx c_0 + c_1\Delta x + c_2\Delta y + c_3\Delta x^2 + c_4\Delta x\Delta y + c_5\Delta y^2$$

		The quadratic-form matrix $A = \bigl[\begin{smallmatrix}-2c_3 & -c_4\\ -c_4 & -2c_5\end{smallmatrix}\bigr]$
		is eigen-decomposed: the eigenvalues are the inverse squared $1\sigma$ widths
		along the principal axes ($\theta_{\rm FWHM} = 2\sqrt{2\ln 2}/\sqrt{\lambda}$,
		major axis from the smallest eigenvalue) and the eigenvectors give the
		position angle. This is the same approach WSClean uses internally; it needs
		no non-linear iteration and no external linear-algebra library (6×6 normal
		equations, Gaussian elimination). If the fit is degenerate (non-positive-
		definite form, absurd axis lengths, < 12 main-lobe pixels) a conservative
		circular fallback of FWHM $2\sqrt{2\ln 2} \approx 2.35$ pixels is used.
		User override: set `deconvolution.restoring_beam.{bmaj,bmin,bpa}` to numeric
		values (arcsec / degrees) instead of `auto`.

		Unit alignment. The kernel is normalised to unit peak
		($G(0,0) = 1$), the WSClean/CASA restoration convention:

		$$\underbrace{[{\rm Jy/pixel}]}_{\rm components} \circledast \underbrace{[{\rm dimensionless,\ unit\ peak}]}_{\rm Gaussian\ kernel} + \underbrace{[{\rm Jy/beam}]}_{\rm residual} = \underbrace{[{\rm Jy/beam}]}_{\rm restored}$$

		A delta-function component of flux $A$ Jy convolved with the unit-peak kernel
		produces a Gaussian of peak $A$ Jy/beam at the component location — the
		same flux scale as the residual (whose dirty beam is also peak-normalised to
		1.0), so the two terms add consistently. Integrated source flux is recovered
		as $\sum I_{\rm restored} / \Omega_{\rm beam}$ with the beam area
		$\Omega_{\rm beam} = \pi\,\theta_{\rm maj}\theta_{\rm min}/(4\ln 2)$ pixels.
		(A unit-sum kernel would instead restore a component to a peak of
		$A/(2\pi\sigma_{\rm maj}\sigma_{\rm min})$, breaking the Jy/beam scale.)

		### Configuration parameter ↔ symbol mapping

		@@ -110,6 +146,9 @@ measurement** is made on the restored image (see Section 5.1).
		\| `deconvolution.gain` \| $\gamma$ \| Hogbom loop gain \|
		\| `deconvolution.threshold` \| $r_{\text{thresh}}$ \| CLEAN convergence threshold \|
		\| `deconvolution.max_iterations` \| $N_{\text{iter}}$ \| Maximum CLEAN minor cycles \|
		\| `deconvolution.restoring_beam.bmaj` \| $\theta_{\rm maj}$ \| Restoring beam major axis FWHM (arcsec); `auto` = fit from PSF \|
		\| `deconvolution.restoring_beam.bmin` \| $\theta_{\rm min}$ \| Restoring beam minor axis FWHM (arcsec); `auto` = fit from PSF \|
		\| `deconvolution.restoring_beam.bpa` \| $\phi_{\rm PA}$ \| Restoring beam position angle (deg E of N); `auto` = fit from PSF \|
		\| `gridding.robust` \| $R$ \| Briggs robust weighting parameter \|
		\| `gridding.kernel_oversampling` \| $\beta$ \| Kaiser-Bessel oversampling factor \|

		@@ -512,6 +551,9 @@ These keys live under `files:` and are read by `config_parser.cpp` exactly as li
		\| `deconvolution.threshold` \| float \| `0.001` \| `0.0005`\| Stop when peak residual drops below this value [Jy/beam]. \|
		\| `deconvolution.gain` \| float \| `0.1` \| `0.1` \| Loop gain: fraction of peak flux removed per iteration (0.05–0.2). \|
		\| `deconvolution.positive_only` \| bool \| `false` \| `false` \| Restrict cleaning to positive components (opt-in; keep false for negative sidelobes and Stokes Q/U/V). \|
		\| `deconvolution.restoring_beam.bmaj`\| float/`auto` \| `auto` \| `8.5` \| Restoring beam major-axis FWHM [arcsec]; `auto` (or absent) = fit an elliptical Gaussian to the PSF main lobe. \|
		\| `deconvolution.restoring_beam.bmin`\| float/`auto` \| `auto` \| `7.0` \| Restoring beam minor-axis FWHM [arcsec]; `auto` = fit from PSF. Override requires BOTH bmaj and bmin numeric. \|
		\| `deconvolution.restoring_beam.bpa` \| float/`auto` \| `auto` \| `25.0`\| Restoring beam position angle [deg, east of north]; only consulted with a numeric bmaj/bmin override. \|

		> PSF is generated automatically. The dirty beam is computed internally
		> from the visibility data (second gridding pass with unit amplitudes and the
		@@ -625,21 +667,39 @@ The loop terminates when $A_k < \tau$ (threshold) or $k = N_{\text{max}}$ (maxim

		#### Image Restoration

		The true restored image is:
		The restored image is:
		$$
		I_{\text{restored}} = (C * B_{\text{clean}}) + R_{\text{final}}
		$$
		where $B_{\text{clean}}$ is a Gaussian restoring beam fitted to the main lobe of the PSF.
		where $B_{\text{clean}}$ is the elliptical Gaussian restoring beam fitted to
		the main lobe of the PSF (or supplied via
		`deconvolution.restoring_beam.{bmaj,bmin,bpa}`).

		RICK implements this in full (see Image Restoration & Restoring Beam
		in the Mathematical Formulation section): after the CLEAN loop terminates,
		each rank fits the beam on the replicated PSF (deterministic — no
		communication), builds a unit-peak Gaussian kernel truncated at $3\sigma$ of
		the major axis, convolves the sparse component map with it (scatter-style,
		$O(N_{\rm comp} K^2)$), and adds the residual. The fitted beam parameters
		are printed at rank 0:

		RICK's current implementation provides:
		$$
		I_{\text{restored}} = C + R_{\text{final}}
		$$
		This is a diagnostic output only. For photometrically valid results, users must:
		1. Export `component_real.bin` and `residual_real.bin`
		2. Fit a 2D Gaussian to the PSF main lobe
		3. Convolve the component map with that Gaussian
		4. Add the residual
		```
		[RICK INFO] Restoring beam: BMAJ=8.412" (3.21 px), BMIN=7.105" (2.71 px), BPA=23.4 deg (fitted from PSF main lobe)
		```

		In distributed runs the restoration performs a component halo exchange
		before the convolution: each rank exchanges a border strip of exactly one
		kernel radius $R$ of its component map with its face neighbours
		(`MPI_Sendrecv`, `MPI_PROC_NULL` at the global image edge), so the Gaussian
		wings of components within $R$ of a slab boundary are scattered into the
		neighbouring slab instead of being clipped. The exchange is two-stage
		(rows first, then extended column strips that include the received
		row halos), which also propagates the corner blocks from diagonal
		neighbours — the restored image is bitwise independent of the
		decomposition layout up to floating-point summation order. To keep the
		kernel and the halo message sizes identical on every rank, the kernel
		radius safety clamp uses the global minimum slab dimension
		(`MPI_Allreduce` MIN) rather than the local slab width.

		### 5.2 MPI-Distributed Implementation

agents/claude_agents/agent-memory/cpp-hpc-reviewer/rick_codebase.md

+0 −5

Original line number	Diff line number	Diff line
		@@ -9,7 +9,6 @@ RICK is a distributed radio interferometry imaging code (C++/MPI+OpenMP) doing W
		Decomposition model:
		- 1D mode: full X (x_start=0, xaxis=grid_size_x), Y strip per rank (Gaussian-weighted strip sizes possible).
		- 2D mode: sub-rectangle of both axes; requires ≥4 ranks; "2d" or "2d_gaussian" modes via config.
		- "relaxed" mode (added 2026): data-driven point-based UV decomposition. Builds per-axis histograms over the normalised [0,1] coordinate domain (NOT the data-extent envelope — this would silently rescale the data range to span the full pixel grid and put slab boundaries in the wrong pixels), then iteratively refines slab boundaries via pair-optimal V-shape binary search. Histogram-cell slab boundaries are mapped onto pixel boundaries via `pixel = floor(slab/Ngrid * grid_size)`. Output `(start_x, start_y, size_x, size_y, rank_2d_x, rank_2d_y, npx, npy)` matches the 2D contract so the rest of the pipeline (bucket-sort, gridding, wstack, FFT, phase, MPI-IO subarray) is unchanged. Requires ≥4 ranks. Conceptually deferred: relaxed_decomposition runs AFTER reading uu/vv local coordinates because it's data-dependent; the 1D/2D paths run up-front.
		- Visibilities distributed via ghost-region bucket sort: any vis whose kernel footprint (±KernelLen cells) overlaps a rank's domain is sent to that rank. Each rank then clips kernel to its local domain in `wstack.cpp`.

		Why: Allows each rank to grid disjoint pieces of overlapping kernel footprints — no double-counting and no halo exchange needed. The grid is output-decomposed, not input-decomposed.
		@@ -22,13 +21,9 @@ RICK is a distributed radio interferometry imaging code (C++/MPI+OpenMP) doing W
		- Halo-exchange utilities (`exchange_halos_x/y/2d` in `src/communication/communication.cpp`) have stride bugs (write at `grid + y*xaxis + size_x` which assumes a halo-extended row that isn't allocated). Treat these as broken — use the ghost-region clipping design instead.
		- MPI-IO write in `phase_correction` uses `starts[2] = {y_start, 0}` — correct for 1D, broken for 2D (should be `{y_start, x_start}`).
		- FITSIO write uses `rank * yaxis` — only valid when all ranks have equal yaxis (not true under Gaussian-weighted decomposition).
		- `io_write` (src/io/io_functions.cpp) is NOT slab-aware: it hardcodes `MPI_File_write_at(File, 0, data, Ndata, MPI_BYTE)` — every rank writes at byte offset 0, no `MPI_File_set_view`, no global offset/subarray params. Only correct for whole-buffer single-region writes. When N ranks call it on one filename each writes its local slab at offset 0 (race) and the file ends up one-slab-sized. The deconvolution output block in test_clib.cpp (component/residual/restored _real/_imag) used `io_write` and therefore produced only one rank's 2048×2048 slab instead of the full 8192×8192 image. CORRECT pattern for distributed image writes lives in phase_correction_library.cpp (~lines 533-549) and fft_library.cpp/gridding_data.cpp: `MPI_Type_create_subarray(2, {grid_size_y,grid_size_x}, {yaxis,xaxis}, {y_start,x_start}, MPI_ORDER_C, MPI_DOUBLE)` + `MPI_File_set_view(fh,0,...)` + `MPI_File_write_all(fh, buf, xaxisyaxis, MPI_DOUBLE)`. CleanSolver itself is dimensionally correct (sized xaxisyaxis, converts local↔global peaks via x_start_/y_start_) — only the write was broken.
		- Bucket-sort `v_norm = vvt[i] / grid_size_y` while `vvt` is already in `[0,1]` (normalized by `create_binMS.py`) — dimensional mismatch with `min_vals[r]` that's also in `[0,1]`. Pre-existing in `test_clib.cpp`.
		- Kernel index `kKer = increaseprecision * fabs(v_dist + KernelLen)` relies on kernel symmetry; correct for Gauss/KaiserBessel but would be wrong for asymmetric kernels.
		- Layout-mismatch class (e.g., the 2026 `weight_grid` work): a buffer is allocated as a flat real array `[iwxaxisyaxis + i]` at the call site, but written with the interleaved complex stride `2(j+kxaxis+grid_wxaxisyaxis)` inside `wstack.cpp`. Anything sized like the visibility grid must use either the interleaved index everywhere or the flat index everywhere — never mix.
		- Histogram-domain mismatch class (relaxed decomposition): if a histogram is built over a data-driven `[u_min,u_max]` window and then slab-to-pixel mapped via `floor(slab/Ngrid * grid_size)`, the mapping silently treats the data range as if it spans the entire pixel grid. The histogram domain MUST match the mapping domain ([0,1] in RICK's normalised UV space) or you must apply an affine transform `pix = floor((hmin + slab/Ngrid * hrange) * grid_size)`.
		- Unsigned-int passing across function boundaries: large per-rank measurement counts (`Nmeasures`) routinely exceed 2^31 on fat nodes. Any decomposition or gridding helper that takes its trip count as `int` will silently truncate. Use `unsigned long long` (or `myull`) for all visibility counts crossing function boundaries.
		- `enforce_min1` / boundary-tightening lambdas: when iterating boundary arrays after rounded mapping, naive two-pass forward/backward walks can leave the first slab with size 0 if the backward walk pushes b[1] above 0 and then the front is restored to 0. Always use a two-sided clamp `[lo_i, hi_i]` where `lo_i = b[i-1]+1` and `hi_i = axis_total - (N-i)` so each slab keeps ≥1 pixel after the pass.

		WSClean-style sum-of-weights normalization (added 2026):
		- New `weight_grid` plumbed through `gridding → wstack → fftw_data → phase_correction(weight_grid_fft)` to divide image by per-pixel Σweights (summed over w-planes).

agents/claude_agents/agent-memory/radioastronomy-expert/MEMORY.md

+0 −1

Original line number	Diff line number	Diff line
		@@ -2,4 +2,3 @@
		- [RICK imaging pipeline](project_rick_pipeline.md) — RICK FFT structure, phase-leakage bug, fftshift checkerboard fix
		- [Do not compile after edits in RICK](feedback_no_autocompile.md) — never run cmake/make in RICK; user compiles himself
		- [RICK Doxygen & conventions](project_rick_doxygen.md) — src/ module map, WSClean ground-truth conventions, fabricated-doc failure mode, known unfixed bugs
		- [RICK deconvolution layer](project_rick_deconvolution.md) — src/deconvolution Hogbom CLEAN, PSF binary format/centre conventions, mock-PSF script, unimplemented PSF-load TODO

config.yaml

+11 −0

Original line number	Diff line number	Diff line
		@@ -117,3 +117,14 @@ deconvolution:
		# M8: false — positive_only=true stops CLEAN at the first negative peak,
		# truncating legitimate negative-sidelobe cleaning (and breaking Q/U/V).
		positive_only: false
		# Restoring beam for WSClean-style restoration:
		# restored = (clean components ⊛ elliptical Gaussian) + residual
		# 'auto' (or omitting the block) fits a 2D elliptical Gaussian to the PSF
		# main lobe (log-parabola least squares, pixels above 10% of the peak).
		# To override, give bmaj/bmin as FWHM in ARCSEC and bpa in DEGREES (east
		# of north); the override only takes effect when BOTH bmaj and bmin are
		# numeric.
		restoring_beam:
		bmaj: auto # arcsec; 'auto' = fit from PSF main lobe
		bmin: auto # arcsec; 'auto' = fit from PSF main lobe
		bpa: auto # degrees east of north; 'auto' = fit from PSF

config_relaxed.yaml

+7 −0

Original line number	Diff line number	Diff line
		@@ -98,3 +98,10 @@ deconvolution:
		# psf_path: DEPRECATED and ignored — the PSF (dirty beam) is generated
		# internally from the visibility data with identical weighting.
		positive_only: true
		# Restoring beam (WSClean-style restoration). 'auto' = fit an elliptical
		# Gaussian to the PSF main lobe; numeric values are FWHM in arcsec (bmaj,
		# bmin) and degrees east of north (bpa).
		restoring_beam:
		bmaj: auto
		bmin: auto
		bpa: auto