Commit bde27029 authored by Giovanni Lacopo's avatar Giovanni Lacopo
Browse files

Parallel deconvolution problem fixed

parent d0744023
Loading
Loading
Loading
Loading
+80 −20
Original line number Diff line number Diff line
@@ -87,18 +87,54 @@ real-part peak equals 1.0 (Hogbom convention) and replicated on every MPI rank
for the distributed CLEAN subtraction. It is also written to
`psf_real.bin` / `psf_imag.bin` as a diagnostic product.

### Restored image (NOT YET IMPLEMENTED — future work)
### Image Restoration & Restoring Beam

The photometrically correct restored image is
RICK implements WSClean-style image restoration: the clean-component map is
convolved with an elliptical Gaussian restoring beam fitted to the PSF main
lobe, then the residual is added.

$$I_{\text{restored}}(l,m) = \sum_k \gamma r^{(k)}_{\max} \delta(l - l_k, m - m_k) \ast G_{\text{clean}} + r^{(N_{\text{iter}})}(l,m)$$
**2D elliptical Gaussian restoring beam:**

where $G_{\text{clean}}$ is the restoring Gaussian beam (fitted to the PSF
main lobe). **This convolution step is not yet implemented in RICK**: the
current `restored_*.bin` output is `component + residual` *without* the
$G_{\text{clean}}$ convolution and is a diagnostic product only. The
restoring-beam fit and convolution **must be added before any photometric
measurement** is made on the restored image (see Section 5.1).
$$G(x, y) = \exp\!\left(-\frac{1}{2}\left[\frac{(x\cos\phi + y\sin\phi)^2}{\sigma_{\rm maj}^2} + \frac{(-x\sin\phi + y\cos\phi)^2}{\sigma_{\rm min}^2}\right]\right)$$

where $\sigma = \theta_{\rm FWHM}/(2\sqrt{2\ln 2})$ and $\phi$ is the beam
position angle (east of north).

**Restored image (WSClean convention):**

$$I_{\rm restored}(l,m) = \underbrace{\sum_k \gamma \, r^{(k)}_{\rm max}\,\delta(l-l_k,\,m-m_k)}_{\rm clean\ components} \circledast\, G_{\rm clean} \;+\; r^{(N_{\rm iter})}(l,m)$$

**Beam fitting (log-parabola method).** For an elliptical Gaussian,
$\ln{\rm PSF}$ is an exact quadratic form, so the main lobe (pixels above
10% of the unit peak) is fitted by *linear* least squares to

$$\ln {\rm PSF}(\Delta x, \Delta y) \approx c_0 + c_1\Delta x + c_2\Delta y + c_3\Delta x^2 + c_4\Delta x\Delta y + c_5\Delta y^2$$

The quadratic-form matrix $A = \bigl[\begin{smallmatrix}-2c_3 & -c_4\\ -c_4 & -2c_5\end{smallmatrix}\bigr]$
is eigen-decomposed: the eigenvalues are the inverse squared $1\sigma$ widths
along the principal axes ($\theta_{\rm FWHM} = 2\sqrt{2\ln 2}/\sqrt{\lambda}$,
major axis from the *smallest* eigenvalue) and the eigenvectors give the
position angle. This is the same approach WSClean uses internally; it needs
no non-linear iteration and no external linear-algebra library (6×6 normal
equations, Gaussian elimination). If the fit is degenerate (non-positive-
definite form, absurd axis lengths, < 12 main-lobe pixels) a conservative
circular fallback of FWHM $2\sqrt{2\ln 2} \approx 2.35$ pixels is used.
User override: set `deconvolution.restoring_beam.{bmaj,bmin,bpa}` to numeric
values (arcsec / degrees) instead of `auto`.

**Unit alignment.** The kernel is normalised to **unit peak**
($G(0,0) = 1$), the WSClean/CASA restoration convention:

$$\underbrace{[{\rm Jy/pixel}]}_{\rm components} \circledast \underbrace{[{\rm dimensionless,\ unit\ peak}]}_{\rm Gaussian\ kernel} + \underbrace{[{\rm Jy/beam}]}_{\rm residual} = \underbrace{[{\rm Jy/beam}]}_{\rm restored}$$

A delta-function component of flux $A$ Jy convolved with the unit-peak kernel
produces a Gaussian of **peak** $A$ Jy/beam at the component location — the
same flux scale as the residual (whose dirty beam is also peak-normalised to
1.0), so the two terms add consistently. Integrated source flux is recovered
as $\sum I_{\rm restored} / \Omega_{\rm beam}$ with the beam area
$\Omega_{\rm beam} = \pi\,\theta_{\rm maj}\theta_{\rm min}/(4\ln 2)$ pixels.
(A unit-*sum* kernel would instead restore a component to a peak of
$A/(2\pi\sigma_{\rm maj}\sigma_{\rm min})$, breaking the Jy/beam scale.)

### Configuration parameter ↔ symbol mapping

@@ -110,6 +146,9 @@ measurement** is made on the restored image (see Section 5.1).
| `deconvolution.gain` | $\gamma$ | Hogbom loop gain |
| `deconvolution.threshold` | $r_{\text{thresh}}$ | CLEAN convergence threshold |
| `deconvolution.max_iterations` | $N_{\text{iter}}$ | Maximum CLEAN minor cycles |
| `deconvolution.restoring_beam.bmaj` | $\theta_{\rm maj}$ | Restoring beam major axis FWHM (arcsec); `auto` = fit from PSF |
| `deconvolution.restoring_beam.bmin` | $\theta_{\rm min}$ | Restoring beam minor axis FWHM (arcsec); `auto` = fit from PSF |
| `deconvolution.restoring_beam.bpa`  | $\phi_{\rm PA}$    | Restoring beam position angle (deg E of N); `auto` = fit from PSF |
| `gridding.robust` | $R$ | Briggs robust weighting parameter |
| `gridding.kernel_oversampling` | $\beta$ | Kaiser-Bessel oversampling factor |

@@ -512,6 +551,9 @@ These keys live under `files:` and are read by `config_parser.cpp` exactly as li
| `deconvolution.threshold`          | float  | `0.001`  | `0.0005`| Stop when peak residual drops below this value [Jy/beam]. |
| `deconvolution.gain`               | float  | `0.1`    | `0.1`   | Loop gain: fraction of peak flux removed per iteration (0.05–0.2). |
| `deconvolution.positive_only`      | bool   | `false`  | `false` | Restrict cleaning to positive components (opt-in; keep false for negative sidelobes and Stokes Q/U/V). |
| `deconvolution.restoring_beam.bmaj`| float/`auto` | `auto` | `8.5` | Restoring beam major-axis FWHM [arcsec]; `auto` (or absent) = fit an elliptical Gaussian to the PSF main lobe. |
| `deconvolution.restoring_beam.bmin`| float/`auto` | `auto` | `7.0` | Restoring beam minor-axis FWHM [arcsec]; `auto` = fit from PSF. Override requires BOTH bmaj and bmin numeric. |
| `deconvolution.restoring_beam.bpa` | float/`auto` | `auto` | `25.0`| Restoring beam position angle [deg, east of north]; only consulted with a numeric bmaj/bmin override. |

> **PSF is generated automatically.** The dirty beam is computed internally
> from the visibility data (second gridding pass with unit amplitudes and the
@@ -625,21 +667,39 @@ The loop terminates when $A_k < \tau$ (threshold) or $k = N_{\text{max}}$ (maxim

#### Image Restoration

The true restored image is:
The restored image is:
$$
I_{\text{restored}} = (C * B_{\text{clean}}) + R_{\text{final}}
$$
where $B_{\text{clean}}$ is a Gaussian restoring beam fitted to the main lobe of the PSF.
where $B_{\text{clean}}$ is the elliptical Gaussian restoring beam fitted to
the main lobe of the PSF (or supplied via
`deconvolution.restoring_beam.{bmaj,bmin,bpa}`).

**RICK implements this in full** (see *Image Restoration & Restoring Beam*
in the Mathematical Formulation section): after the CLEAN loop terminates,
each rank fits the beam on the replicated PSF (deterministic — no
communication), builds a unit-peak Gaussian kernel truncated at $3\sigma$ of
the major axis, convolves the sparse component map with it (scatter-style,
$O(N_{\rm comp} K^2)$), and adds the residual. The fitted beam parameters
are printed at rank 0:

**RICK's current implementation** provides:
$$
I_{\text{restored}} = C + R_{\text{final}}
$$
This is a **diagnostic output only**. For photometrically valid results, users must:
1. Export `component_real.bin` and `residual_real.bin`
2. Fit a 2D Gaussian to the PSF main lobe
3. Convolve the component map with that Gaussian
4. Add the residual
```
[RICK INFO] Restoring beam: BMAJ=8.412" (3.21 px), BMIN=7.105" (2.71 px), BPA=23.4 deg (fitted from PSF main lobe)
```

In distributed runs the restoration performs a **component halo exchange**
before the convolution: each rank exchanges a border strip of exactly one
kernel radius $R$ of its component map with its face neighbours
(`MPI_Sendrecv`, `MPI_PROC_NULL` at the global image edge), so the Gaussian
wings of components within $R$ of a slab boundary are scattered into the
neighbouring slab instead of being clipped. The exchange is two-stage
(rows first, then *extended* column strips that include the received
row halos), which also propagates the corner blocks from diagonal
neighbours — the restored image is bitwise independent of the
decomposition layout up to floating-point summation order. To keep the
kernel and the halo message sizes identical on every rank, the kernel
radius safety clamp uses the global minimum slab dimension
(`MPI_Allreduce` MIN) rather than the local slab width.

### 5.2 MPI-Distributed Implementation

+0 −5
Original line number Diff line number Diff line
@@ -9,7 +9,6 @@ RICK is a distributed radio interferometry imaging code (C++/MPI+OpenMP) doing W
**Decomposition model:**
- 1D mode: full X (x_start=0, xaxis=grid_size_x), Y strip per rank (Gaussian-weighted strip sizes possible).
- 2D mode: sub-rectangle of both axes; requires ≥4 ranks; "2d" or "2d_gaussian" modes via config.
- "relaxed" mode (added 2026): data-driven point-based UV decomposition. Builds per-axis histograms over the *normalised* [0,1] coordinate domain (NOT the data-extent envelope — this would silently rescale the data range to span the full pixel grid and put slab boundaries in the wrong pixels), then iteratively refines slab boundaries via pair-optimal V-shape binary search. Histogram-cell slab boundaries are mapped onto pixel boundaries via `pixel = floor(slab/Ngrid * grid_size)`. Output `(start_x, start_y, size_x, size_y, rank_2d_x, rank_2d_y, npx, npy)` matches the 2D contract so the rest of the pipeline (bucket-sort, gridding, wstack, FFT, phase, MPI-IO subarray) is unchanged. Requires ≥4 ranks. Conceptually deferred: relaxed_decomposition runs AFTER reading uu/vv local coordinates because it's data-dependent; the 1D/2D paths run up-front.
- Visibilities distributed via ghost-region bucket sort: any vis whose kernel footprint (±KernelLen cells) overlaps a rank's domain is sent to that rank. Each rank then clips kernel to its local domain in `wstack.cpp`.

**Why:** Allows each rank to grid disjoint pieces of overlapping kernel footprints — no double-counting and no halo exchange needed. The grid is *output*-decomposed, not input-decomposed.
@@ -22,13 +21,9 @@ RICK is a distributed radio interferometry imaging code (C++/MPI+OpenMP) doing W
- Halo-exchange utilities (`exchange_halos_x/y/2d` in `src/communication/communication.cpp`) have stride bugs (write at `grid + y*xaxis + size_x` which assumes a halo-extended row that isn't allocated). Treat these as broken — use the ghost-region clipping design instead.
- MPI-IO write in `phase_correction` uses `starts[2] = {y_start, 0}` — correct for 1D, broken for 2D (should be `{y_start, x_start}`).
- FITSIO write uses `rank * yaxis` — only valid when all ranks have equal yaxis (not true under Gaussian-weighted decomposition).
- **`io_write` (src/io/io_functions.cpp) is NOT slab-aware:** it hardcodes `MPI_File_write_at(File, 0, data, Ndata, MPI_BYTE)` — every rank writes at byte offset 0, no `MPI_File_set_view`, no global offset/subarray params. Only correct for whole-buffer single-region writes. When N ranks call it on one filename each writes its local slab at offset 0 (race) and the file ends up one-slab-sized. The deconvolution output block in test_clib.cpp (component/residual/restored _real/_imag) used `io_write` and therefore produced only one rank's 2048×2048 slab instead of the full 8192×8192 image. CORRECT pattern for distributed image writes lives in phase_correction_library.cpp (~lines 533-549) and fft_library.cpp/gridding_data.cpp: `MPI_Type_create_subarray(2, {grid_size_y,grid_size_x}, {yaxis,xaxis}, {y_start,x_start}, MPI_ORDER_C, MPI_DOUBLE)` + `MPI_File_set_view(fh,0,...)` + `MPI_File_write_all(fh, buf, xaxis*yaxis, MPI_DOUBLE)`. CleanSolver itself is dimensionally correct (sized xaxis*yaxis, converts local↔global peaks via x_start_/y_start_) — only the write was broken.
- Bucket-sort `v_norm = vvt[i] / grid_size_y` while `vvt` is already in `[0,1]` (normalized by `create_binMS.py`) — dimensional mismatch with `min_vals[r]` that's also in `[0,1]`. Pre-existing in `test_clib.cpp`.
- Kernel index `kKer = increaseprecision * fabs(v_dist + KernelLen)` relies on kernel symmetry; correct for Gauss/KaiserBessel but would be wrong for asymmetric kernels.
- **Layout-mismatch class** (e.g., the 2026 `weight_grid` work): a buffer is allocated as a flat real array `[iw*xaxis*yaxis + i]` at the call site, but written with the interleaved complex stride `2*(j+k*xaxis+grid_w*xaxis*yaxis)` inside `wstack.cpp`. Anything sized like the visibility grid must use either the interleaved index everywhere or the flat index everywhere — never mix.
- **Histogram-domain mismatch class** (relaxed decomposition): if a histogram is built over a data-driven `[u_min,u_max]` window and then slab-to-pixel mapped via `floor(slab/Ngrid * grid_size)`, the mapping silently treats the data range as if it spans the entire pixel grid. The histogram domain MUST match the mapping domain ([0,1] in RICK's normalised UV space) or you must apply an affine transform `pix = floor((hmin + slab/Ngrid * hrange) * grid_size)`.
- **Unsigned-int passing across function boundaries:** large per-rank measurement counts (`Nmeasures`) routinely exceed 2^31 on fat nodes. Any decomposition or gridding helper that takes its trip count as `int` will silently truncate. Use `unsigned long long` (or `myull`) for all visibility counts crossing function boundaries.
- **`enforce_min1` / boundary-tightening lambdas:** when iterating boundary arrays after rounded mapping, naive two-pass forward/backward walks can leave the first slab with size 0 if the backward walk pushes b[1] above 0 and then the front is restored to 0. Always use a two-sided clamp `[lo_i, hi_i]` where `lo_i = b[i-1]+1` and `hi_i = axis_total - (N-i)` so each slab keeps ≥1 pixel after the pass.

**WSClean-style sum-of-weights normalization (added 2026):**
- New `weight_grid` plumbed through `gridding → wstack → fftw_data → phase_correction(weight_grid_fft)` to divide image by per-pixel Σweights (summed over w-planes).
+0 −1
Original line number Diff line number Diff line
@@ -2,4 +2,3 @@
- [RICK imaging pipeline](project_rick_pipeline.md) — RICK FFT structure, phase-leakage bug, fftshift checkerboard fix
- [Do not compile after edits in RICK](feedback_no_autocompile.md) — never run cmake/make in RICK; user compiles himself
- [RICK Doxygen & conventions](project_rick_doxygen.md) — src/ module map, WSClean ground-truth conventions, fabricated-doc failure mode, known unfixed bugs
- [RICK deconvolution layer](project_rick_deconvolution.md) — src/deconvolution Hogbom CLEAN, PSF binary format/centre conventions, mock-PSF script, unimplemented PSF-load TODO
+11 −0
Original line number Diff line number Diff line
@@ -117,3 +117,14 @@ deconvolution:
  # M8: false — positive_only=true stops CLEAN at the first negative peak,
  # truncating legitimate negative-sidelobe cleaning (and breaking Q/U/V).
  positive_only: false
  # Restoring beam for WSClean-style restoration:
  #   restored = (clean components ⊛ elliptical Gaussian) + residual
  # 'auto' (or omitting the block) fits a 2D elliptical Gaussian to the PSF
  # main lobe (log-parabola least squares, pixels above 10% of the peak).
  # To override, give bmaj/bmin as FWHM in ARCSEC and bpa in DEGREES (east
  # of north); the override only takes effect when BOTH bmaj and bmin are
  # numeric.
  restoring_beam:
    bmaj: auto   # arcsec; 'auto' = fit from PSF main lobe
    bmin: auto   # arcsec; 'auto' = fit from PSF main lobe
    bpa:  auto   # degrees east of north; 'auto' = fit from PSF
+7 −0
Original line number Diff line number Diff line
@@ -98,3 +98,10 @@ deconvolution:
  # psf_path: DEPRECATED and ignored — the PSF (dirty beam) is generated
  # internally from the visibility data with identical weighting.
  positive_only: true
  # Restoring beam (WSClean-style restoration). 'auto' = fit an elliptical
  # Gaussian to the PSF main lobe; numeric values are FWHM in arcsec (bmaj,
  # bmin) and degrees east of north (bpa).
  restoring_beam:
    bmaj: auto
    bmin: auto
    bpa:  auto
Loading