description:Recurring patterns, conventions, and known bug classes in the RICK (Radio Imaging Code Kernels) C++/MPI+OpenMP codebase
type:project
---
RICK is a distributed radio interferometry imaging code (C++/MPI+OpenMP) doing W-projection gridding.
**Decomposition model:**
- 1D mode: full X (x_start=0, xaxis=grid_size_x), Y strip per rank (Gaussian-weighted strip sizes possible).
- 2D mode: sub-rectangle of both axes; requires ≥4 ranks; "2d" or "2d_gaussian" modes via config.
- Visibilities distributed via ghost-region bucket sort: any vis whose kernel footprint (±KernelLen cells) overlaps a rank's domain is sent to that rank. Each rank then clips kernel to its local domain in `wstack.cpp`.
**Why:** Allows each rank to grid disjoint pieces of overlapping kernel footprints — no double-counting and no halo exchange needed. The grid is *output*-decomposed, not input-decomposed.
**How to apply:** Any code touching grid coordinates must convert local↔global. `j_global = j + x_start`, `k_global = k + y_start`. Don't forget x_start (only y_start was historically threaded through).
**Recurring bug classes:**
- Forgetting to clip kmax/jmax to local `[0, yaxis-1]`/`[0, xaxis-1]` after global→local conversion → writes into adjacent w-plane memory or OOB.
- Forgetting `x_start` in coordinate computations (only y_start was originally plumbed through).
- Halo-exchange utilities (`exchange_halos_x/y/2d` in `src/communication/communication.cpp`) have stride bugs (write at `grid + y*xaxis + size_x` which assumes a halo-extended row that isn't allocated). Treat these as broken — use the ghost-region clipping design instead.
- MPI-IO write in `phase_correction` uses `starts[2] = {y_start, 0}` — correct for 1D, broken for 2D (should be `{y_start, x_start}`).
- FITSIO write uses `rank * yaxis` — only valid when all ranks have equal yaxis (not true under Gaussian-weighted decomposition).
- Bucket-sort `v_norm = vvt[i] / grid_size_y` while `vvt` is already in `[0,1]` (normalized by `create_binMS.py`) — dimensional mismatch with `min_vals[r]` that's also in `[0,1]`. Pre-existing in `test_clib.cpp`.
- Kernel index `kKer = increaseprecision * fabs(v_dist + KernelLen)` relies on kernel symmetry; correct for Gauss/KaiserBessel but would be wrong for asymmetric kernels.
-**Layout-mismatch class** (e.g., the 2026 `weight_grid` work): a buffer is allocated as a flat real array `[iw*xaxis*yaxis + i]` at the call site, but written with the interleaved complex stride `2*(j+k*xaxis+grid_w*xaxis*yaxis)` inside `wstack.cpp`. Anything sized like the visibility grid must use either the interleaved index everywhere or the flat index everywhere — never mix.
- New `weight_grid` plumbed through `gridding → wstack → fftw_data → phase_correction(weight_grid_fft)` to divide image by per-pixel Σweights (summed over w-planes).
- Conceptually correct compared to WSClean's `I = FFT[G]/Σw`; FFT-ing the weight grid with the same backward HeFFTe transform and then summing in image space gives the per-pixel sum-of-gridded-weights, which matches what WSClean computes.
- Distributed correctness requires **per-pixel sum across ranks** for the same (iu,iv,iw), but RICK's grid is output-decomposed (each pixel lives on exactly one rank, weights for that pixel are accumulated locally) — so the post-bucket-sort weight_grid is already complete per pixel; no Allreduce needed for normalization. Halo cells from kernel overlap are still the correctness concern.
- Common mistakes when implementing the chain: (1) the `2*` factor in init/accumulation when the buffer is allocated flat, (2) initialization gated behind `WEIGHTING_UNIFORM/BRIGGS` so NATURAL gets all-zeros and the divide zeros the image, (3) `ww[i]==1.0` skip in the visibility pass but not in the weight pass causing weight/data asymmetry, (4) missing `dwnorm` factor on the weight_sum side leaves a constant residual factor of `num_w_planes`.
**Conventions:**
-`myuint = unsigned int` (32-bit), `myull = unsigned long long`. `iKer` is sometimes `myull` and sometimes `unsigned int` across files — silent truncation risk for very large grids.
- Visibility coords (`vv[i]`, `uu[i]`) read from disk are in `[0,1]`. Cell-unit coords obtained via `pos_v = vv[i] / dx` where `dx = 1/grid_size_x`.
-`ww[i] == 1.0` is treated as flagged/skipped in `wstack`, but is also the legitimate maximum w after normalization — suspect logic.
- OpenMP main pass uses `#pragma omp atomic` on grid accumulation. Weighting pre-pass is serial (no parallel-for).
- MPI thread level: `MPI_THREAD_FUNNELED` (only main thread does MPI).
**Weighting refactor (mid-2026):**
- The `WEIGHTING_UNIFORM` / `WEIGHTING_BRIGGS` compile-time switches were eliminated; weighting mode is now selected at runtime from `config.weighting_type` ("NATURAL" / "UNIFORM" / "BRIGGS") with `config.robust` carrying R.
- WSClean parity formulas (must match): Uniform `w_eff = w_vis / Σw_cell`; Briggs `w_eff = w_vis / (1 + Σw_cell · f²)` with `f² = (5·10^{-R})² / (Σ(Σw_cell)² / Σ(Σw_cell))` — note Σ is GLOBAL across the whole density grid (and across MPI ranks → one Allreduce).
- Density grid `weight_uv` is **flat** real-only (no `2*` interleave stride); the prior code allocated 2× and indexed with `2*(j+k*xaxis+grid_w*xaxis*yaxis)` which is a bug pattern from the same family as the `weight_grid` flat/interleaved mismatch.
- Per-vis effective weight must be computed ONCE per visibility at the vis-center cell, NOT inside the per-kernel-cell inner loop (the prior code did the latter and overwrote `out_weight_uniform[visindex]` per cell).
- Pre-pass must be parallel and must skip `ww==1.0` flagged points to stay symmetric with the main gridding pass.
- Use `double` (not `float`) for Σw and Σw² accumulations — float overflows quickly on real datasets and corrupts Briggs `f²`.