Add NISAR azimuth-block splitting for parallel full-frame processing by mgovorcin · Pull Request #717 · isce-framework/dolphin

mgovorcin · 2026-05-22T05:59:50Z

Adds a block-as-burst pattern so NISAR full-frame GSLCs can be processed in parallel using the existing burst pool.

For burst inputs nothing changes. For NISAR (non-burst) inputs, when input_options.azimuth_blocks > 1 the frame is split into N azimuth blocks at the dispatch layer, each block is processed as a synthetic burst through ProcessPoolExecutor, and per-block outputs are stitched by the existing stitching_bursts.run step. No changes to wrapped_phase.run or the stitcher are required.

Change Summary

workflows/_block_split.py (new) — split_frame_into_blocks() divides the pixel grid into N azimuth blocks with a halo; _min_halo_rows() auto-computes the minimum safe overlap from half-window, similarity search radius, and stride settings.
workflows/config/_common.py — InputOptions gains azimuth_blocks: int = 1 and halo_rows: Optional[int] = None.
workflows/displacement.py — extracts dispatch into a testable _prepare_grouped_inputs() function; sets HDF5_USE_FILE_LOCKING=FALSE before spawning workers when multiple processes share the same HDF5 files (prevents hangs on NFS/Lustre).
workflows/_utils.py — _create_burst_cfg accepts an optional bounds parameter to set output_options.bounds per block.

Typical config for a 4-block run:

input_options:
  azimuth_blocks: 4
worker_settings:
  n_parallel_bursts: 4
  threads_per_worker: 2

CLAUDE NOTES

└── Process #k (one per active "burst" / NISAR block)
    │
    ├── ThreadPoolExecutor (size = workers_per_burst = n_parallel_bursts // active_bursts)
    │   └── threads doing GDAL/HDF5 read-ahead inside EagerLoader
    │
    └── Main thread: per-output-block phase-linking
        ├── JAX/XLA host pool (default = os.cpu_count(), can cap via XLA_FLAGS)
        │   └── parallelizes vmap'd ops across pixels
        └── BLAS via OMP_NUM_THREADS = threads_per_worker
            └── multi-threads eigh, solve, matmul inside JAX kernels

Key insight: n_parallel_bursts is dual-purpose. It's the outer pool size AND a budget that gets re-allocated to inner I/O workers when there are fewer "bursts" than pool slots.

n_parallel_bursts = 8, with N bursts available:
N = 1 (FULL) → 1 active process × 8 inner I/O threads
N = 4 → 4 active processes × 2 inner I/O threads each
N = 8 (BLOCKED) → 8 active processes × 1 inner I/O thread each

FULL vs BLOCKED

Aspect	FULL (`azimuth_blocks=1`)	BLOCKED (`azimuth_blocks=N`)
Processes	1	N (8 typical)
GIL	Locked to 1 interpreter	Escaped — N interpreters
JAX kernels	Parallelize across pixels via `vmap` (within 1 block)	Same per block, run N in parallel across blocks
Python control flow	Strictly serial sweep through frame	N serial sweeps in parallel
HDF5 reads	1 stream	N concurrent streams
JIT compile	Once, ~10–30 s	N times in parallel, ~30 s wall
Working set	Full frame in 1 process	~1/N of frame in each process

The speedup mechanism is N copies of single-threaded work running in parallel.

Each worker still uses ~1 core; you just have N of them running simultaneously on different rows.

Per-Knob Impact

`n_parallel_bursts`

Bigger = more outer pool slots (capped by available bursts)
For NISAR: this is the number of blocks running in parallel
Set equal to azimuth_blocks for maximum block parallelism
Typical sweet spot: both at 8

`threads_per_worker`

Sets:

OMP_NUM_THREADS
MKL_NUM_THREADS

Controls BLAS threads inside JAX kernels:

eigh
solve
matmul

Separate from JAX/XLA’s own thread pool.

Typical values:

2 → reasonable default
1 → if CPU constrained
4+ → can help in FULL mode where one process can use more parallelism

JAX / XLA

JAX’s XLA backend maintains its own thread pool, sized by default to:

os.cpu_count()

Local benchmark full frame vs blocked results

displacement.run end-to-end through PS+PL+stitch benchmark run with taskset -c 0-33

GSLCs : 7 (Track 042 Frame 070 Descending orbit)
half_window : 5
num_blocks : 8
n_parallel_bursts : 8 (both modes)
threads_per_worker : 2 (both modes)
block_shape: 2048x2048 (both modes)
active threads FULL : 2
active threads BLK : 16

FULL wall = 10667.3s (177.79 min)
BLOCKED wall = 2411.2s ( 40.19 min)
SPEEDUP (BLOCKED vs FULL) : 4.42×

Checklist

The pull request title is a good summary of the changes - it will be used in the changelog
Unit tests for the changes exist
Tests pass on CI
Documentation reflects the changes where applicable
My PR is ready to review

Splits a single non-OPERA-burst frame (e.g. NISAR GSLC) into N azimuth blocks processed in parallel as synthetic "bursts". Each block reuses the full input file list but applies its own output_options.bounds, so wrapped_phase.run masks per block via _get_mask. The existing burst pool runs the blocks concurrently and stitching_bursts.run mosaics them. Changes ------- - New _block_split.split_frame_into_blocks: computes per-block (bounds, epsg) from cfg georef with a minimum-halo formula max(half_window_y, similarity_search_radius * stride_y, (corr_window_y // 2) * stride_y) + 5 so block edges aren't cropped by SHP/similarity/stitcher windows. - displacement.py: new module-level _GroupedInputs dataclass and _prepare_grouped_inputs helper that detects OPERA vs non-OPERA-burst mode from group_by_burst behavior. In non-OPERA mode, optional file lists (amp_disp, amp_mean, layover_shadow) are broadcast to every block id so second-batch reuse works. - _create_burst_cfg gains a bounds= kwarg that writes output_options.bounds/_epsg per block. OPERA path passes None and is unchanged. - config.InputOptions adds azimuth_blocks: int = 1 (default = no splitting, OPERA path unchanged) and halo_rows: int | None.

When running azimuth-block mode with n_parallel_bursts > 1, multiple worker processes read the same NISAR GSLC (HDF5) files concurrently at different spatial extents. On NFS/Lustre filesystems this triggers HDF5 file-locking hangs. Set HDF5_USE_FILE_LOCKING=FALSE (via setdefault, so an explicit caller setting is honoured) before spawning workers. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Replaces the cube → per-layer tif export with a single per-layer VRT shim that points at ``ZARR:cube.zarr:/<var>:i`` as its source. GDAL's ZARR driver (3.10+) reads the slice natively, so the existing tif-based downstream pipeline (interferogram formation, stitching, unwrap, timeseries) sees ordinary 2D rasters with no pixel-data duplication. Highlights: - Default `zarr_format=2` for the cube (GDAL's ZARR read path supports it cleanly; v3's `bytes` codec isn't implemented on the GDAL read side yet). `zarr_format=3` is still selectable via writer kwarg. - New `emit_layer_vrts`, `zarr_subdataset_uri`, `layer_vrt_path` helpers in `dolphin.io._geozarr`. VRTs use absolute paths so they survive the per-ministack → top-level rename in `sequential.py`. - `_ZarrLayerStack.export_tifs` now writes VRT shims instead of dumping layer data into tif files. `_make_stack_output` returns `.slc.vrt` paths in GEOZARR mode. - Skip the GeoTIFF-only `repack_rasters` passes in GEOZARR mode (the cube is already chunked and compressed; repack would create one tif per VRT, undoing the no-duplication win). - `_ensure_data_array` rejects the reserved coord names (x/y/spatial_ref) and checks shape against any existing array of the same name to catch accidental writes that would silently corrupt a coord. - `wrapped_phase.py` cached/skip path and `sequential._get_outputs_from_folder` also glob for `.vrt` so GEOZARR runs can resume. Parallelism: matches PR isce-framework#717. Each "burst" (or synthetic-burst azimuth-split block) process gets its own work directory and its own cube — no cross-process zarr coordination. Inside a process the cube writer drains on a single background thread, the direct analog of `BackgroundStackWriter`'s single-thread tif feeder, so the parallel ThreadPoolExecutor in `single.py` sees the same write-side throughput either way. Tests updated: - `test_io_geozarr.py`: added VRT-readable-via-rasterio, VRT-survives-move, shape-mismatch rejection, reserved-coord-name rejection, and default-is-v2 cases. - `test_workflows_single.py::test_sequential_geozarr` and `test_workflows_displacement.py::test_displacement_run_geozarr` now assert *no* per-date `.slc.tif` are written, that VRTs are emitted, and that VRT reads round-trip to the cube layer. https://claude.ai/code/session_01UC8HNQWpNsKBmBNhdMhsLu

mgovorcin and others added 2 commits May 21, 2026 22:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add NISAR azimuth-block splitting for parallel full-frame processing#717

Add NISAR azimuth-block splitting for parallel full-frame processing#717
mgovorcin wants to merge 2 commits into
isce-framework:mainfrom
mgovorcin:azimuth-block

mgovorcin commented May 22, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

mgovorcin commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Change Summary

CLAUDE NOTES

FULL vs BLOCKED

Per-Knob Impact

n_parallel_bursts

threads_per_worker

JAX / XLA

Local benchmark full frame vs blocked results

Checklist

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

mgovorcin commented May 22, 2026 •

edited

Loading

`n_parallel_bursts`

`threads_per_worker`