Add NISAR azimuth-block splitting for parallel full-frame processing#717
Draft
mgovorcin wants to merge 2 commits into
Draft
Add NISAR azimuth-block splitting for parallel full-frame processing#717mgovorcin wants to merge 2 commits into
mgovorcin wants to merge 2 commits into
Conversation
Splits a single non-OPERA-burst frame (e.g. NISAR GSLC) into N azimuth
blocks processed in parallel as synthetic "bursts". Each block reuses
the full input file list but applies its own output_options.bounds, so
wrapped_phase.run masks per block via _get_mask. The existing burst pool
runs the blocks concurrently and stitching_bursts.run mosaics them.
Changes
-------
- New _block_split.split_frame_into_blocks: computes per-block (bounds,
epsg) from cfg georef with a minimum-halo formula
max(half_window_y,
similarity_search_radius * stride_y,
(corr_window_y // 2) * stride_y) + 5
so block edges aren't cropped by SHP/similarity/stitcher windows.
- displacement.py: new module-level _GroupedInputs dataclass and
_prepare_grouped_inputs helper that detects OPERA vs non-OPERA-burst
mode from group_by_burst behavior. In non-OPERA mode, optional file
lists (amp_disp, amp_mean, layover_shadow) are broadcast to every
block id so second-batch reuse works.
- _create_burst_cfg gains a bounds= kwarg that writes
output_options.bounds/_epsg per block. OPERA path passes None and is
unchanged.
- config.InputOptions adds azimuth_blocks: int = 1 (default = no
splitting, OPERA path unchanged) and halo_rows: int | None.
When running azimuth-block mode with n_parallel_bursts > 1, multiple worker processes read the same NISAR GSLC (HDF5) files concurrently at different spatial extents. On NFS/Lustre filesystems this triggers HDF5 file-locking hangs. Set HDF5_USE_FILE_LOCKING=FALSE (via setdefault, so an explicit caller setting is honoured) before spawning workers. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
scottstanie
pushed a commit
to scottstanie/dolphin
that referenced
this pull request
May 24, 2026
Replaces the cube → per-layer tif export with a single per-layer VRT shim that points at ``ZARR:cube.zarr:/<var>:i`` as its source. GDAL's ZARR driver (3.10+) reads the slice natively, so the existing tif-based downstream pipeline (interferogram formation, stitching, unwrap, timeseries) sees ordinary 2D rasters with no pixel-data duplication. Highlights: - Default `zarr_format=2` for the cube (GDAL's ZARR read path supports it cleanly; v3's `bytes` codec isn't implemented on the GDAL read side yet). `zarr_format=3` is still selectable via writer kwarg. - New `emit_layer_vrts`, `zarr_subdataset_uri`, `layer_vrt_path` helpers in `dolphin.io._geozarr`. VRTs use absolute paths so they survive the per-ministack → top-level rename in `sequential.py`. - `_ZarrLayerStack.export_tifs` now writes VRT shims instead of dumping layer data into tif files. `_make_stack_output` returns `.slc.vrt` paths in GEOZARR mode. - Skip the GeoTIFF-only `repack_rasters` passes in GEOZARR mode (the cube is already chunked and compressed; repack would create one tif per VRT, undoing the no-duplication win). - `_ensure_data_array` rejects the reserved coord names (x/y/spatial_ref) and checks shape against any existing array of the same name to catch accidental writes that would silently corrupt a coord. - `wrapped_phase.py` cached/skip path and `sequential._get_outputs_from_folder` also glob for `.vrt` so GEOZARR runs can resume. Parallelism: matches PR isce-framework#717. Each "burst" (or synthetic-burst azimuth-split block) process gets its own work directory and its own cube — no cross-process zarr coordination. Inside a process the cube writer drains on a single background thread, the direct analog of `BackgroundStackWriter`'s single-thread tif feeder, so the parallel ThreadPoolExecutor in `single.py` sees the same write-side throughput either way. Tests updated: - `test_io_geozarr.py`: added VRT-readable-via-rasterio, VRT-survives-move, shape-mismatch rejection, reserved-coord-name rejection, and default-is-v2 cases. - `test_workflows_single.py::test_sequential_geozarr` and `test_workflows_displacement.py::test_displacement_run_geozarr` now assert *no* per-date `.slc.tif` are written, that VRTs are emitted, and that VRT reads round-trip to the cube layer. https://claude.ai/code/session_01UC8HNQWpNsKBmBNhdMhsLu
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Adds a block-as-burst pattern so NISAR full-frame GSLCs can be processed in parallel using the existing burst pool.
For burst inputs nothing changes. For NISAR (non-burst) inputs, when
input_options.azimuth_blocks > 1the frame is split into N azimuth blocks at the dispatch layer, each block is processed as a synthetic burst throughProcessPoolExecutor, and per-block outputs are stitched by the existingstitching_bursts.runstep. No changes towrapped_phase.runor the stitcher are required.Change Summary
workflows/_block_split.py(new) —split_frame_into_blocks()divides the pixel grid into N azimuth blocks with a halo;_min_halo_rows()auto-computes the minimum safe overlap from half-window, similarity search radius, and stride settings.workflows/config/_common.py—InputOptionsgainsazimuth_blocks: int = 1andhalo_rows: Optional[int] = None.workflows/displacement.py— extracts dispatch into a testable_prepare_grouped_inputs()function; setsHDF5_USE_FILE_LOCKING=FALSEbefore spawning workers when multiple processes share the same HDF5 files (prevents hangs on NFS/Lustre).workflows/_utils.py—_create_burst_cfgaccepts an optionalboundsparameter to setoutput_options.boundsper block.Typical config for a 4-block run:
CLAUDE NOTES
Key insight: n_parallel_bursts is dual-purpose. It's the outer pool size AND a budget that gets re-allocated to inner I/O workers when there are fewer "bursts" than pool slots.
n_parallel_bursts = 8, with N bursts available:
N = 1 (FULL) → 1 active process × 8 inner I/O threads
N = 4 → 4 active processes × 2 inner I/O threads each
N = 8 (BLOCKED) → 8 active processes × 1 inner I/O thread each
FULL vs BLOCKED
azimuth_blocks=1)azimuth_blocks=N)vmap(within 1 block)The speedup mechanism is N copies of single-threaded work running in parallel.
Each worker still uses ~1 core; you just have N of them running simultaneously on different rows.
Per-Knob Impact
n_parallel_burstsazimuth_blocksfor maximum block parallelismthreads_per_workerSets:
OMP_NUM_THREADSMKL_NUM_THREADSControls BLAS threads inside JAX kernels:
eighsolvematmulSeparate from JAX/XLA’s own thread pool.
Typical values:
JAX / XLA
JAX’s XLA backend maintains its own thread pool, sized by default to:
Local benchmark full frame vs blocked results
displacement.run end-to-end through PS+PL+stitch benchmark run with taskset -c 0-33
GSLCs : 7 (Track 042 Frame 070 Descending orbit)
half_window : 5
num_blocks : 8
n_parallel_bursts : 8 (both modes)
threads_per_worker : 2 (both modes)
block_shape: 2048x2048 (both modes)
active threads FULL : 2
active threads BLK : 16
FULL wall = 10667.3s (177.79 min)
BLOCKED wall = 2411.2s ( 40.19 min)
SPEEDUP (BLOCKED vs FULL) : 4.42×
Checklist