dimos map tooling, stream alignment#2306
Conversation
Replaces the monolithic pgo_then_voxels with four primitives over mem2 Streams and Transforms: pgo_keyframes(lidar) -> Stream[Keyframe] keyframes_to_corrections(kfs) -> Stream[Transform] (world_corrected <- world_raw) make_interpolator(corrections) -> (ts) -> Transform (SLERP + linear, endpoint-clipped) apply_corrections(stream, corr) -> Stream[T] (shuffles obs.pose) Drift correction is now a first-class, reusable Stream[Transform] that any pose-stamped consumer can apply — same math as before (rigid T_corr = T_global @ T_local^-1 per keyframe, SLERP + linear interpolation between bracketing keyframes), just composable. dimos map --pgo migrates to the new primitives; pgo_then_voxels is deleted. The internal _SimplePGO / PGOConfig / _KeyPose machinery stays in pgo.py and is imported by pgo2.
Renames: pgo.py -> pgo_internals.py (gtsam/ICP machinery) pgo2.py -> pgo.py (public Stream-shaped API) `dimos.mapping.relocalization.pgo` is now the canonical import path (`pgo_keyframes`, `keyframes_to_corrections`, `make_interpolator`, `apply_corrections`, `correction_at`, `Keyframe`). The internal _SimplePGO / PGOConfig / _KeyPose / _icp / _voxel_downsample helpers live in pgo_internals.py and are imported lazily inside pgo.py to keep gtsam off the public-API import path.
Adds test_pgo.py covering pose normalization on Observation, the Transform↔Pose3 conversion helpers, the interpolator edge cases, apply_corrections behavior, and a real-recording smoke test (skipped when data/go2_short.db is absent). Annotates the two map.py helpers so they pass mypy. Removes leftover section divider comments to satisfy the no-section-markers project rule.
The colors-copy block in `PointCloud2.transform` calls `self.pointcloud` to check `has_colors()`, which forces a tensor->legacy conversion on every invocation. With `pgo --full-pgo` rebuilding from hundreds of lidar frames, that hidden allocation dominates the per-frame cost. The colors path was lidar-irrelevant (lidar clouds have no colors anyway) and the feature it was meant to support never landed; remove it so transform() stays a clean numpy round-trip.
…s into feat/ivan/pgo_rewrite
These end-to-end CLI tests forked python -m dimos.robot.cli.dimos per case, re-paying the heavy per-verb imports (rerun, open3d, voxel) 7 times. Run them in-process via Typer's CliRunner so those imports are paid once: 33.7s -> 7.5s, all 7 still pass. Trade-off: shared interpreter, no per-test process isolation.
Add from_time/to_time (relative to the first observation) and from_timestamp/to_timestamp (absolute epoch seconds) for windowing a stream by time. A trailing to_time is a duration measured from the current start, so from_time(2).to_time(30) reads as "skip 2s, take the following 30s"; frames mix freely (from_timestamp(ts).to_time(30)). Shared base for the stream-alignment (#2306) and go2dds (#2314) branches, which both need this windowing API.
…arch
- pgo_keyframes: widen stream param to Iterable[Observation[PointCloud2]]
(clears the mypy gate; it never uses Stream methods)
- pose_fill_db: order_by("ts") on both align inputs — align requires
ts-ascending iteration, but sqlite defaults to id ASC (insertion order),
which would silently mis-pair out-of-order recordings
- add CI-running unit tests for Observation.with_pose (lazy payload kept)
and pose_fill(mount=...) composition
- remove the unreferenced autoresearch/ research drop (near-duplicate of
pgo_auto.py + scaffolding); pgo_auto.py kept
Add from_time/to_time (relative to the first observation) and from_timestamp/to_timestamp (absolute epoch seconds) for windowing a stream by time. A trailing to_time is a duration measured from the current start, so from_time(2).to_time(30) reads as "skip 2s, take the following 30s"; frames mix freely (from_timestamp(ts).to_time(30)). Shared base for the stream-alignment (#2306) and go2dds (#2314) branches, which both need this windowing API.
Pulls the stairs dataset LFS pointer from feat/ivan/go2stairs so the map tooling can use it; object already on remote LFS store.
…nment # Conflicts: # dimos/memory2/stream.py # dimos/memory2/test_stream.py
The self-hosted CI image's published :dev tag is stale (predates libturbojpeg0-dev in docker/python/Dockerfile), so TurboJPEG() raises at runtime and the four verbs that decode the color_image stream fail. Guard them with skipif so they skip on a lib-less image and still run where the native lib is present.
| if no_gui: | ||
| print(f"open with: rerun {out}") | ||
| else: | ||
| subprocess.Popen(["rerun", str(out)]) |
There was a problem hiding this comment.
You maintain to reference to this? How is it ever closed?
There was a problem hiding this comment.
wait I don't get it - reference to what?
| --out mid360_renamed.db \\ | ||
| --rename go2_lidar=lidar \\ | ||
| --rename lidar=fastlio_lidar \\ | ||
| --rename odometry=fastlio_odometry |
There was a problem hiding this comment.
This is meant to be used through dimos map rename, no? Then please don't document running the file itself.
|
|
||
| for p in primary_iter: | ||
| # Advance the cursor until `nxt` sits past `p`; `prev` trails just behind. | ||
| while nxt is not None and nxt.ts <= p.ts: |
There was a problem hiding this comment.
Why <=? If nxt.ts == p.ts isn't that the best possible solution?
| # Skip placeholder poses (origin position OR zero quaternion). | ||
| if pose[0] == 0 and pose[1] == 0 and pose[2] == 0: | ||
| continue | ||
| if pose[3] == 0 and pose[4] == 0 and pose[5] == 0 and (pose[6] == 0 or pose[6] == 1): | ||
| continue |
There was a problem hiding this comment.
Valid values shouldn't be placeholder values. I don't see why valid values like this should be skipped? Can't you use None (an invalid pose) as a placeholder?
There was a problem hiding this comment.
this is a null/identity quaternion, famous placeholder value actually, but yeah those actually shouldn't be in the dataset, this was an ad-hoc cleanup for some datasets, will be removed later
- dimos map global / replay: new --bottom-cutoff option, threaded to PointCloud2.to_rerun(bottom_cutoff=) for global/accumulated maps only (raw lidar frames untouched); e.g. --bottom-cutoff 0 strips the floor. - rename/summary/replay/replay-marker: usage docs now reference the `dimos map <verb>` commands instead of `python -m ...cli.<mod>`. - observation: drop redundant explicit _data_lock=Lock() in with_pose/tag/ derive; __init__ creates a fresh per-instance lock when None.
| @@ -208,7 +208,21 @@ def tag(self, **tags: Any) -> Self: | |||
| tags={**self.tags, **tags}, | |||
| _data=_UNLOADED, | |||
| _loader=lambda: self.data, | |||
| _data_lock=threading.Lock(), | |||
| ) | |||
| return type(self)(**kwargs) | |||
|
|
|||
| def with_pose(self, pose: Any) -> Self: | |||
| """Return a new observation with ``pose`` attached, payload kept lazy. | |||
|
|
|||
| ``pose`` accepts anything :func:`_to_tuple` handles (a 3-/7-tuple, | |||
| ``Pose``/``PoseStamped``/``Transform``, or ``None`` to clear). | |||
| """ | |||
| kwargs: dict[str, Any] = {f.name: getattr(self, f.name) for f in fields(self)} | |||
| kwargs.pop("pose_tuple", None) | |||
| kwargs.update( | |||
| pose=pose, | |||
| _data=_UNLOADED, | |||
| _loader=lambda: self.data, | |||
| ) | |||
| return type(self)(**kwargs) | |||
There was a problem hiding this comment.
Shared
_data_lock deadlocks on first lazy data access
Both tag() and the new with_pose() pass the source observation's _data_lock through to the derived observation (via the fields(self) dict copy). Both also set _data=_UNLOADED with _loader=lambda: self.data. When the derived observation's .data is accessed while the source's data is still unloaded:
derived.dataacquiressource._data_lock- Calls the loader:
lambda: source.data source.datachecks_data(still_UNLOADED), then trieswith source._data_lock:— same lock, same thread, non-reentrant → deadlock (hangs forever)
The old tag() prevented this by explicitly passing _data_lock=threading.Lock() in its kwargs.update(...). That line was removed in this PR, regressing tag(). The new with_pose() has the same omission from the start.
This is exercised directly by pose_fill._fill, which calls primary.with_pose(secondary.data) on lazily-loaded stream observations; when the result is later decoded (e.g. by DetectMarkers), it deadlocks. The new test test_attaches_pose_and_keeps_payload_lazy also hits this path and will hang at assert posed.data == "payload".
Fix: add _data_lock=threading.Lock() to the kwargs.update(...) call in both tag() and with_pose().
The entry point is `dimos map <verb>` (self-documented via --help); don't document running the modules directly. summary.py keeps its rewritten usage line per reviewer suggestion.
dimos maputilities for dataset collection teamyou can ignore those, they are temporary,
changing pose source for observation stream, rendering global map, loop closures etc. lots of tooling (temporary and will change a lot)
dataset validation instructions document
stream alignment
a very simple stream.align tool - this needs much more work. multi stream input and output, interpolation of poses/transforms etc