dimos map tooling, stream alignment by leshy · Pull Request #2306 · dimensionalOS/dimos

leshy · 2026-05-29T12:31:03Z

`dimos map` utilities for dataset collection team

you can ignore those, they are temporary,

changing pose source for observation stream, rendering global map, loop closures etc. lots of tooling (temporary and will change a lot)

dataset validation instructions document

stream alignment

a very simple stream.align tool - this needs much more work. multi stream input and output, interpolation of poses/transforms etc

Replaces the monolithic pgo_then_voxels with four primitives over mem2 Streams and Transforms: pgo_keyframes(lidar) -> Stream[Keyframe] keyframes_to_corrections(kfs) -> Stream[Transform] (world_corrected <- world_raw) make_interpolator(corrections) -> (ts) -> Transform (SLERP + linear, endpoint-clipped) apply_corrections(stream, corr) -> Stream[T] (shuffles obs.pose) Drift correction is now a first-class, reusable Stream[Transform] that any pose-stamped consumer can apply — same math as before (rigid T_corr = T_global @ T_local^-1 per keyframe, SLERP + linear interpolation between bracketing keyframes), just composable. dimos map --pgo migrates to the new primitives; pgo_then_voxels is deleted. The internal _SimplePGO / PGOConfig / _KeyPose machinery stays in pgo.py and is imported by pgo2.

Renames: pgo.py -> pgo_internals.py (gtsam/ICP machinery) pgo2.py -> pgo.py (public Stream-shaped API) `dimos.mapping.relocalization.pgo` is now the canonical import path (`pgo_keyframes`, `keyframes_to_corrections`, `make_interpolator`, `apply_corrections`, `correction_at`, `Keyframe`). The internal _SimplePGO / PGOConfig / _KeyPose / _icp / _voxel_downsample helpers live in pgo_internals.py and are imported lazily inside pgo.py to keep gtsam off the public-API import path.

Adds test_pgo.py covering pose normalization on Observation, the Transform↔Pose3 conversion helpers, the interpolator edge cases, apply_corrections behavior, and a real-recording smoke test (skipped when data/go2_short.db is absent). Annotates the two map.py helpers so they pass mypy. Removes leftover section divider comments to satisfy the no-section-markers project rule.

The colors-copy block in `PointCloud2.transform` calls `self.pointcloud` to check `has_colors()`, which forces a tensor->legacy conversion on every invocation. With `pgo --full-pgo` rebuilding from hundreds of lidar frames, that hidden allocation dominates the per-frame cost. The colors path was lidar-irrelevant (lidar clouds have no colors anyway) and the feature it was meant to support never landed; remove it so transform() stays a clean numpy round-trip.

…s into feat/ivan/pgo_rewrite

These end-to-end CLI tests forked python -m dimos.robot.cli.dimos per case, re-paying the heavy per-verb imports (rerun, open3d, voxel) 7 times. Run them in-process via Typer's CliRunner so those imports are paid once: 33.7s -> 7.5s, all 7 still pass. Trade-off: shared interpreter, no per-test process isolation.

Add from_time/to_time (relative to the first observation) and from_timestamp/to_timestamp (absolute epoch seconds) for windowing a stream by time. A trailing to_time is a duration measured from the current start, so from_time(2).to_time(30) reads as "skip 2s, take the following 30s"; frames mix freely (from_timestamp(ts).to_time(30)). Shared base for the stream-alignment (#2306) and go2dds (#2314) branches, which both need this windowing API.

…arch - pgo_keyframes: widen stream param to Iterable[Observation[PointCloud2]] (clears the mypy gate; it never uses Stream methods) - pose_fill_db: order_by("ts") on both align inputs — align requires ts-ascending iteration, but sqlite defaults to id ASC (insertion order), which would silently mis-pair out-of-order recordings - add CI-running unit tests for Observation.with_pose (lazy payload kept) and pose_fill(mount=...) composition - remove the unreferenced autoresearch/ research drop (near-duplicate of pgo_auto.py + scaffolding); pgo_auto.py kept

Add from_time/to_time (relative to the first observation) and from_timestamp/to_timestamp (absolute epoch seconds) for windowing a stream by time. A trailing to_time is a duration measured from the current start, so from_time(2).to_time(30) reads as "skip 2s, take the following 30s"; frames mix freely (from_timestamp(ts).to_time(30)). Shared base for the stream-alignment (#2306) and go2dds (#2314) branches, which both need this windowing API.

Pulls the stairs dataset LFS pointer from feat/ivan/go2stairs so the map tooling can use it; object already on remote LFS store.

…nment # Conflicts: # dimos/memory2/stream.py # dimos/memory2/test_stream.py

The self-hosted CI image's published :dev tag is stale (predates libturbojpeg0-dev in docker/python/Dockerfile), so TurboJPEG() raises at runtime and the four verbs that decode the color_image stream fail. Guard them with skipif so they skip on a lib-less image and still run where the native lib is present.

…nment

paul-nechifor · 2026-06-01T22:12:10Z

+    if no_gui:
+        print(f"open with: rerun {out}")
+    else:
+        subprocess.Popen(["rerun", str(out)])


You maintain to reference to this? How is it ever closed?

wait I don't get it - reference to what?

paul-nechifor · 2026-06-01T22:16:59Z

+        --out mid360_renamed.db \\
+        --rename go2_lidar=lidar \\
+        --rename lidar=fastlio_lidar \\
+        --rename odometry=fastlio_odometry


This is meant to be used through dimos map rename, no? Then please don't document running the file itself.

paul-nechifor · 2026-06-01T22:35:22Z

+
+    for p in primary_iter:
+        # Advance the cursor until `nxt` sits past `p`; `prev` trails just behind.
+        while nxt is not None and nxt.ts <= p.ts:


Why <=? If nxt.ts == p.ts isn't that the best possible solution?

paul-nechifor · 2026-06-01T22:54:07Z

+        # Skip placeholder poses (origin position OR zero quaternion).
+        if pose[0] == 0 and pose[1] == 0 and pose[2] == 0:
+            continue
+        if pose[3] == 0 and pose[4] == 0 and pose[5] == 0 and (pose[6] == 0 or pose[6] == 1):
+            continue


Valid values shouldn't be placeholder values. I don't see why valid values like this should be skipped? Can't you use None (an invalid pose) as a placeholder?

this is a null/identity quaternion, famous placeholder value actually, but yeah those actually shouldn't be in the dataset, this was an ad-hoc cleanup for some datasets, will be removed later

- dimos map global / replay: new --bottom-cutoff option, threaded to PointCloud2.to_rerun(bottom_cutoff=) for global/accumulated maps only (raw lidar frames untouched); e.g. --bottom-cutoff 0 strips the floor. - rename/summary/replay/replay-marker: usage docs now reference the `dimos map <verb>` commands instead of `python -m ...cli.<mod>`. - observation: drop redundant explicit _data_lock=Lock() in with_pose/tag/ derive; __init__ creates a fresh per-instance lock when None.

…nment

greptile-apps · 2026-06-02T10:44:24Z

@@ -208,7 +208,21 @@ def tag(self, **tags: Any) -> Self:
            tags={**self.tags, **tags},
            _data=_UNLOADED,
            _loader=lambda: self.data,
-            _data_lock=threading.Lock(),
+        )
+        return type(self)(**kwargs)
+
+    def with_pose(self, pose: Any) -> Self:
+        """Return a new observation with ``pose`` attached, payload kept lazy.
+
+        ``pose`` accepts anything :func:`_to_tuple` handles (a 3-/7-tuple,
+        ``Pose``/``PoseStamped``/``Transform``, or ``None`` to clear).
+        """
+        kwargs: dict[str, Any] = {f.name: getattr(self, f.name) for f in fields(self)}
+        kwargs.pop("pose_tuple", None)
+        kwargs.update(
+            pose=pose,
+            _data=_UNLOADED,
+            _loader=lambda: self.data,
        )
        return type(self)(**kwargs)


Shared _data_lock deadlocks on first lazy data access

Both tag() and the new with_pose() pass the source observation's _data_lock through to the derived observation (via the fields(self) dict copy). Both also set _data=_UNLOADED with _loader=lambda: self.data. When the derived observation's .data is accessed while the source's data is still unloaded:

derived.data acquires source._data_lock

Calls the loader: lambda: source.data

source.data checks _data (still _UNLOADED), then tries with source._data_lock: — same lock, same thread, non-reentrant → deadlock (hangs forever)

The old tag() prevented this by explicitly passing _data_lock=threading.Lock() in its kwargs.update(...). That line was removed in this PR, regressing tag(). The new with_pose() has the same omission from the start.

This is exercised directly by pose_fill._fill, which calls primary.with_pose(secondary.data) on lazily-loaded stream observations; when the result is later decoded (e.g. by DetectMarkers), it deadlocks. The new test test_attaches_pose_and_keeps_payload_lazy also hits this path and will hang at assert posed.data == "payload".

Fix: add _data_lock=threading.Lock() to the kwargs.update(...) call in both tag() and with_pose().

The entry point is `dimos map <verb>` (self-documented via --help); don't document running the modules directly. summary.py keeps its rewritten usage line per reviewer suggestion.

leshy added 30 commits May 24, 2026 18:53

map tool first pass

ceb08ba

map tool wrap

4e0cec3

renamed rerun blueprint to dimos map tool

2099e00

readme cleanup

47848d4

pgo rewrite

25cf609

pgo cleanup

45e72da

visualizing loop closures

df4435f

pgo vis fixed

ebdf1ec

much faster pgo

71762c1

Merge branch 'main' into feat/ivan/pgo_rewrite

74a5011

ci: install gtsam-extended in tests group so import gtsam works

626a4fc

Merge branch 'main' into feat/ivan/pgo_rewrite

43c0453

markers on dimos map

2f46b38

map detects markers correctly

82e2dab

maptool speed limits

b98b506

Merge branch 'feat/ivan/pgo_rewrite' of github.com:dimensionalOS/dimo…

c1115d8

…s into feat/ivan/pgo_rewrite

markers_rrd renderer

4d5e9b6

average window for markers

b3163bc

sliding window 7.5s

e529913

detector tuned

5751d48

nicer map

dcd7f7b

marker eval first pass

a6d4dcc

extracted loop closure code from relocalization code

89eb996

Merge branch 'autoresearch/ivan/pgo_go2' into feat/ivan/pgo_rewrite

22ff976

better eval

186a4be

Merge branch 'autoresearch/ivan/pgo_go2' into feat/ivan/pgo_rewrite

0d8b277

greptile-apps Bot reviewed May 30, 2026

View reviewed changes

Comment thread dimos/memory2/stream.py Outdated

leshy changed the title ~~Feat/ivan/stream alignment~~ dimos map tooling, stream alignment, autoresearch sketch Jun 1, 2026

leshy added the PlzReview label Jun 1, 2026

leshy mentioned this pull request Jun 1, 2026

GO2 MCAP/DDS integration as memory2 backend #2314

Open

leshy mentioned this pull request Jun 1, 2026

memory2: time windowing on Stream #2315

Merged

cleanup

501f3e3

greptile-apps Bot reviewed Jun 1, 2026

View reviewed changes

Comment thread dimos/memory2/stream.py Outdated

leshy changed the title ~~dimos map tooling, stream alignment, autoresearch sketch~~ dimos map tooling, stream alignment Jun 1, 2026

greptile-apps Bot reviewed Jun 1, 2026

View reviewed changes

Comment thread dimos/robot/cli/dimos.py

data: add go2_mid360_stairs recording from go2stairs branch

c70598b

Pulls the stairs dataset LFS pointer from feat/ivan/go2stairs so the map tooling can use it; object already on remote LFS store.

greptile-apps Bot reviewed Jun 1, 2026

View reviewed changes

Comment thread dimos/mapping/loop_closure/pgo_auto.py

Merge remote-tracking branch 'origin/main' into feat/ivan/stream_alig…

ffb412e

…nment # Conflicts: # dimos/memory2/stream.py # dimos/memory2/test_stream.py

leshy mentioned this pull request Jun 1, 2026

Feat/ivan/go2stairs #2283

Closed

leshy added 2 commits June 1, 2026 20:51

Merge remote-tracking branch 'origin/main' into feat/ivan/stream_alig…

ae17740

…nment

paul-nechifor reviewed Jun 1, 2026

View reviewed changes

Comment thread dimos/memory2/type/observation.py Outdated

paul-nechifor reviewed Jun 1, 2026

View reviewed changes

Comment thread dimos/mapping/utils/cli/summary.py Outdated

paul-nechifor reviewed Jun 1, 2026

View reviewed changes

leshy added 2 commits June 2, 2026 16:58

Merge remote-tracking branch 'origin/main' into feat/ivan/stream_alig…

36a6b43

…nment

greptile-apps Bot reviewed Jun 2, 2026

View reviewed changes

leshy added 2 commits June 2, 2026 18:46

map cli: drop Usage blocks from rename/replay/replay-marker docstrings

35641ea

The entry point is `dimos map <verb>` (self-documented via --help); don't document running the modules directly. summary.py keeps its rewritten usage line per reviewer suggestion.

add stream freq in summary

b8a95cc

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dimos map tooling, stream alignment#2306

dimos map tooling, stream alignment#2306
leshy wants to merge 81 commits into
mainfrom
feat/ivan/stream_alignment

leshy commented May 29, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

paul-nechifor Jun 1, 2026

Uh oh!

leshy Jun 2, 2026

Uh oh!

paul-nechifor Jun 1, 2026

Uh oh!

paul-nechifor Jun 1, 2026

Uh oh!

Uh oh!

Uh oh!

paul-nechifor Jun 1, 2026

Uh oh!

leshy Jun 2, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot Jun 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

leshy commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

dimos map utilities for dataset collection team

stream alignment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

paul-nechifor Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

leshy Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

paul-nechifor Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

paul-nechifor Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

paul-nechifor Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

leshy Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

leshy commented May 29, 2026 •

edited

Loading

`dimos map` utilities for dataset collection team

leshy Jun 2, 2026 •

edited

Loading