Skip to content

fix: #34 forest write amplification - seal persisted HAMT nodes (0.6.8)#35

Merged
ehsan6sha merged 4 commits into
mainfrom
fix/34-forest-write-amplification
Jun 12, 2026
Merged

fix: #34 forest write amplification - seal persisted HAMT nodes (0.6.8)#35
ehsan6sha merged 4 commits into
mainfrom
fix/34-forest-write-amplification

Conversation

@ehsan6sha

Copy link
Copy Markdown
Member

Fixes #34 — per-upload cost grew with bucket size because every forest flush re-uploaded the entire ever-touched portion of the hot shard's HAMT instead of just the changed path. Sequential bulk uploads cost O(N²) total index traffic (measured pre-fix: 1,630 node PUTs / 12.2 MB to store 150 × 4 KB files; FxFiles web measured per-upload latency growing 1.0 s → 3.8 s as the bucket filled).

Root cause

  1. Node::set_value (copy-on-write insert) flips every node on the mutated root→leaf path to ChildPtr::InMemory.
  2. Node::store(&self) persisted every InMemory child but — taking &self — could never mark it clean afterwards; the only constructor of Stored/StoredV2 was the load path (Pointer::from_wire).
  3. So in a long-lived session the InMemory set grew monotonically and every flush re-serialized, re-encrypted (fresh nonce → not even dedupe-able), and re-PUT the whole accumulated tree. All files in one directory route to one shard (dir-local routing), and put_object_flat flushes per call — the FxFiles bulk pattern hit the worst case exactly.

The fix

  • New ChildPtr::Sealed { storage_key, cid, node } — persisted and still memory-resident. Reads resolve from the resident Arc with zero I/O (exactly like InMemory, so no read-path regression — this tree has no read-through cache); store paths emit the recorded Link/LinkV2 reference with zero I/O.
  • New Node::store_sealing(&mut Arc<Self>) — like store, but write-backs each successfully persisted InMemory child as Sealed. On a child PUT failure the child is restored to InMemory and the error propagates: the in-memory tree stays intact, and a retry re-persists exactly the unsealed remainder, converging on the identical content address (unit-tested with injected failures, exact PUT arithmetic asserted).
  • flush_dirty seals (shard write-guard instead of read). A flush now PUTs the shard root plus the path(s) mutated since the previous flush — bounded by tree depth, never by bucket size.
  • Mutations (set_value / remove_value / canonicalize) resolve the sealed Arc, copy-on-write, and re-attach as InMemory — unsealing exactly the touched path.

No data corruption — by construction and by test

  • Zero wire-format change. Sealed is never serialized as a new variant: it re-emits the same PointerWire::Link/LinkV2 its original PUT produced. Content addressing is plaintext-derived, so the sealing writer and the legacy writer produce byte-identical persisted state — pinned by issue_34_sealing_and_legacy_store_produce_identical_roots (same inserts → identical root content address via both paths). Existing buckets read/write exactly as before; pre-fix SDKs interoperate unchanged.
  • 150-file integrity: after 150 sealed flushes, a cold session over the final manifest reads every file (metadata included) and lists the directory completely.
  • Deletion over a sealed tree: cold reader sees the deletion and all 39 survivors.
  • Failure mid-seal: tree fully readable, retry converges on the never-failed root, and re-PUTs everything except exactly the children sealed before the failure.

Multi-client safety (two clients, one writes then the other)

New integration test two_clients_alternating_writes_compose_without_loss against a stateful mock master honoring If-None-Match: * / If-Match (412) — the production conflict contract:

  1. Client A writes 3 files (put_object_flat, flush per put → A fully sealed).
  2. Client B (same user key, fresh instance) writes 3 files, advancing the manifest/page ETags past A's cached view.
  3. Client A writes a 4th file from its stale sealed forest → the flush hits 412, evicts the sealed state wholesale, reloads B's winning state, replays its WAL, retries to success (the pre-existing NEW-7.2 reconcile, with sealing in play). The test asserts the 412/backoff path actually fired (flush_backoff_count), so it can't pass vacuously.
  4. A reads B's writes post-reconcile; a third fresh client reads all 7 files byte-exactly and lists exactly 7.

Sealing adds no staleness window: conflict arbitration was always manifest-ETag-gated, and every reconcile path discards the in-memory forest (sealed state included) wholesale. Sealed pointers reference content-addressed, immutable node objects, so another client's writes can never invalidate them in place. An in-flight reader holding a pre-flush Arc is likewise unaffected (Arc::make_mut copy-on-write; pinned by issue_34_in_flight_reader_arc_unaffected_by_sealing).

Server-side GC/pinning cannot depend on re-PUT recency (a bucket idle for months already has nodes PUT exactly once), so skipping redundant re-PUTs is safe on that axis.

Measured result (same 150-file workload, counting backend)

metric pre-fix (pinned at 2ed7fe0) post-fix
node PUTs at flush #150 17, growing with N 3, flat
bytes at flush #150 161.7 KB (≈1.08 KB × bucket size) 23 KB, bounded by path
long-lived vs fresh session marginal flush 17 vs 3 3 vs 3
totals for 150 × 4 KB files 1,630 PUTs / 12.2 MB 384 PUTs / 3.5 MB

Acceptance from the issue — "a 150-small-file sequential upload should show flat (or near-flat) marginal cost" — is met and enforced by inverted regression guards (issue_34_per_flush_put_cost_stays_flat_as_bucket_fills, issue_34_long_session_marginal_flush_matches_fresh_session).

All platforms

The change lives entirely in the shared fula-crypto core (both cfg branches), so Android, iOS, Windows, web/wasm and fula-js all pick it up:

  • cargo test -p fula-crypto449 passed (incl. 9 issue-34 guards)
  • cargo test -p fula-client --features test-fault-injection208 lib + all integration suites passed (incl. the new two-client interop test)
  • cargo check -p fula-crypto --target wasm32-unknown-unknown --no-default-features --features wasm — clean
  • cargo check -p fula-js --target wasm32-unknown-unknown — clean
  • cargo check -p fula-flutter / -p fula-client (native) — clean
  • (Android/iOS cross-compile needs the NDK/Xcode toolchains from release CI; the fix compiles the identical non-wasm cfg branch validated on Windows.)

Version

0.6.7 → 0.6.8 (workspace, fula-js + fula-flutter crates, fula_client pubspec/podspec, changelog). The fula-js / fula-wasm npm packages follow their own 0.2.x line and are untouched.

🤖 Generated with Claude Code

ehsan6sha and others added 4 commits June 12, 2026 12:18
…l puts

Three current-behaviour pins quantifying the O(N^2) bulk-upload cost:

- node layer: store() re-PUTs an entire unchanged InMemory tree because
  persisted children are never downgraded to Stored (the only Stored
  constructor is the load path, Pointer::from_wire)
- forest layer: per-flush PUT count and bytes grow with bucket size for
  the put_object_flat pattern (upsert + flush per file, one directory):
  flush #10 = 1 PUT / 10.8 KB -> flush #150 = 17 PUTs / 161.7 KB;
  150 x 4 KB files = 1630 node PUTs / 12.2 MB of index re-uploads
- contrast: marginal flush after 150 uploads is 17 PUTs in the
  long-lived session vs 3 PUTs for a fresh forest loaded from the same
  manifest - the amplification is session-accumulated InMemory state,
  not inherent tree cost

Invert these pins into regression guards when #34 is fixed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ding the whole tree

Root cause: Node::store(&self) persisted every InMemory child but could
never mark it clean (the only Stored constructor was the load path), so
a long-lived session re-uploaded the entire ever-touched shard tree on
EVERY flush - O(N^2) total index traffic for N sequential puts into one
directory (measured pre-fix: 1,630 node PUTs / 12.2 MB for 150 x 4 KB
files; flush #150 alone re-PUT 17 nodes / 161.7 KB).

The fix:

- New ChildPtr::Sealed { storage_key, cid, node }: persisted AND still
  memory-resident. Reads resolve from memory exactly like InMemory
  (zero fetches); store paths emit the recorded Link/LinkV2 reference
  with zero I/O. Never serialized as a new wire variant - persisted
  bytes are identical to the pre-fix writer's, so existing buckets and
  older SDKs interoperate unchanged (no data-format change, no
  corruption risk).
- New Node::store_sealing(&mut Arc<Self>): like store(), but write-backs
  each successfully-persisted InMemory child as Sealed. On a child
  failure the child is restored to InMemory and the error propagates -
  the in-memory tree stays intact and a retry re-persists exactly the
  unsealed remainder (converging on the same content address).
- flush_dirty now persists via store_sealing (shard write-guard instead
  of read) - a flush PUTs the shard root plus the path(s) mutated since
  the previous flush, bounded by tree depth, never by bucket size.
- Mutations (set_value / remove_value / canonicalize) resolve Sealed
  children from memory and re-attach the changed subtree as InMemory,
  unsealing exactly the touched path.

Measured post-fix (same 150-file workload): 384 PUTs / 3.5 MB total;
steady-state flushes are 2-3 node PUTs (~16-31 KB) at ANY bucket size;
a long-lived session's marginal flush now equals a fresh session's
(3 vs 3 node PUTs; pre-fix 17 vs 3).

Cross-platform: the change is entirely in the shared fula-crypto core
(cfg-split native/wasm paths both compile; fula-js wasm32 + fula-flutter
+ fula-client all check/test green), so Android, iOS, Windows, web/wasm
and fula-js all pick it up.

Tests: the three #34 pins are inverted into regression guards
(flat per-flush cost; long-session == fresh-session marginal cost), plus
new guards for sealing correctness: legacy-vs-sealing byte-identical
roots, full 150-file integrity through a cold reload, sealed-tree
deletion, second-store root-only re-PUT, and failure-mid-seal
intact-and-retryable.

Fixes #34

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Workspace + fula-js + fula-flutter crate versions, fula_client pubspec/
podspec, changelog entry. The fula-js / fula-wasm npm packages follow
their own 0.2.x version line and are untouched.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…led flush

Addresses the two-clients-alternating-writes concern directly:

- New fula-client integration test (two_clients_alternating_writes_
  compose_without_loss) against the stateful conditional-PUT mock
  master: client A writes 3 files (forest fully sealed), client B
  writes 3 files (advancing the manifest/page ETags), then A writes a
  4th file FROM ITS STALE SEALED FOREST. A must hit 412, evict the
  sealed state wholesale, reload B's winning state, replay its WAL,
  and retry to success. Asserts the 412/backoff path actually fired
  (flush_backoff_count metric - non-vacuous by construction), that A
  reads B's writes post-reconcile, and that a third fresh client reads
  all 7 files byte-exactly with a 7-entry forest listing.

- issue_34_in_flight_reader_arc_unaffected_by_sealing: a reader
  holding the shard-root Arc from BEFORE a flush keeps a fully-readable
  pre-seal view (Arc::make_mut copy-on-write) - the transient
  placeholder inside store_sealing is never observable.

- Cross-session composition added to the long-vs-fresh guard: a third
  cold session reads writes made by both prior sessions.

- Failure-retry assertion tightened to exact arithmetic: retry re-PUTs
  everything except exactly the 2 children sealed before the injected
  failure.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Per-upload cost grows with bucket size (forest write amplification): 1.0s/file at n=0 to 3.8s/file at n=135

1 participant