Per-upload cost grows with bucket size (forest write amplification): 1.0s/file at n=0 to 3.8s/file at n=135 #34

@ehsan6sha

Context

Found during FxFiles web P0 cache-plan baselining (FxFiles commit f3e4ed0, plan doc FxFiles/docs/web-listing-prefetch-cache-plan.md section 11). Not blocking that work - filing so it can be picked up independently.

Observation

Seeding a fresh documents-v8 bucket with 150 sequential uploadObject calls (4 KB each, distinct keys under /e2e/perf/, fula_client 0.6.7 wasm via FRB, single client, no concurrency) showed the cumulative average upload time rising steadily as the bucket filled:

files uploaded    avg ms/file (cumulative)
25                1040
50                1786
75                2352
100               3041
125               3818

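The cumulative average rises by roughly 28 ms for every additional file (1040 ms at 25 files to 3818 ms at 125 files). A cumulative average that rises linearly means the marginal cost per upload also grows roughly linearly with object count, at about twice that rate - i.e. bulk-uploading N files costs O(N^2) total. Payload size is constant (4 KB), so the growth is in the per-put metadata work, not data transfer.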
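To make the marginal cost explicit, the per-batch cost can be recovered from the cumulative averages in the table: the total time at each checkpoint is files × average, and the difference between consecutive totals divided by the batch size gives the average cost of the uploads in that batch. A minimal sketch (the function name is illustrative and not part of the FxFiles harness):

```python
def marginal_ms_per_file(checkpoints):
    """Convert (files_uploaded, cumulative_avg_ms) pairs into per-batch averages.

    Returns (first_file, last_file, avg_ms_per_file_in_batch) tuples.
    """
    result = []
    prev_n, prev_total = 0, 0
    for n, avg_ms in checkpoints:
        total = n * avg_ms  # total elapsed ms after n uploads
        result.append((prev_n + 1, n, (total - prev_total) / (n - prev_n)))
        prev_n, prev_total = n, total
    return result


# Values copied from the table in the Observation section.
MEASURED = [(25, 1040), (50, 1786), (75, 2352), (100, 3041), (125, 3818)]

for first, last, ms in marginal_ms_per_file(MEASURED):
    print(f"files {first}-{last}: {ms:.0f} ms/file")
# files 1-25: 1040 ms/file
# files 26-50: 2532 ms/file
# files 51-75: 3484 ms/file
# files 76-100: 5108 ms/file
# files 101-125: 6926 ms/file
```

On these numbers the uploads in the last batch average about 6.9 s each, well above the 3.8 s cumulative figure, which is what a linearly growing marginal cost predicts.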

Hypothesis (unverified)

Each put appears to rewrite or re-upload forest index state whose size scales with the bucket's object count (forest pages / root chain), rather than touching only the pages that changed. Likely places to look:

  • crates/fula-flutter/src/api/forest.rs (put path takes the outer write().await lock; what does it serialize/upload per put?)
  • forest page layout: does a put rewrite one leaf page + root, or re-serialize a larger structure?
  • whether small sequential puts could batch/amortize forest commits (e.g. dirty-page flush, or an explicit batch-put API for N files)
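The three possibilities above can be compared with a toy cost model. This is a sketch of the hypothesis only: it counts abstract "index entries written" and is not derived from the fula_client code; the strategy names and the two-page cost of an incremental put are assumptions made for illustration.

```python
def units_written(n_files, strategy, batch=1):
    """Count index entries serialized while uploading n_files sequentially.

    Toy model only: one unit is one index entry written. Nothing here is
    measured from, or taken from, the real forest implementation.
    """
    total = 0
    for i in range(1, n_files + 1):
        if strategy == "full_rewrite":
            # Hypothesis: every put re-serializes an index covering all i objects.
            total += i
        elif strategy == "batched":
            # Same full rewrite, but committed only every `batch` puts
            # (plus once at the end for a partial batch).
            if i % batch == 0 or i == n_files:
                total += i
        elif strategy == "incremental":
            # Assumed alternative: each put touches one leaf page plus the root.
            total += 2
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return total


for n in (25, 50, 100, 150):
    print(
        n,
        units_written(n, "full_rewrite"),
        units_written(n, "batched", batch=25),
        units_written(n, "incremental"),
    )
# 25 325 25 50
# 50 1275 75 100
# 100 5050 250 200
# 150 11325 525 300
```

In this model a batched commit reduces the constant factor but stays quadratic for a fixed batch size, while only the incremental write is linear in N. Which of these the real put path resembles is exactly what the instrumentation would need to show.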

Repro

  1. FxFiles repo at f3e4ed0: flutter build web --release -t lib/main_web.dart --dart-define=E2E=true --dart-define=PERF=true --pwa-strategy=none
  2. Serve build/web, open http://localhost:<port>/?e2e=perf-seed&n=150&seed=<test vault seed> in headless Chrome with --enable-logging=stderr
  3. Watch the [e2e] seeded k/150 (...ms/file) console lines.

Same trend should reproduce with any S3-compatible client of fula-api writing N small objects sequentially to one bucket; the FxFiles harness is just a convenient driver.
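A client-agnostic driver for that check might look like the sketch below. It takes the upload call as a parameter, so any S3-compatible client can be plugged in; the `put` callable, the key naming pattern and the in-memory stub are hypothetical and stand in for a real client.

```python
import time


def seed_sequentially(put, n_files, report_every=25, clock=time.perf_counter):
    """Call put(key, payload) n_files times, one after another.

    Returns (files_uploaded, cumulative_avg_ms) checkpoints, matching the
    shape of the table in the Observation section.
    """
    payload = b"\0" * 4096  # constant 4 KB payload, as in the original run
    checkpoints = []
    start = clock()
    for i in range(1, n_files + 1):
        # Distinct keys under the e2e/perf/ prefix; the file name is made up.
        put(f"e2e/perf/file-{i:04d}.bin", payload)
        if i % report_every == 0:
            elapsed_ms = (clock() - start) * 1000.0
            checkpoints.append((i, elapsed_ms / i))
    return checkpoints


# Stand-in for a real client: stores objects in a dict so the sketch runs
# without a network. Replace with a wrapper around the client under test.
store = {}
for count, avg_ms in seed_sequentially(store.__setitem__, n_files=150):
    print(f"seeded {count}/150 ({avg_ms:.3f} ms/file)")
```

With the dict stub the averages stay flat, which also makes it a useful control: any growth seen with a real client comes from the client or server, not from the driver.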

Why it matters

  • Bulk flows (camera-roll sync, folder upload, collaboration receiver-upload) degrade quadratically with bucket size.
  • It also inflates read-side forest size/cost (FxFiles web measured cold listing at ~6.5 ms/file - forest download + first enumeration - which is the other half of the same scaling story).

Suggested acceptance

  • Instrument the put path to attribute time (forest read-modify-write vs PUT vs lock wait).
  • Decide: incremental page write, batched commit API, or documented expected cost.
  • A 150-small-file sequential upload should show flat (or near-flat) marginal cost.
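One way to turn the "flat (or near-flat)" criterion into a pass/fail check is to fit a line through the per-batch marginal costs and bound its slope. The sketch below assumes checkpoints in the same (files uploaded, cumulative avg ms) form as the table, and needs at least two of them; the `max_slope` threshold is a placeholder, since the issue does not specify one.

```python
def marginal_costs(checkpoints):
    """Turn (files_uploaded, cumulative_avg_ms) into (batch_midpoint, ms_per_file)."""
    points = []
    prev_n, prev_total = 0, 0.0
    for n, avg_ms in checkpoints:
        total = n * avg_ms
        points.append(((prev_n + n) / 2, (total - prev_total) / (n - prev_n)))
        prev_n, prev_total = n, total
    return points


def marginal_slope(checkpoints):
    """Least-squares slope of marginal cost vs. object count (ms/file per file)."""
    pts = marginal_costs(checkpoints)
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    num = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    den = sum((x - mean_x) ** 2 for x, _ in pts)
    return num / den


def is_near_flat(checkpoints, max_slope=1.0):
    # max_slope is a placeholder threshold, not a number taken from the issue.
    return abs(marginal_slope(checkpoints)) <= max_slope


# Values copied from the table in the Observation section.
MEASURED = [(25, 1040), (50, 1786), (75, 2352), (100, 3041), (125, 3818)]
print(f"{marginal_slope(MEASURED):.1f} ms/file per added file")  # 57.4
print(is_near_flat(MEASURED))  # False
```

The current measurements give a slope of about 57 ms/file per added file, so any reasonable threshold would fail today; a fixed put path should bring that figure close to zero.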

🤖 Generated with Claude Code
