Bump zwasm v1.9.0 → v1.9.1#4

Merged
chaploud merged 1 commit into main from develop/bump-zwasm-v1.9.1
Apr 24, 2026

@chaploud
Contributor

Summary

Test plan

  • zig build — clean build green
  • bash test/run_all.sh --quick — 4/4 pass (zig build test, cljw test 83 namespaces, e2e wasm, deps.edn e2e)

@chaploud chaploud merged commit 995d468 into main Apr 24, 2026
6 checks passed
@chaploud chaploud deleted the develop/bump-zwasm-v1.9.1 branch April 24, 2026 09:37
chaploud added a commit that referenced this pull request Apr 26, 2026
Audit findings against private/2026-04-27_strategic_review/ and v1 /
prior-redesign reference repos surfaced 11 missing pieces. This commit
adds them so Phase 1 onward does not silently drift.

Added:
- .claude/rules/{zone_deps,zig_tips,compat_tiers}.md  (path-matched auto-load)
- .dev/decisions/{README.md, 0000-template.md}        (ADR infrastructure)
- .dev/handover.md                                    (session-to-session memo)
- .dev/known_issues.md                                (P0-P3 debt log)
- .dev/compat_tiers.yaml                              (per-namespace tier source of truth)
- .dev/concurrency_design.md                          (pre-Phase-15 deep dive)
- .dev/wasm_strategy.md                               (pre-Phase-19 deep dive; adopts hybrid)
- scripts/zone_check.sh                               (info / --strict / --gate; works on empty src)
- test/run_all.sh                                     (single test entry point)
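scripts/zone_check.sh enforces zone dependency rules whose details this commit does not spell out. As an illustration only, suppose each source file sits in a numbered zone (layer) and may import only from its own zone or a lower one. Under that assumption, an upward-import check could look like the Python model below. The zone table, the zone numbers, and `find_violations` are hypothetical, and the real script's info / --strict / --gate modes are not modelled.

```python
# Hypothetical model of a zone (layer) dependency check; not scripts/zone_check.sh.
# Assumption: a file may import only from its own zone or a lower-numbered one.
from typing import Dict, List, Tuple


def find_violations(
    zone_of: Dict[str, int],
    imports: List[Tuple[str, str]],
) -> List[str]:
    """Return one message per import edge that points at a higher zone."""
    violations = []
    for src, dst in imports:
        if src not in zone_of or dst not in zone_of:
            continue  # files outside the zone table are not checked
        if zone_of[dst] > zone_of[src]:
            violations.append(
                f"{src} (zone {zone_of[src]}) imports {dst} (zone {zone_of[dst]})"
            )
    return violations


# Illustrative table: the file names appear in this log, the zone numbers are made up.
ZONES = {"gc_heap.zig": 0, "vm.zig": 1, "main.zig": 2}

if __name__ == "__main__":
    edges = [("vm.zig", "gc_heap.zig"), ("gc_heap.zig", "vm.zig")]
    for msg in find_violations(ZONES, edges):
        print("violation:", msg)
    # An empty source tree yields no edges and therefore no violations.
    print(find_violations(ZONES, []))
```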

Updated:
- .dev/ROADMAP.md: new §11.6 Quality gate timeline (16 gates, active + planned),
  added new files to §15.1, removed .editorconfig from §5, revision entry.

Removed:
- .editorconfig: project owner uses Emacs; format will be wired as a
  pre-commit gate later (listed as gate #4 in ROADMAP §11.6).
chaploud added a commit that referenced this pull request May 30, 2026
Smell-audited: 1: structural-defect fix (representation divergence class
#4): (= (->Point 1 2) (->Point 1 2)) silently returned false because equal.zig
had no .typed_instance arm, so records fell through to else=>false / bit-hash.

Adds kind-gated .typed_instance arms to valueEqual + keyEqValue +
valueHash: defrecord compares same descriptor + all declared fields
(recursively); deftype keeps identity (no auto equals). A record is never
= to a plain map (same-tag gate). The 3 arms stay mutually consistent so
equal records share a hash bucket — usable as map keys ((get {rec :a}
rec)). Found via the structural-defect probe sweep; 5 e2e cases.
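The three arms must stay mutually consistent. Values that compare equal must also hash alike, or a record used as a map key cannot be found again. The Python model below sketches that contract; it is not equal.zig. `Descriptor`, `TypedInstance`, and `Key` are hypothetical names, and valueEqual and keyEqValue are collapsed into a single `value_equal`.

```python
# Hypothetical model of the defrecord/deftype equality contract; not equal.zig.
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(eq=False)  # identity semantics: each descriptor is a distinct type
class Descriptor:
    name: str
    is_record: bool  # True for defrecord, False for deftype


@dataclass(eq=False)
class TypedInstance:
    descriptor: Descriptor
    fields: Tuple[Any, ...]  # declared fields, in declaration order


def value_equal(a: Any, b: Any) -> bool:
    a_typed, b_typed = isinstance(a, TypedInstance), isinstance(b, TypedInstance)
    if a_typed != b_typed:
        return False  # same-tag gate: a record is never = to a plain map
    if not a_typed:
        return a == b  # stand-in for all the other (non-typed) arms
    if not a.descriptor.is_record:
        return a is b  # deftype: identity only, no auto equals
    return (
        a.descriptor is b.descriptor
        and len(a.fields) == len(b.fields)
        and all(value_equal(x, y) for x, y in zip(a.fields, b.fields))
    )


def value_hash(v: Any) -> int:
    if not isinstance(v, TypedInstance):
        return hash(v)
    if not v.descriptor.is_record:
        return id(v)  # identity hash, consistent with identity equality
    # Equal records share a descriptor and equal fields, so they hash alike.
    return hash((id(v.descriptor), tuple(value_hash(f) for f in v.fields)))


class Key:
    """Wraps a value so a Python dict uses value_equal / value_hash."""

    def __init__(self, v: Any) -> None:
        self.v = v

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Key) and value_equal(self.v, other.v)

    def __hash__(self) -> int:
        return value_hash(self.v)
```

With `Point = Descriptor("Point", True)`, two separately built `TypedInstance(Point, (1, 2))` values compare equal and hash alike. A dict keyed by `Key(p1)` therefore finds the entry through `Key(p2)`, mirroring `(get {rec :a} rec)`.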
chaploud added a commit that referenced this pull request Jun 4, 2026
…090 §2)

Smell-audited: 1: Phase B implementation increment #2 per ADR-0090 §2.

Added gc_mutex (std.Io.Mutex) to GcHeap. It locks alloc/pin/unpin (gc_heap.zig) and the whole collect() cycle (mark_sweep.zig) via the io_default singleton (the allocator API takes no io arg). This makes allocation thread-safe under F-006: it is the foundation on which the #3 ThreadGcContext root-publication handshake builds collection safety. The lock is not reentrant (alloc never calls collect; collect never allocates). It is uncontended and runtime-inert today (single-threaded; real threads land at #4 future/pmap), so there is no observable behaviour change.

New concurrency test: 4 threads x 500 allocs through a threaded io serialize race-free (allocations.len == alloc_count == 2000). Full --serial-e2e gate green 247/0; the single-threaded gc tests that default to io_default still pass (uncontended lock). bench staged per source-bearing policy (also absorbs the session's dangling doc-commit gate samples). The stale 'lock deferred to Phase B' docstring was updated to describe the landed lock.
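The lock discipline described above can be sketched in Python: one non-reentrant mutex guards alloc/pin/unpin and the whole collect cycle, and collect never allocates. `GcHeap`, `stress`, and the id-based, roots-only liveness bookkeeping are hypothetical. CPython's GIL also hides some races, so the sketch shows the shape of the 4 x 500 test rather than proving race freedom.

```python
# Hypothetical model of a heap guarded by one non-reentrant lock; not gc_heap.zig.
import threading


class GcHeap:
    def __init__(self) -> None:
        self.gc_mutex = threading.Lock()  # non-reentrant, like the commit's gc_mutex
        self.allocations = []  # every live cell
        self.alloc_count = 0  # total allocations ever made
        self.pinned = set()  # ids of cells pinned as extra roots

    def alloc(self, value):
        with self.gc_mutex:
            cell = [value]
            self.allocations.append(cell)
            self.alloc_count += 1  # read-modify-write: needs the lock
            return cell

    def pin(self, cell) -> None:
        with self.gc_mutex:
            self.pinned.add(id(cell))

    def unpin(self, cell) -> None:
        with self.gc_mutex:
            self.pinned.discard(id(cell))

    def collect(self, roots) -> None:
        # The whole cycle runs under the lock and never calls alloc(), so the
        # non-reentrant lock is never taken twice by one thread.
        # (Roots only: this toy does not trace children.)
        with self.gc_mutex:
            live = {id(c) for c in roots} | self.pinned
            self.allocations = [c for c in self.allocations if id(c) in live]


def stress(heap: GcHeap, threads: int = 4, per_thread: int = 500) -> None:
    def worker() -> None:
        for i in range(per_thread):
            heap.alloc(i)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
```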
chaploud added a commit that referenced this pull request Jun 4, 2026
…locks #3; re-analysis gated (D-244)

Smell-audited: 2: Bad-Smell interrupt surfaced while designing increment #3.

root_set.zig roots ns_vars/current_frame/macro_root_slot/permanent_roots but NOT the VM operand stack (vm.zig local Value array) nor tree_walk native-stack intermediates; safe today only because collect() runs at quiescent explicit points (no auto-collect). For Phase B real threads (#4), a mid-eval worker's operand/native-stack Values are un-rooted -> concurrent collect UAF; plus a pushFrame/popFrame read-during-write race during another thread's root walk.

So ADR-0090 §2 Alt-2's 'no safepoint needed' is insufficient for mid-eval workers. Recorded as ADR-0090 Revision history + D-244 (the #3 gating design step): re-analyse with a DA-fork (safepoint Alt-1 vs publish-VM-operand-stack-root + forbid-tree_walk-during-collect) BEFORE the handshake code. The §1-2/§5-7 spine + increments #1/#2 are unaffected (the alloc lock is needed by either mechanism).
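A toy mark-sweep shows the failure mode most easily. If the root set omits the operand stack, a value that is live only there gets swept mid-eval. This Python model is hypothetical (`Heap`, `Cell`, `roots_for`, and `mid_eval_collect` are illustrative names) and single-threaded. It shows only the missing-root half of the problem, not the pushFrame/popFrame race. In a real runtime the swept cell would be a use-after-free rather than a `freed` flag.

```python
# Hypothetical toy mark-sweep showing why an un-rooted operand stack is unsafe.
from typing import Iterable, List


class Cell:
    def __init__(self, name: str, children: Iterable["Cell"] = ()) -> None:
        self.name = name
        self.children = list(children)
        self.freed = False  # stands in for "memory returned to the allocator"


class Heap:
    def __init__(self) -> None:
        self.cells: List[Cell] = []

    def alloc(self, name: str, children: Iterable[Cell] = ()) -> Cell:
        cell = Cell(name, children)
        self.cells.append(cell)
        return cell

    def collect(self, roots: Iterable[Cell]) -> None:
        marked = set()
        work = list(roots)
        while work:  # mark: everything reachable from the roots
            cell = work.pop()
            if id(cell) not in marked:
                marked.add(id(cell))
                work.extend(cell.children)
        for cell in self.cells:  # sweep: everything else is freed
            if id(cell) not in marked:
                cell.freed = True
        self.cells = [c for c in self.cells if id(c) in marked]


def roots_for(ns_vars: List[Cell], operand_stack: List[Cell],
              include_operand_stack: bool) -> List[Cell]:
    roots = list(ns_vars)
    if include_operand_stack:
        roots.extend(operand_stack)
    return roots


def mid_eval_collect(include_operand_stack: bool) -> Cell:
    heap = Heap()
    var_value = heap.alloc("var-value", [heap.alloc("var-child")])
    operand_stack = [heap.alloc("intermediate")]  # referenced only from the stack
    heap.collect(roots_for([var_value], operand_stack, include_operand_stack))
    return operand_stack[0]
```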
chaploud added a commit that referenced this pull request Jun 4, 2026
…worker-only register, fold-not-11th-source)

Worked out the #3a implementation design (the delicate GC-root-walker rewire): registry lives IN root_set.zig (a separate gc_thread.zig would cycle via macro_root_slot); ThreadGcContext = {frame_slot, macro_slot} pointers to a worker's TLS; only worker threads register (main reads own TLS directly -> existing single-thread tests stay green, empty registry = current behaviour); FOLD the registry pass into the current_frame/macro_root_slot cursors rather than add an 11th RootSource (the 10-source count is asserted + ADR-0028 §5). #3b (safepoint + per-eval-frame operand-stack publication) couples to #4. Captured as the impl checklist so the most-correctness-critical code proceeds from a complete design.
chaploud added a commit that referenced this pull request Jun 4, 2026
…#3a-step2, ADR-0090 D-244)

Smell-audited: 1: completes #3a per the D-244 Alt-B checklist. nextCurrentFrame + nextMacroRoot now walk a UNION of root sources:
- index 0 = this (collecting) thread's TLS (current_frame / macro_root_slot, read directly; unchanged behaviour)
- index k>=1 = registered worker k-1's published TLS (via frameSourceAt/macroSourceAt over the ThreadGcContext registry)

Cursors gained src_idx/primed (replacing initialised/consumed); a FOLD into the existing cursors, NOT an 11th RootSource (preserves the ADR-0028 §5 10-source contract + its count test). Runtime-inert today (empty registry -> union == self -> existing single-thread tests pass unchanged).

New test: a registered context pointing at a separate frame chain + macro slot is walked alongside self (proves the union reaches source >=1). Full --serial-e2e gate green 247/0. #3b (the alloc-boundary safepoint + per-eval-frame operand-stack publication, coupled to #4 real threads) is the remaining handshake sub-step.
chaploud added a commit that referenced this pull request Jun 4, 2026
…a registry (D-244 robustness)

Smell-audited: 1: pre-#4 hardening for the ThreadGcContext registry that Phase-B workers will register into. 4 threads x 200 register+unregister cycles through a threaded io_default; asserts the io_default-locked fixed array is race-free (back to count 0, no stranded slot). Additive (new test only). zig build test green.
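Below is a Python analogue of that stress test, assuming the registry is a fixed-capacity slot array behind one lock. `Registry` and `churn` are hypothetical names and the capacity of 8 is arbitrary. The invariant checked is the one the commit names: after the churn, the count is back to 0 and no slot is stranded.

```python
# Hypothetical model of a lock-guarded fixed-size registry; not root_set.zig.
import threading


class Registry:
    def __init__(self, capacity: int = 8) -> None:
        self._lock = threading.Lock()
        self._slots = [None] * capacity
        self.count = 0

    def register(self, ctx: object) -> int:
        with self._lock:
            for i, slot in enumerate(self._slots):
                if slot is None:
                    self._slots[i] = ctx
                    self.count += 1
                    return i
            raise RuntimeError("registry full")

    def unregister(self, idx: int) -> None:
        with self._lock:
            assert self._slots[idx] is not None, "double unregister"
            self._slots[idx] = None
            self.count -= 1

    def stranded(self) -> int:
        """Number of occupied slots (should be 0 once every thread has left)."""
        with self._lock:
            return sum(1 for s in self._slots if s is not None)


def churn(reg: Registry, threads: int = 4, cycles: int = 200) -> None:
    def worker() -> None:
        ctx = object()  # stand-in for a ThreadGcContext
        for _ in range(cycles):
            idx = reg.register(ctx)
            reg.unregister(idx)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
```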
chaploud added a commit that referenced this pull request Jun 4, 2026
…or (Phase B #3b-step1 design)

Smell-audited: 3: new ADR for the depth-2 structural choice (operand-stack root
wiring). DA-fork rated Alt 2 (thread-roots union) finished-form-clean; the main
loop's Alt-1 (clean 11th source) instinct was the Smallest-diff/Cycle-budget
bias the DA names — overridden per F-002/F-011, adopted Alt 2. Reservation-as-
bias on the "10 sources" count + the #3a "fold don't amend" precedent confronted
(both memos, not contracts). DA output reflected verbatim.

ADR-0028 §5 gains amendment 2 (rows 2+7 subsumed into thread_roots; enum 10->9).
Implements ADR-0090 D-244 decision Alt B §3. #3b-step1 = publication infra
(runtime-inert); #3b-step2 (safepoint) couples to #4.
chaploud added a commit that referenced this pull request Jun 4, 2026
…r (Phase B #3b-step1, ADR-0091)

Smell-audited: 1: implements the depth-3 ADR-0091 structural decision (the
DA-fork lives in that prior commit). No new smell: threadContextAt commonizes
#3a's triplicated per-thread addressing (F-011); the "10 sources" count treated
as a memo (10->9); publication is finished-form, not a provisional/no-op.

ADR-0091 Alt 2: subsume current_frame + macro_root_slot into a thread-major
`thread_roots` cursor that ALSO walks each thread's VM operand-stack EvalFrame
chain (stack[0..sp] + locals). root_set.zig owns EvalFrame + the threadlocal
eval_frame_head; vm.eval publishes its {stack,sp,locals} frame per call
(push/defer-pop). ThreadGcContext gains eval_frame_slot; the union walk covers
self (TLS) + every registered worker. Runtime-inert: collect() runs only at
quiescent points today; the #3b-step2 alloc-boundary safepoint makes it fire
mid-eval for Phase-B workers (couples to #4). Tests: self / union / stack[0..sp]
boundary (never the undefined region) / eval-frame parent chain.
chaploud added a commit that referenced this pull request Jun 4, 2026
…s (Phase B #3b-step2a, ADR-0090 Alt B)

Smell-audited: 1: implements ADR-0090 Alt B's pause-the-mutators half (the
mechanism's DA-fork is ADR-0090's). No new smell: a separate sp_mutex (NOT
gc_mutex — a parked worker releases it while waiting) is the finished-form
layering, not a workaround; the vm.eval:107 back-edge poll line is deferred to
#4 where it fires + is e2e-testable (a hot-loop edit landed with its first use,
not a cycle-budget defer).

New concurrency/safepoint.zig: stopWorld(self_registered) arms gc_requested +
blocks until every other registered worker parks; park() is the worker safe
point (register parked, wake the collector, block on resume_cond until the flag
clears); resumeWorld() clears + broadcasts. Two Io.Condition + a separate
sp_mutex via io_default (pinned 0.16 Io.Condition has no timedWait → plain
waitUncancelable; liveness bounded by the poll discipline). Runtime-inert:
nothing arms gc_requested until #4's force-VM workers. Isolation tests with real
std.Threads: all-parked rendezvous + a parked worker's published EvalFrame
surviving a REAL mark_sweep.collect during STW (garbage swept, rooted retained).
main.zig aggregator import (lazy-decl-analysis reach). D-244 updated.
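The rendezvous described above maps onto one mutex and two condition variables. Below is a hypothetical Python model, not safepoint.zig. `Safepoint`, `stop_world(workers)`, and `worker_loop` are illustrative. The collector passes the number of other registered workers directly instead of deriving it from self_registered, and Python's `Condition.wait` stands in for waitUncancelable.

```python
# Hypothetical stop-the-world rendezvous: one lock, two condition variables.
import threading
from typing import List


class Safepoint:
    def __init__(self) -> None:
        self.sp_mutex = threading.Lock()
        self.all_parked = threading.Condition(self.sp_mutex)  # collector waits here
        self.resume_cond = threading.Condition(self.sp_mutex)  # parked workers wait here
        self.gc_requested = False
        self.parked_count = 0

    def park(self) -> None:
        """Worker safe point: count as parked, wake the collector, wait for resume."""
        with self.sp_mutex:
            if not self.gc_requested:
                return  # request already cleared since the unlocked check
            self.parked_count += 1
            self.all_parked.notify_all()
            while self.gc_requested:
                self.resume_cond.wait()
            self.parked_count -= 1

    def stop_world(self, workers: int) -> None:
        """Arm the request and block until `workers` other threads have parked."""
        with self.sp_mutex:
            self.gc_requested = True
            while self.parked_count < workers:
                self.all_parked.wait()

    def resume_world(self) -> None:
        with self.sp_mutex:
            self.gc_requested = False
            self.resume_cond.notify_all()


def worker_loop(sp: Safepoint, stop: threading.Event, progress: List[int], i: int) -> None:
    while not stop.is_set():
        progress[i] += 1  # stand-in for a unit of eval work
        if sp.gc_requested:  # cheap unlocked poll; park() re-checks under the lock
            sp.park()
```

A worker running `worker_loop` parks at its next poll once `stop_world` arms the request. `resume_world` then clears the flag and releases all of them at once.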
chaploud added a commit that referenced this pull request Jun 4, 2026
…b-walk (Phase B #3b-step2b, ADR-0090 Alt B)

Smell-audited: 1: implements ADR-0090 Alt B's self-guard within the decided
envelope; extends the Alt 2 thread-major cursor with a 4th per-thread sub-walk
(no new RootSource, enum stays 9) — validating ADR-0091's "extends, not
rewrites". Step 0.6 corrected the survey's "self-only" justification: a PARKED
worker mid-op_vector_literal also holds an un-published partial (it parks at its
own alloc entry INSIDE conj), so gc_self_guard is walked for self AND every
worker; a single slot suffices (Q1 ops assemble already-on-stack values, no
nested eval). Refuses cw v0's suppressCollection hack — publishes a precise root
(F-006 / F-011).

root_set.zig: gc_self_guard threadlocal + ThreadGcContext.self_guard_slot +
threadContextAt + thread_roots cursor self_guard phase (mirrors macro). Tests:
self / union (worker self-guard) / null-inert. safepoint.zig test ctxs updated
to 4 slots. ADR-0028 §5 amendment 2 + D-244 note the 4th sub-walk. Runtime-inert:
nothing sets gc_self_guard until #4's Q1 fabrication-site wire-up.
chaploud added a commit that referenced this pull request Jun 4, 2026
…t entry (Phase B #4a-alloc, ADR-0090 Alt B)

Smell-audited: 1: implements the #4-survey-recommended alloc park point + the STW
collect entry within ADR-0090 Alt B's decided envelope. The gc_heap<->safepoint
import cycle is function-level (no circular TYPE dep), all Layer 0, no zone
violation — Zig compiles it. auto-collect stays OFF (collect explicit/test-
triggered); the safepoint being wired makes ANY collect safe — a documented #4a'
staging (debt D-244), not a no-op (the survey resolved that the collect trigger
belongs at the VM safe point which has rt+env, NOT inside the envs-less alloc).

gc_heap.alloc gains a prologue `if (safepoint.gc_requested) park()` BEFORE
gc_mutex: an allocating worker must register parked on sp_mutex first, else a
worker blocked on gc_mutex is uncounted and stopWorld hangs. mark_sweep gains
collectStopTheWorld(gc, ctx, self_registered) = stopWorld -> collect ->
resumeWorld (caller must not hold gc_mutex; collect re-takes it, Io.Mutex is not
reentrant). Real-thread test: N workers allocating through gc.alloc park at the
prologue during a concurrent collectStopTheWorld + resume cleanly; + a
single-threaded fenced-collect test. Swept fake test Cells use the finaliser-free
.vector tag (.string's finaliser reads a data ptr the 16-byte fake Cell lacks).
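The ordering constraint in the alloc prologue can be modelled in Python: the worker parks first and takes gc_mutex second. The model also includes the stop_world -> collect -> resume_world fence. This is a hypothetical sketch, not gc_heap.zig or mark_sweep.zig. The `Safepoint` and `GcHeap` classes, the `workers` argument, and the id-based sweep are illustrative, and Python's `threading.Lock` plays the non-reentrant Io.Mutex.

```python
# Hypothetical model of the alloc-prologue park and a fenced collect.
import threading


class Safepoint:
    def __init__(self) -> None:
        self.sp_mutex = threading.Lock()
        self.all_parked = threading.Condition(self.sp_mutex)
        self.resume_cond = threading.Condition(self.sp_mutex)
        self.gc_requested = False
        self.parked_count = 0

    def park(self) -> None:
        with self.sp_mutex:
            if not self.gc_requested:
                return
            self.parked_count += 1
            self.all_parked.notify_all()
            while self.gc_requested:
                self.resume_cond.wait()
            self.parked_count -= 1

    def stop_world(self, workers: int) -> None:
        with self.sp_mutex:
            self.gc_requested = True
            while self.parked_count < workers:
                self.all_parked.wait()

    def resume_world(self) -> None:
        with self.sp_mutex:
            self.gc_requested = False
            self.resume_cond.notify_all()


class GcHeap:
    def __init__(self, sp: Safepoint) -> None:
        self.sp = sp
        self.gc_mutex = threading.Lock()  # not reentrant
        self.cells = []
        self.collections = 0

    def alloc(self, value):
        # Prologue: park BEFORE taking gc_mutex. If a worker parked while holding
        # gc_mutex, other allocating workers would block on gc_mutex without ever
        # being counted as parked, and stop_world would wait forever.
        if self.sp.gc_requested:
            self.sp.park()
        with self.gc_mutex:
            self.cells.append(value)
            return value

    def collect(self, roots) -> None:
        with self.gc_mutex:
            keep = {id(r) for r in roots}
            self.cells = [c for c in self.cells if id(c) in keep]
            self.collections += 1


def collect_stop_the_world(heap: GcHeap, roots, workers: int) -> None:
    """stop_world -> collect -> resume_world. The caller must not hold gc_mutex:
    collect() takes it, and threading.Lock is not reentrant."""
    heap.sp.stop_world(workers)
    try:
        heap.collect(roots)
    finally:
        heap.sp.resume_world()
```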
chaploud added a commit that referenced this pull request Jun 4, 2026
…ase B #4b-poll, ADR-0090 Alt B)

Smell-audited: 1: completes the worker-park wiring within ADR-0090 Alt B. The
poll mirrors the tested alloc-prologue park (#4a); it is the non-allocating-loop
half (a worker spinning in (loop [i 0] (recur (inc i))) never allocates, so the
alloc park can't catch it — the back-edge poll does). VM-only by design (F-012:
tree_walk never runs on a worker), so no dual-backend parity obligation.

vm.eval gains `if (safepoint.gc_requested.load(.monotonic)) safepoint.park()` at
the top of the `while(true)` dispatch loop. Relaxed load (liveness only;
correctness fenced by park's acquire); one predicted-not-taken branch; inert
until a #4 worker arms gc_requested. With the alloc-prologue park (#4a) + this
poll, ANY worker reaches a safe point on a pending collect. Deterministic test:
a worker re-evaluating a trivial alloc-free chunk in a tight loop is continuously
in eval, so a stopWorld deterministically catches it parked at the poll +
resumes it (no synthetic-loop timing race).
chaploud added a commit that referenced this pull request Jun 5, 2026
…; multi-thread future-worker torture = D-244 #4 (hang/crash), separate
chaploud added a commit that referenced this pull request Jun 5, 2026
chaploud added a commit that referenced this pull request Jun 5, 2026
chaploud added a commit that referenced this pull request Jun 5, 2026
…ure-worker hang/crash)

Smell-audited: 2: torture is a test harness; a worker-initiated STW collect
is the genuinely-dormant D-244 #4 (self-deadlock + misses the unregistered
main's roots), recorded as debt — not papered over. Scoping torture to the
main thread tests what is testable now (main parks workers + walks the full
root set) and leaves the worker-initiated multi-thread collect as the
user-owned highest-risk path.

The vm.zig torture poll hardcoded collectStopTheWorld(.., false), so a
future/agent WORKER's back-edge poll triggered a worker-initiated STW:
stopWorld waited for the calling worker to park (self-deadlock; hang, exit 124),
and the worker-thread collect walked only its own TLS + registered workers,
missing the unregistered main thread's roots (crash, exit 134).
root_set.is_registered_worker (set in registerThread, cleared in
unregisterThread) gates the torture poll to the main thread, where the
collect parks the registered workers and walks the complete root set.

Verified torture-green: (future (reduce + (range 1 100)))=4950,
(mapv #(future (* %1 %1)) (range 1 5))=[1 4 9 16], (pmap inc (range 1 8)).
chaploud added a commit that referenced this pull request Jun 5, 2026
…ny-action drainer hang)

Smell-audited: 2: a real TOCTOU root cause, not a torture-only workaround —
discharges the named D-244 #4b-future-ii sub-item "stopWorld currentTarget
re-read (a worker exiting mid-STW must not hang it)" and resolves the D-253
agent residual. SSOT (.dev/gc_rooting.md E4) updated in the same commit so the
rooting-surface doc stays honest with the mechanism change.

An agent send/await under main-thread torture hung (exit 124): the drainer
runs a tiny action (inc) and UNREGISTERS before it ever reaches a safepoint
poll to park, but stopWorld snapshotted `target = registeredThreadCount()`
ONCE and waited for `parked_count` to reach it — a count the now-departed
worker can never satisfy. stopWorld now recomputes the target on every wake
from a lock-free `registered_count` (read under sp_mutex, so the leaving
worker's `noteWorkerLeft` -> all_parked broadcast can neither be lost nor
invert the registry_mutex/sp_mutex order). registerThread/unregisterThread
maintain the atomic count; the leaving worker wakes the collector after
decrementing it.

Verified: agent send/await/drain (=1/20/[1 2]/5) torture-clean, 15x stress
green, future/pmap unchanged; full gate 252/0.
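The essence of the fix is to re-read the target inside the wait loop rather than snapshot it once before waiting, and to have a departing worker wake the collector. Below is a hypothetical Python sketch. `registered_count`, `note_worker_left`, and the use of a single sp_mutex are simplifications; the commit keeps the count atomic and maintains it alongside a separate registry_mutex.

```python
# Hypothetical model: stop_world re-reads its target on every wake.
import threading


class Safepoint:
    def __init__(self) -> None:
        self.sp_mutex = threading.Lock()
        self.all_parked = threading.Condition(self.sp_mutex)
        self.resume_cond = threading.Condition(self.sp_mutex)
        self.gc_requested = False
        self.parked_count = 0
        self.registered_count = 0  # registered workers (collector not included)

    def register_worker(self) -> None:
        with self.sp_mutex:
            self.registered_count += 1

    def note_worker_left(self) -> None:
        """A departing worker shrinks the target and wakes the collector."""
        with self.sp_mutex:
            self.registered_count -= 1
            self.all_parked.notify_all()

    def park(self) -> None:
        with self.sp_mutex:
            if not self.gc_requested:
                return
            self.parked_count += 1
            self.all_parked.notify_all()
            while self.gc_requested:
                self.resume_cond.wait()
            self.parked_count -= 1

    def stop_world(self) -> None:
        with self.sp_mutex:
            self.gc_requested = True
            # The buggy form read `target = self.registered_count` once before
            # the loop; a worker that unregisters without parking then hangs it.
            while self.parked_count < self.registered_count:
                self.all_parked.wait()

    def resume_world(self) -> None:
        with self.sp_mutex:
            self.gc_requested = False
            self.resume_cond.notify_all()
```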
chaploud added a commit that referenced this pull request Jun 5, 2026
…rent-deref torture deadlock)

Smell-audited: 2: a real worker-blocked-on-lock root cause, not a torture
workaround — the proper safepoint-transition mechanism, production-inert (only
a pending collect exercises it), bounded to the one eval-under-lock site. SSOT
(gc_rooting.md E6) + debt (D-250 tier-2 multi-thread-clean, D-244 #4) updated
in the same commit.

delay.force holds the once-lock across vtable.callFn (the thunk = arbitrary
eval — required for JVM once-semantics), so the COLLECTING main thread holds
the lock across a torture collect while a future worker blocks acquiring it.
The blocked worker is at no back-edge safepoint, so stopWorld waits for it to
park forever (hang, exit 124). New safepoint.enterBlocked/exitBlocked (count a worker
parked for a blocking acquisition, re-check gc_requested on unblock) wrapped as
lockMutexAtSafepoint, applied at delay.force's lock. Other worker blocking
sites (agent cell mutex, future/promise conditions, STM locks) do NOT run eval
under their locks, so this is the only torture-deadlock site; auto-collect-ON
would need every blocking site wrapped (the user-owned #4a' audit).

Verified: delay_once_under_concurrency torture-clean (5x), full
phase14_future_promise_delay green under torture; full gate 252/0.
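The enterBlocked/exitBlocked idea works in two halves. A thread about to block on a lock counts itself as parked, so the collector does not wait for it. On waking, it stays parked if the collect is still in progress. Below is a hypothetical Python sketch: `enter_blocked`, `exit_blocked`, and `lock_mutex_at_safepoint` mirror the names above, but the bodies are assumptions, not the commit's code.

```python
# Hypothetical model of counting a worker as parked while it blocks on a lock.
import threading


class Safepoint:
    def __init__(self) -> None:
        self.sp_mutex = threading.Lock()
        self.all_parked = threading.Condition(self.sp_mutex)
        self.resume_cond = threading.Condition(self.sp_mutex)
        self.gc_requested = False
        self.parked_count = 0

    def enter_blocked(self) -> None:
        """About to block: count as parked so stop_world need not wait for us."""
        with self.sp_mutex:
            self.parked_count += 1
            self.all_parked.notify_all()

    def exit_blocked(self) -> None:
        """Unblocked: if a collect is still pending, stay parked until it resumes."""
        with self.sp_mutex:
            while self.gc_requested:
                self.resume_cond.wait()
            self.parked_count -= 1

    def stop_world(self, workers: int) -> None:
        with self.sp_mutex:
            self.gc_requested = True
            while self.parked_count < workers:
                self.all_parked.wait()

    def resume_world(self) -> None:
        with self.sp_mutex:
            self.gc_requested = False
            self.resume_cond.notify_all()


def lock_mutex_at_safepoint(sp: Safepoint, mutex: threading.Lock) -> None:
    sp.enter_blocked()
    try:
        mutex.acquire()  # may block for as long as the holder is busy
    finally:
        sp.exit_blocked()
```

In the delay.force shape, the collecting thread holds the once-lock while a worker calls `lock_mutex_at_safepoint` on it. `stop_world(1)` still returns because the blocked worker already counts as parked.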
chaploud added a commit that referenced this pull request Jun 5, 2026
…IRef generalization

Smell-audited: 0: doc-only handover refresh. Resume contract now points at the
add-watch/remove-watch IRef generalization (the D-244 #4 multi-thread torture
first commit has landed across 4554cde/97e8eb4c/6ed10df4).
chaploud added a commit that referenced this pull request Jun 5, 2026
…57/D-258

Smell-audited: 3: depth-3 new ADR. cljw gains its own HTTP server (the cljw-
original surface tree runtime/cljw/ activated ahead of Phase 14), on Zig 0.16
std.Io.net + std.http.Server — cw v0's server is disabled (pre-0.16 std.net), so
this is a fresh impl reusing cw v0's Ring API design. Naming = cljw.http.server /
.client (Clojure/Java/Python/Babashka split + Ring + run-server familiarity);
cljw.edge reserved for the deployment layer. D-257 = cycle-2 follow-ons
(keep-alive forced off, :headers/:body, threading/stop, GC rooting). D-258 =
records the agent_conj torture flake (load-induced, = dormant D-244 #4, not a
regression).
chaploud added a commit that referenced this pull request Jun 6, 2026
Convergence Campaign Stage 0.4 (probe-backed against a fresh HEAD binary).
Phase B (ADR-0090) is IMPLEMENTED at HEAD — verified: two 300ms-sleep
futures finish <500ms (real OS-thread parallelism); full STM/agent/
locking/atom-CAS all probe-green; git log shows the #4..#6 Phase B
campaign landed 2026-06-04→06.

- DISCHARGE D-009 (STM, fold→D-242), D-010 (locking, fold→D-245),
  D-012 (atom+watch), D-013 (STM barge, fold→D-242), D-211 (`'`-arith
  family, stale-LIE + DUP of D-260/ADR-0100).
- FLIP→now: D-224 (pmap recall fired — threading landed; now a perf item),
  D-046 (LazySeq.force mutex barrier met).
- De-stale D-242 anchor: "unimplemented core" → "concurrency hardening"
  (core landed; tracks D-244 #4 torture + pmap/LazySeq/per-item residuals).

Net: −5 active (132→127 non-DISCHARGED). LIE lens: 0 (D-177 already
self-corrected + corpus-backed). DUP: 5 folded. D-105/106/243 (java.time/
net/crypto) confirmed legitimately open.

Development

Successfully merging this pull request may close these issues.

WasmModule.loadLinked may return partially initialized module on OOM

1 participant