Bump zwasm v1.9.0 → v1.9.1 #4
Merged
Conversation
chaploud
added a commit
that referenced
this pull request
Apr 26, 2026
Audit findings against private/2026-04-27_strategic_review/ and v1 /
prior-redesign reference repos surfaced 11 missing pieces. This commit
adds them so Phase 1 onward does not silently drift.
Added:
- .claude/rules/{zone_deps,zig_tips,compat_tiers}.md (path-matched auto-load)
- .dev/decisions/{README.md, 0000-template.md} (ADR infrastructure)
- .dev/handover.md (session-to-session memo)
- .dev/known_issues.md (P0-P3 debt log)
- .dev/compat_tiers.yaml (per-namespace tier source of truth)
- .dev/concurrency_design.md (pre-Phase-15 deep dive)
- .dev/wasm_strategy.md (pre-Phase-19 deep dive; adopts hybrid)
- scripts/zone_check.sh (info / --strict / --gate; works on empty src)
- test/run_all.sh (single test entry point)
Updated:
- .dev/ROADMAP.md: new §11.6 Quality gate timeline (16 gates, active + planned),
added new files to §15.1, removed .editorconfig from §5, revision entry.
Removed:
- .editorconfig: project owner uses Emacs; format will be wired as a
pre-commit gate later (listed as gate #4 in ROADMAP §11.6).
chaploud
added a commit
that referenced
this pull request
May 30, 2026
Smell-audited: 1: structural-defect fix (representation divergence class #4) — (= (->Point 1 2) (->Point 1 2)) silently returned false. equal.zig had no .typed_instance arm so records fell to else=>false / bit-hash. Adds kind-gated .typed_instance arms to valueEqual + keyEqValue + valueHash: defrecord compares same descriptor + all declared fields (recursively); deftype keeps identity (no auto equals). A record is never = to a plain map (same-tag gate). The 3 arms stay mutually consistent so equal records share a hash bucket — usable as map keys ((get {rec :a} rec)). Found via the structural-defect probe sweep; 5 e2e cases.
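The equality and hash rules in the commit above can be modelled in a few lines. This is a Python sketch, not the equal.zig code: `TypedInstance`, the `kind` strings, and the string descriptor are hypothetical stand-ins, and only the rules stated in the message (same-tag gate, field-wise record equality, identity for deftype, hash consistent with equality) are taken from the source.

```python
class TypedInstance:
    """Hypothetical stand-in for a defrecord/deftype instance."""

    def __init__(self, kind, descriptor, fields):
        self.kind = kind              # "record" or "type" (assumed tags)
        self.descriptor = descriptor  # stands in for the type descriptor
        self.fields = tuple(fields)

    def __eq__(self, other):
        return value_equal(self, other)

    def __hash__(self):
        return value_hash(self)


def value_equal(a, b):
    a_typed = isinstance(a, TypedInstance)
    b_typed = isinstance(b, TypedInstance)
    if a_typed or b_typed:
        # Same-tag gate: a typed instance never equals a plain map.
        if not (a_typed and b_typed):
            return False
        # deftype keeps identity semantics (no automatic equals).
        if a.kind != "record" or b.kind != "record":
            return a is b
        # defrecord: same descriptor and all fields equal, recursively.
        return (a.descriptor == b.descriptor
                and len(a.fields) == len(b.fields)
                and all(value_equal(x, y) for x, y in zip(a.fields, b.fields)))
    return a == b


def value_hash(v):
    if isinstance(v, TypedInstance):
        if v.kind != "record":
            return id(v)  # identity hash matches identity equality
        # Equal records must land in the same hash bucket.
        return hash((v.descriptor, tuple(value_hash(f) for f in v.fields)))
    return hash(v)
```

Because the equality and hash functions agree, two equal records can be used interchangeably as dictionary keys, which is the property the `(get {rec :a} rec)` case exercises.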
chaploud
added a commit
that referenced
this pull request
Jun 4, 2026
…090 §2) Smell-audited: 1: Phase B implementation increment #2 per ADR-0090 §2. Added gc_mutex (std.Io.Mutex) to GcHeap; lock alloc/pin/unpin (gc_heap.zig) + the whole collect() cycle (mark_sweep.zig) via the io_default singleton (the allocator API takes no io arg). Makes allocation thread-safe under F-006 — the foundation the #3 ThreadGcContext root-publication handshake builds on for collection safety. Not reentrant (alloc never calls collect; collect never allocates). Uncontended + runtime-inert today (single-threaded; real threads land at #4 future/pmap), so no observable behaviour change. New concurrency test: 4 threads x 500 allocs through a threaded io serialize race-free (allocations.len == alloc_count == 2000). Full --serial-e2e gate green 247/0; the io_default-default-single-threaded gc tests still pass (uncontended lock). bench staged per source-bearing policy (also absorbs the session's dangling doc-commit gate samples). Stale 'lock deferred to Phase B' docstring updated to describe the landed lock.
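As a rough illustration of the locking discipline described above, here is a Python sketch of a heap whose `alloc` and `collect` share one non-reentrant lock. The class and method names mirror the commit message, but the body is an assumption: the real code is Zig and uses `std.Io.Mutex`, and `run_workers` is a hypothetical helper that reproduces the 4 x 500 allocation test.

```python
import threading


class GcHeap:
    """Toy heap: one non-reentrant lock guards alloc and the whole collect."""

    def __init__(self):
        self.gc_mutex = threading.Lock()
        self.allocations = []
        self.alloc_count = 0

    def alloc(self, payload):
        # alloc never calls collect, so a non-reentrant lock is sufficient.
        with self.gc_mutex:
            self.allocations.append(payload)
            self.alloc_count += 1
            return payload

    def collect(self, is_live):
        # The whole cycle runs under the lock; collect never allocates.
        with self.gc_mutex:
            before = len(self.allocations)
            self.allocations = [c for c in self.allocations if is_live(c)]
            return before - len(self.allocations)


def run_workers(heap, threads=4, per_thread=500):
    """Hypothetical helper: each thread allocates per_thread cells."""
    def worker(tid):
        for i in range(per_thread):
            heap.alloc((tid, i))

    pool = [threading.Thread(target=worker, args=(t,)) for t in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
```

With four workers and 500 allocations each, both the bookkeeping counter and the allocation list should end at 2000 entries, matching the invariant the commit's concurrency test asserts.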
chaploud
added a commit
that referenced
this pull request
Jun 4, 2026
…locks #3; re-analysis gated (D-244) Smell-audited: 2: Bad-Smell interrupt surfaced while designing increment #3. root_set.zig roots ns_vars/current_frame/macro_root_slot/permanent_roots but NOT the VM operand stack (vm.zig local Value array) nor tree_walk native-stack intermediates; safe today only because collect() runs at quiescent explicit points (no auto-collect). For Phase B real threads (#4), a mid-eval worker's operand/native-stack Values are un-rooted -> concurrent collect UAF; plus a pushFrame/popFrame read-during-write race during another thread's root walk. So ADR-0090 §2 Alt-2's 'no safepoint needed' is insufficient for mid-eval workers. Recorded as ADR-0090 Revision history + D-244 (the #3 gating design step): re-analyse with a DA-fork (safepoint Alt-1 vs publish-VM-operand-stack-root + forbid-tree_walk-during-collect) BEFORE the handshake code. The §1-2/§5-7 spine + increments #1/#2 are unaffected (the alloc lock is needed by either mechanism).
chaploud
added a commit
that referenced
this pull request
Jun 4, 2026
…worker-only register, fold-not-11th-source)
Worked out the #3a implementation design (the delicate GC-root-walker rewire): registry lives IN root_set.zig (a separate gc_thread.zig would cycle via macro_root_slot); ThreadGcContext = {frame_slot, macro_slot} pointers to a worker's TLS; only worker threads register (main reads own TLS directly -> existing single-thread tests stay green, empty registry = current behaviour); FOLD the registry pass into the current_frame/macro_root_slot cursors rather than add an 11th RootSource (the 10-source count is asserted + ADR-0028 §5). #3b (safepoint + per-eval-frame operand-stack publication) couples to #4. Captured as the impl checklist so the most-correctness-critical code proceeds from a complete design.
chaploud
added a commit
that referenced
this pull request
Jun 4, 2026
…#3a-step2, ADR-0090 D-244) Smell-audited: 1: completes #3a per the D-244 Alt-B checklist. nextCurrentFrame + nextMacroRoot now walk a UNION of root sources: index 0 = this (collecting) thread's TLS (current_frame / macro_root_slot, read directly — unchanged behaviour), index k>=1 = registered worker k-1's published TLS (via frameSourceAt/macroSourceAt over the ThreadGcContext registry). Cursors gained src_idx/primed (replacing initialised/consumed); a FOLD into the existing cursors, NOT an 11th RootSource (preserves the ADR-0028 §5 10-source contract + its count test). Runtime-inert today (empty registry -> union == self -> existing single-thread tests pass unchanged). New test: a registered context pointing at a separate frame chain + macro slot is walked alongside self (proves the union reaches source >=1). Full --serial-e2e gate green 247/0. #3b (the alloc-boundary safepoint + per-eval-frame operand-stack publication, coupled to #4 real threads) is the remaining handshake sub-step.
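The union walk can be pictured as indexing into "self plus registry". The sketch below is a Python model under that reading: `Frame`, `ThreadGcContext`, and the list-based registry are simplified assumptions, and `source_at` plays the role the message assigns to frameSourceAt/macroSourceAt.

```python
class Frame:
    """One frame in a thread's frame chain; roots are the values it holds."""

    def __init__(self, roots, parent=None):
        self.roots = list(roots)
        self.parent = parent


class ThreadGcContext:
    """What a thread publishes: its frame chain head and its macro root."""

    def __init__(self, frame_head=None, macro_root=None):
        self.frame_head = frame_head
        self.macro_root = macro_root


def source_at(self_ctx, registry, idx):
    # Index 0 is the collecting thread itself; k >= 1 is worker k - 1.
    return self_ctx if idx == 0 else registry[idx - 1]


def walk_current_frames(self_ctx, registry):
    roots = []
    for idx in range(1 + len(registry)):
        frame = source_at(self_ctx, registry, idx).frame_head
        while frame is not None:
            roots.extend(frame.roots)
            frame = frame.parent
    return roots


def walk_macro_roots(self_ctx, registry):
    roots = []
    for idx in range(1 + len(registry)):
        root = source_at(self_ctx, registry, idx).macro_root
        if root is not None:
            roots.append(root)
    return roots
```

With an empty registry both walks reduce to the collecting thread's own roots, which is why the change is inert for single-threaded runs.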
chaploud
added a commit
that referenced
this pull request
Jun 4, 2026
…a registry (D-244 robustness) Smell-audited: 1: pre-#4 hardening for the ThreadGcContext registry that Phase-B workers will register into. 4 threads x 200 register+unregister cycles through a threaded io_default; asserts the io_default-locked fixed array is race-free (back to count 0, no stranded slot). Additive (new test only). zig build test green.
chaploud
added a commit
that referenced
this pull request
Jun 4, 2026
…or (Phase B #3b-step1 design) Smell-audited: 3: new ADR for the depth-2 structural choice (operand-stack root wiring). DA-fork rated Alt 2 (thread-roots union) finished-form-clean; the main loop's Alt-1 (clean 11th source) instinct was the Smallest-diff/Cycle-budget bias the DA names; overridden per F-002/F-011, adopted Alt 2. Reservation-as-bias on the "10 sources" count + the #3a "fold don't amend" precedent confronted (both memos, not contracts). DA output reflected verbatim. ADR-0028 §5 gains amendment 2 (rows 2+7 subsumed into thread_roots; enum 10->9). Implements ADR-0090 D-244 decision Alt B §3. #3b-step1 = publication infra (runtime-inert); #3b-step2 (safepoint) couples to #4.
chaploud
added a commit
that referenced
this pull request
Jun 4, 2026
…r (Phase B #3b-step1, ADR-0091)
Smell-audited: 1: implements the depth-3 ADR-0091 structural decision (the
DA-fork lives in that prior commit). No new smell: threadContextAt commonizes
#3a's triplicated per-thread addressing (F-011); the "10 sources" count treated
as a memo (10->9); publication is finished-form, not a provisional/no-op.
ADR-0091 Alt 2: subsume current_frame + macro_root_slot into a thread-major
`thread_roots` cursor that ALSO walks each thread's VM operand-stack EvalFrame
chain (stack[0..sp] + locals). root_set.zig owns EvalFrame + the threadlocal
eval_frame_head; vm.eval publishes its {stack,sp,locals} frame per call
(push/defer-pop). ThreadGcContext gains eval_frame_slot; the union walk covers
self (TLS) + every registered worker. Runtime-inert: collect() runs only at
quiescent points today; the #3b-step2 alloc-boundary safepoint makes it fire
mid-eval for Phase-B workers (couples to #4). Tests: self / union / stack[0..sp]
boundary (never the undefined region) / eval-frame parent chain.
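A small model may help show why the walk stops at `sp`. In this Python sketch, `EvalFrame`, `published_frame`, and the thread-local head are assumptions that mirror the names in the message; the point is only that slots at or above `sp` are never read, and that each eval call pushes its frame and pops it again on exit.

```python
import threading
from contextlib import contextmanager

_tls = threading.local()  # stands in for the threadlocal eval_frame_head


class EvalFrame:
    def __init__(self, stack, locals_, parent):
        self.stack = stack      # fixed-size operand stack
        self.sp = 0             # slots [0, sp) are live; the rest is stale
        self.locals = locals_
        self.parent = parent


def current_head():
    return getattr(_tls, "head", None)


@contextmanager
def published_frame(stack, locals_):
    """Publish a frame for the duration of one eval call (push, then pop)."""
    frame = EvalFrame(stack, locals_, current_head())
    _tls.head = frame
    try:
        yield frame
    finally:
        _tls.head = frame.parent


def walk_eval_frames(head):
    roots = []
    frame = head
    while frame is not None:
        roots.extend(frame.stack[:frame.sp])  # never past sp
        roots.extend(frame.locals)
        frame = frame.parent
    return roots
```

Nested calls form a parent chain, so a walk starting from the innermost frame reaches the live operands and locals of every enclosing eval on that thread.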
chaploud
added a commit
that referenced
this pull request
Jun 4, 2026
…s (Phase B #3b-step2a, ADR-0090 Alt B) Smell-audited: 1: implements ADR-0090 Alt B's pause-the-mutators half (the mechanism's DA-fork is ADR-0090's). No new smell: a separate sp_mutex (NOT gc_mutex — a parked worker releases it while waiting) is the finished-form layering, not a workaround; the vm.eval:107 back-edge poll line is deferred to #4 where it fires + is e2e-testable (a hot-loop edit landed with its first use, not a cycle-budget defer). New concurrency/safepoint.zig: stopWorld(self_registered) arms gc_requested + blocks until every other registered worker parks; park() is the worker safe point (register parked, wake the collector, block on resume_cond until the flag clears); resumeWorld() clears + broadcasts. Two Io.Condition + a separate sp_mutex via io_default (pinned 0.16 Io.Condition has no timedWait → plain waitUncancelable; liveness bounded by the poll discipline). Runtime-inert: nothing arms gc_requested until #4's force-VM workers. Isolation tests with real std.Threads: all-parked rendezvous + a parked worker's published EvalFrame surviving a REAL mark_sweep.collect during STW (garbage swept, rooted retained). main.zig aggregator import (lazy-decl-analysis reach). D-244 updated.
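The rendezvous described above (arm a flag, wait until every worker has parked, then clear the flag and broadcast) can be sketched with two condition variables over one mutex. This Python model is an assumption-laden simplification: `stop_world` takes an explicit worker count instead of consulting a registry, and `run_rendezvous` is a hypothetical driver, not the project's isolation test.

```python
import threading
import time


class Safepoint:
    def __init__(self):
        self.sp_mutex = threading.Lock()
        self.all_parked = threading.Condition(self.sp_mutex)
        self.resume_cond = threading.Condition(self.sp_mutex)
        self.gc_requested = False
        self.parked_count = 0

    def stop_world(self, worker_count):
        # Arm the request, then block until every worker has parked.
        with self.sp_mutex:
            self.gc_requested = True
            while self.parked_count < worker_count:
                self.all_parked.wait()

    def park(self):
        # Worker safe point: no-op unless a collection is pending.
        with self.sp_mutex:
            if not self.gc_requested:
                return
            self.parked_count += 1
            self.all_parked.notify_all()   # wake the collector
            while self.gc_requested:
                self.resume_cond.wait()    # releases sp_mutex while waiting
            self.parked_count -= 1

    def resume_world(self):
        with self.sp_mutex:
            self.gc_requested = False
            self.resume_cond.notify_all()


def run_rendezvous(worker_count=3):
    sp = Safepoint()
    done = threading.Event()

    def worker():
        while not done.is_set():
            sp.park()          # poll the safe point
            time.sleep(0.001)

    pool = [threading.Thread(target=worker, daemon=True)
            for _ in range(worker_count)]
    for t in pool:
        t.start()
    sp.stop_world(worker_count)
    parked_during_stw = sp.parked_count   # a collect would run here
    done.set()
    sp.resume_world()
    for t in pool:
        t.join(timeout=5)
    return parked_during_stw, sp.parked_count, any(t.is_alive() for t in pool)
```

A parked worker waits on the condition and therefore releases the safepoint mutex, which is the reason the message gives for keeping this mutex separate from the allocator lock.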
chaploud
added a commit
that referenced
this pull request
Jun 4, 2026
…b-walk (Phase B #3b-step2b, ADR-0090 Alt B) Smell-audited: 1: implements ADR-0090 Alt B's self-guard within the decided envelope; extends the Alt 2 thread-major cursor with a 4th per-thread sub-walk (no new RootSource, enum stays 9) — validating ADR-0091's "extends, not rewrites". Step 0.6 corrected the survey's "self-only" justification: a PARKED worker mid-op_vector_literal also holds an un-published partial (it parks at its own alloc entry INSIDE conj), so gc_self_guard is walked for self AND every worker; a single slot suffices (Q1 ops assemble already-on-stack values, no nested eval). Refuses cw v0's suppressCollection hack — publishes a precise root (F-006 / F-011). root_set.zig: gc_self_guard threadlocal + ThreadGcContext.self_guard_slot + threadContextAt + thread_roots cursor self_guard phase (mirrors macro). Tests: self / union (worker self-guard) / null-inert. safepoint.zig test ctxs updated to 4 slots. ADR-0028 §5 amendment 2 + D-244 note the 4th sub-walk. Runtime-inert: nothing sets gc_self_guard until #4's Q1 fabrication-site wire-up.
chaploud
added a commit
that referenced
this pull request
Jun 4, 2026
…t entry (Phase B #4a-alloc, ADR-0090 Alt B) Smell-audited: 1: implements the #4-survey-recommended alloc park point + the STW collect entry within ADR-0090 Alt B's decided envelope. The gc_heap<->safepoint import cycle is function-level (no circular TYPE dep), all Layer 0, no zone violation; Zig compiles it. auto-collect stays OFF (collect explicit/test-triggered); the safepoint being wired makes ANY collect safe. This is a documented #4a' staging (debt D-244), not a no-op (the survey resolved that the collect trigger belongs at the VM safe point which has rt+env, NOT inside the envs-less alloc). gc_heap.alloc gains a prologue `if (safepoint.gc_requested) park()` BEFORE gc_mutex: an allocating worker must register parked on sp_mutex first, else a worker blocked on gc_mutex is uncounted and stopWorld hangs. mark_sweep gains collectStopTheWorld(gc, ctx, self_registered) = stopWorld -> collect -> resumeWorld (caller must not hold gc_mutex; collect re-takes it, Io.Mutex is not reentrant). Real-thread test: N workers allocating through gc.alloc park at the prologue during a concurrent collectStopTheWorld + resume cleanly; + a single-threaded fenced-collect test. Swept fake test Cells use the finaliser-free .vector tag (.string's finaliser reads a data ptr the 16-byte fake Cell lacks).
chaploud
added a commit
that referenced
this pull request
Jun 4, 2026
…ase B #4b-poll, ADR-0090 Alt B) Smell-audited: 1: completes the worker-park wiring within ADR-0090 Alt B. The poll mirrors the tested alloc-prologue park (#4a); it is the non-allocating-loop half (a worker spinning in (loop [i 0] (recur (inc i))) never allocates, so the alloc park can't catch it — the back-edge poll does). VM-only by design (F-012: tree_walk never runs on a worker), so no dual-backend parity obligation. vm.eval gains `if (safepoint.gc_requested.load(.monotonic)) safepoint.park()` at the top of the `while(true)` dispatch loop. Relaxed load (liveness only; correctness fenced by park's acquire); one predicted-not-taken branch; inert until a #4 worker arms gc_requested. With the alloc-prologue park (#4a) + this poll, ANY worker reaches a safe point on a pending collect. Deterministic test: a worker re-evaluating a trivial alloc-free chunk in a tight loop is continuously in eval, so a stopWorld deterministically catches it parked at the poll + resumes it (no synthetic-loop timing race).
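To see why the back-edge poll matters, consider a dispatch loop running a chunk that never allocates. The Python sketch below is a toy interpreter, not vm.eval: the opcodes, the `max_steps` bound, and the callback-style `gc_requested`/`park` parameters are all assumptions made so the behaviour can be checked deterministically.

```python
def eval_chunk(code, gc_requested, park, max_steps):
    """Toy dispatch loop with a safepoint poll at the top of each iteration."""
    pc = 0
    acc = 0
    steps = 0
    while steps < max_steps:
        # Back-edge poll: one cheap check per dispatch, taken only when a
        # collection is pending. An alloc-free loop still reaches it.
        if gc_requested():
            park()
        op, arg = code[pc]
        if op == "inc":
            acc += arg
            pc += 1
        elif op == "jump":
            pc = arg
        elif op == "halt":
            break
        steps += 1
    return acc


# An endless counting loop with no allocation, like (loop [i 0] (recur (inc i))).
ALLOC_FREE_LOOP = [("inc", 1), ("jump", 0)]
```

Since the loop body contains no allocation, an allocation-time park point alone would never be reached here; the poll at the top of the loop is the only place this chunk can stop for a collector.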
chaploud
added a commit
that referenced
this pull request
Jun 5, 2026
…; multi-thread future-worker torture = D-244 #4 (hang/crash), separate
chaploud
added a commit
that referenced
this pull request
Jun 5, 2026
…ng e2e hang = D-244 #4 caveat
chaploud
added a commit
that referenced
this pull request
Jun 5, 2026
…-244 #4 multi-thread future-worker torture
chaploud
added a commit
that referenced
this pull request
Jun 5, 2026
…ure-worker hang/crash) Smell-audited: 2: torture is a test harness; a worker-initiated STW collect is the genuinely-dormant D-244 #4 (self-deadlock + misses the unregistered main's roots), recorded as debt — not papered over. Scoping torture to the main thread tests what is testable now (main parks workers + walks the full root set) and leaves the worker-initiated multi-thread collect as the user-owned highest-risk path. The vm.zig torture poll hardcoded collectStopTheWorld(.., false) so a future/agent WORKER's back-edge poll triggered a worker-initiated STW: stopWorld waited for the calling worker to park (self-deadlock, hang 124), and the worker-thread collect walked only its own TLS + registered workers, missing the unregistered main thread's roots (crash 134). New threadlocal root_set.is_registered_worker (set in registerThread, cleared in unregisterThread) gates the torture poll to the main thread, where the collect parks the registered workers and walks the complete root set. Verified torture-green: (future (reduce + (range 1 100)))=4950, (mapv #(future (* %1 %1)) (range 1 5))=[1 4 9 16], (pmap inc (range 1 8)).
chaploud
added a commit
that referenced
this pull request
Jun 5, 2026
…ny-action drainer hang) Smell-audited: 2: a real TOCTOU root cause, not a torture-only workaround — discharges the named D-244 #4b-future-ii sub-item "stopWorld currentTarget re-read (a worker exiting mid-STW must not hang it)" and resolves the D-253 agent residual. SSOT (.dev/gc_rooting.md E4) updated in the same commit so the rooting-surface doc stays honest with the mechanism change. An agent send/await under main-thread torture hung (exit 124): the drainer runs a tiny action (inc) and UNREGISTERS before it ever reaches a safepoint poll to park, but stopWorld snapshotted `target = registeredThreadCount()` ONCE and waited for `parked_count` to reach it — a count the now-departed worker can never satisfy. stopWorld now recomputes the target on every wake from a lock-free `registered_count` (read under sp_mutex, so the leaving worker's `noteWorkerLeft` -> all_parked broadcast can neither be lost nor invert the registry_mutex/sp_mutex order). registerThread/unregisterThread maintain the atomic count; the leaving worker wakes the collector after decrementing it. Verified: agent send/await/drain (=1/20/[1 2]/5) torture-clean, 15x stress green, future/pmap unchanged; full gate 252/0.
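The fix described above amounts to re-reading the target inside the wait loop and having a departing worker wake the collector. The following Python sketch models that under simplifying assumptions: `registered_count` is a plain integer guarded by the same mutex rather than a lock-free atomic, and `run_leaving_worker` is a hypothetical scenario in which a short-lived worker unregisters without ever parking.

```python
import threading
import time


class Rendezvous:
    def __init__(self):
        self.sp_mutex = threading.Lock()
        self.all_parked = threading.Condition(self.sp_mutex)
        self.resume_cond = threading.Condition(self.sp_mutex)
        self.gc_requested = False
        self.parked_count = 0
        self.registered_count = 0

    def register_thread(self):
        with self.sp_mutex:
            self.registered_count += 1

    def unregister_thread(self):
        with self.sp_mutex:
            self.registered_count -= 1
            self.all_parked.notify_all()  # let the collector re-read its target

    def stop_world(self):
        with self.sp_mutex:
            self.gc_requested = True
            # The target is re-read on every wake, not snapshotted once.
            while self.parked_count < self.registered_count:
                self.all_parked.wait()

    def park(self):
        with self.sp_mutex:
            if not self.gc_requested:
                return
            self.parked_count += 1
            self.all_parked.notify_all()
            while self.gc_requested:
                self.resume_cond.wait()
            self.parked_count -= 1

    def resume_world(self):
        with self.sp_mutex:
            self.gc_requested = False
            self.resume_cond.notify_all()


def run_leaving_worker():
    r = Rendezvous()
    r.register_thread()   # long-running worker that polls the safe point
    r.register_thread()   # short-lived worker that leaves without parking
    stop = threading.Event()

    def long_worker():
        while not stop.is_set():
            r.park()
            time.sleep(0.001)

    collector = threading.Thread(target=r.stop_world, daemon=True)
    worker = threading.Thread(target=long_worker, daemon=True)
    collector.start()
    worker.start()
    r.unregister_thread()            # the short-lived worker departs
    collector.join(timeout=5)
    collector_finished = not collector.is_alive()
    stop.set()
    r.resume_world()
    worker.join(timeout=5)
    return collector_finished, not worker.is_alive()
```

If `stop_world` instead captured the registered count once before waiting, the collector in this scenario would wait for two parked workers while only one can ever arrive.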
chaploud
added a commit
that referenced
this pull request
Jun 5, 2026
…rent-deref torture deadlock) Smell-audited: 2: a real worker-blocked-on-lock root cause, not a torture workaround — the proper safepoint-transition mechanism, production-inert (only a pending collect exercises it), bounded to the one eval-under-lock site. SSOT (gc_rooting.md E6) + debt (D-250 tier-2 multi-thread-clean, D-244 #4) updated in the same commit. delay.force holds the once-lock across vtable.callFn (the thunk = arbitrary eval — required for JVM once-semantics), so the COLLECTING main thread holds the lock across a torture collect while a future worker blocks acquiring it. The blocked worker is at no back-edge safepoint, so stopWorld waits for it to park forever (hang 124). New safepoint.enterBlocked/exitBlocked (count a worker parked for a blocking acquisition, re-check gc_requested on unblock) wrapped as lockMutexAtSafepoint, applied at delay.force's lock. Other worker blocking sites (agent cell mutex, future/promise conditions, STM locks) do NOT run eval under their locks, so this is the only torture-deadlock site; auto-collect-ON would need every blocking site wrapped (the user-owned #4a' audit). Verified: delay_once_under_concurrency torture-clean (5x), full phase14_future_promise_delay green under torture; full gate 252/0.
chaploud
added a commit
that referenced
this pull request
Jun 5, 2026
…57/D-258 Smell-audited: 3: depth-3 new ADR. cljw gains its own HTTP server (the cljw-original surface tree runtime/cljw/ activated ahead of Phase 14), on Zig 0.16 std.Io.net + std.http.Server. cw v0's server is disabled (pre-0.16 std.net), so this is a fresh impl reusing cw v0's Ring API design. Naming = cljw.http.server / .client (Clojure/Java/Python/Babashka split + Ring + run-server familiarity); cljw.edge reserved for the deployment layer. D-257 = cycle-2 follow-ons (keep-alive forced off, :headers/:body, threading/stop, GC rooting). D-258 = records the agent_conj torture flake (load-induced, = dormant D-244 #4, not a regression).
chaploud
added a commit
that referenced
this pull request
Jun 6, 2026
Convergence Campaign Stage 0.4 (probe-backed against a fresh HEAD binary).
Phase B (ADR-0090) is IMPLEMENTED at HEAD, verified: two 300ms-sleep futures finish <500ms (real OS-thread parallelism); full STM/agent/locking/atom-CAS all probe-green; git log shows the #4..#6 Phase B campaign landed 2026-06-04→06.
- DISCHARGE D-009 (STM, fold→D-242), D-010 (locking, fold→D-245), D-012 (atom+watch), D-013 (STM barge, fold→D-242), D-211 (`'`-arith family, stale-LIE + DUP of D-260/ADR-0100).
- FLIP→now: D-224 (pmap recall fired; threading landed, now a perf item), D-046 (LazySeq.force mutex barrier met).
- De-stale D-242 anchor: "unimplemented core" → "concurrency hardening" (core landed; tracks D-244#4 torture + pmap/LazySeq/per-item residuals).
Net: −5 active (132→127 non-DISCHARGED). LIE lens: 0 (D-177 already self-corrected + corpus-backed). DUP: 5 folded. D-105/106/243 (java.time/net/crypto) confirmed legitimately open.
Summary
WasmModule.vm is now nullable so that loadLinked can safely fail after Phase 1 without leaving UB behind (zwasm#40 "fix: avoid crash on OOM by safe-guarding uninitialized VM pointer" by @jtakakura, closes zwasm#39 "WasmModule.loadLinked may return partially initialized module on OOM"). Adds error.ModuleNotFullyLoaded from invoke/invokeInterpreterOnly in that path, and fixes a segfault where WasmModule.cancel() would dereference a null VM. No breaking changes for CW: the happy-path API is unchanged.
Test plan
- zig build: clean build green
- bash test/run_all.sh --quick: 4/4 pass (zig build test, cljw test 83 namespaces, e2e wasm, deps.edn e2e)
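For readers unfamiliar with the nullable-VM guard mentioned in the summary, the idea can be modelled as follows. This is a Python sketch and not zwasm's API: `FakeVm`, the `make_vm` parameter, and the snake_case method names are hypothetical, and only the three behaviours named in the summary (load may fail leaving no VM, invoke reports a dedicated error, cancel tolerates a missing VM) come from the source.

```python
class ModuleNotFullyLoaded(Exception):
    """Raised when a call needs a VM that was never created."""


class FakeVm:
    """Hypothetical VM used only to exercise the guard."""

    def __init__(self):
        self.cancelled = False

    def call(self, name, args):
        return (name, args)

    def cancel(self):
        self.cancelled = True


class WasmModule:
    def __init__(self):
        self.vm = None  # nullable: stays None if loading fails part-way

    def load_linked(self, make_vm):
        # Earlier loading phases are assumed done; VM creation may still fail.
        self.vm = make_vm()

    def invoke(self, name, *args):
        if self.vm is None:
            raise ModuleNotFullyLoaded(name)
        return self.vm.call(name, args)

    def cancel(self):
        # Guarded: cancelling a partially loaded module is a no-op.
        if self.vm is not None:
            self.vm.cancel()
```

On the happy path the guard is invisible: once `load_linked` succeeds, `invoke` and `cancel` behave exactly as they would without the check.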