fix: seed fast pseudo-random numbers to differ across processes#2663
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: faa19e3139
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "Codex (@codex) review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "Codex (@codex) address that feedback".
Per review on GitoxideLabs#2663: - `fastrand::Rng` is a single `u64`, so a `Cell` -- move it out behind a cheap placeholder, advance it, and move it back -- is enough and avoids `RefCell`'s runtime borrow tracking, mirroring `fastrand`'s own thread-local generator. - We no longer use `fastrand`'s global generator, so its `std` default feature isn't needed: set `default-features = false` and bump to 2.4.1. The `wasm32-unknown-unknown` fallback therefore seeds with a fixed value instead of `Rng::new()` (which needs the std-gated global); that target is single-process, so a fixed seed is fine.
Per review on GitoxideLabs#2663: - `fastrand::Rng` is a single `u64`, so a `Cell` -- move it out behind a cheap placeholder, advance it, and move it back -- is enough and avoids `RefCell`'s runtime borrow tracking, mirroring `fastrand`'s own thread-local generator. - We no longer use `fastrand`'s global generator, so its `std` default feature isn't needed: set `default-features = false` and bump to 2.4.1. The `wasm32-unknown-unknown` fallback therefore seeds with a fixed value instead of `Rng::new()` (which needs the std-gated global); that target is single-process, so a fixed seed is fine.
480b5d6 to
40ccaa0
Compare
`gix-utils` and `gix-fs` drew fast pseudo-random numbers from `fastrand`'s global generators, whose per-thread seed is derived only from values that can coincide between separate processes (notably small thread IDs). Two concurrently running processes could therefore produce identical sequences, which weakens backoff jitter (meant to avoid a thundering herd of retries) and can make the filesystem capability probes collide on their temporary file names -- e.g. spuriously reporting that symlinks are unsupported (GitoxideLabs#1789). Add a `gix_utils::rng` module that draws from a per-thread `fastrand::Rng` seeded from a high-entropy OS source via `getrandom`, and route the affected call sites (backoff jitter and the `gix-fs` capability probes) through it. `getrandom` is a target-specific dependency that is excluded on `wasm32-unknown-unknown`, which has no entropy backend by default and runs a single process; there a best-effort seed is used that deliberately avoids `Instant::now()` (which panics on that target). For GitoxideLabs#1816
- `std::process::id()` traps at runtime on `wasm32-unknown-unknown`
("no pids on this platform"), and the fallback seed -- the only seed
path on that target -- called it, so the first `rng::usize` would
abort in browser/unknown wasm builds. On that single-process target,
where cross-process distinctness is moot anyway, defer to `fastrand`'s
own wasm-safe default seeding instead. The high-entropy `getrandom`
seed and the `process::id()`-based fallback now compile only off that
target.
- `fastrand::Rng` is a single `u64`, so a `Cell` -- move it out behind a
cheap placeholder, advance it, and move it back -- is enough and avoids
`RefCell`'s runtime borrow tracking, mirroring `fastrand`'s own
thread-local generator.
- We no longer use `fastrand`'s global generator, so its `std` default
feature isn't needed: set `default-features = false` and bump to 2.4.1.
The `wasm32-unknown-unknown` fallback therefore seeds with a fixed value
instead of `Rng::new()` (which needs the std-gated global); that target
is single-process, so a fixed seed is fine.
Co-authored-by: Sebastian Thiel <sebastian.thiel@icloud.com>
This should avoid flaky capability tests of multiple process acting on the same repository. Co-authored-by: GPT 5.5 <codex@openai.com>
a48b921 to
96ead72
Compare
|
Thanks for getting the fix in, and let's hope that truly works. It really depends on Also many thanks to Eliah Kagan (@EliahKagan) for chiming in and acknowledging Codex's auto-fix. I gave it another pass and assume it's what you had in mind (the review didn't make material changes). |
|
I think one windows job already failed because of exactly what this PR is supposed to improve: It's this job, but I am having trouble getting back to the first attempt, maybe because it kind of timed out and was extra-strange. If it passes next time, I think we should still merge this as it nicely unifies our random number generation, independently of what it tries to achieve beyond that. |
386f31a
into
GitoxideLabs:main
I'm not aware of how that
I think you're referring to the inability to expand the "Test (nextest)" step in the "test-fixtures-windows (windows-latest)" job in attempt 1 there. It shows as though the job is still running, even though its status shows as failed. I believe this sometimes happens when a runner stops responding and the job is eventually stopped automatically for that reason. I've seen this much more often over the course of the past month. I think it's usually a flake in GitHub's infrastructure, rather than relating conceptually to whatever test is executing. However, I'm guessing that the test failure you showed was something that happened in that step, prior to the connection to the runner being lost. My guess is that the problems are independent: we may have a flaky test that failed while |
What
gix-utilsandgix-fsdraw fast pseudo-random numbers fromfastrand's global generators, whose per-thread seed is derived only from values that can coincide between separate processes — notably small thread IDs. Two concurrently running processes can therefore draw identical sequences, which:gix-lock), andgix-fscapability probes collide on their temporary file names — the spurious "symlinks unsupported" failure seen on CI (Is `overwriting_files_and_lone_directories_works` flaky? #1789).Fixes #1816.
How
Adds a small
gix_utils::rngmodule: a per-threadfastrand::Rngseeded once from a high-entropy OS source viagetrandom::u64(), so sequences differ across processes. The two production call sites are routed through it — the backoff jitter ingix-utils, and the three capability probes ingix-fs(whose now-unused directfastranddependency is dropped).wasm handling
getrandomis added as a target-specific dependency, excluded onwasm32-unknown-unknown, which has no entropy backend by default (and is single-process, so the cross-process concern doesn't apply there). On that target the seed falls back to a best-effort value that deliberately avoidsInstant::now(), since that panics onwasm32-unknown-unknown(#7319). Onwasm32-wasi*targetsgetrandomis included and uses the wasi backend.Note on the public API
This adds a new public
gix_utils::rngmodule (sogix-fscan use it across the crate boundary) — flagging in case it affects the version bump.Out of scope
The
gix-testtoolsport-shuffle (case 3 in the issue) is left as-is — the issue notes it isn't really a problem, only a minor test slowdown.Verification
gix-utils+gix-fsbuild and tests pass;cargo clippy -- -D warningsandcargo fmt --checkare clean.wasm32-unknown-unknown(getrandom excluded, fallback seed used) andwasm32-wasip1(getrandom included). Note: these were verified as builds, not at runtime (no local wasm host).