Skip to content

Fix transaction race for file refs, fixes: #2600#2659

Merged
Sebastian Thiel (Byron) merged 5 commits into
GitoxideLabs:mainfrom
willstott101:fix-ref-transaction-toctou
Jun 23, 2026
Merged

Fix transaction race for file refs, fixes: #2600#2659
Sebastian Thiel (Byron) merged 5 commits into
GitoxideLabs:mainfrom
willstott101:fix-ref-transaction-toctou

Conversation

@willstott101

@willstott101 Will Stott (willstott101) commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

When two ref transactions update the same ref concurrently (e.g. two pushes doing main: AB), both could read the old value A, serialize through the lock, and both pass the MustExistAndMatch check
— silently losing one update. The fix reads the existing value after acquiring the lock, so the comparison is against a value that can't change underneath us. See: #2600

While rebasing this onto the current tree, one gix-ref test started failing. Moving the read under the lock meant a path-prefix collision (creating a/b while a is a ref file) was now hit by the
lock's create_dir_all first, which reports AlreadyExists and surfaced as a misleading "permanently locked" error instead of the previous NotADirectory.

My first pass restored the old behaviour with a small ref_contents probe before locking. That worked, but it added a filesystem read on every update just to produce a nicer error on a rare one — so I
went looking for a better spot and found that gix-fs's directory-creation loop already distinguishes a non-directory in the path (it does the is_dir() check) and was simply throwing that information
away by forwarding the raw AlreadyExists. Reporting it as NotADirectory there lets gix-ref surface lock-acquisition I/O errors as Error::Io (keeping Error::LockAcquire for genuine contention), which
fixes the error for every gix-lock caller and lets me drop the probe entirely, keeping the transaction code simpler.

@willstott101 Will Stott (willstott101) changed the title Fix transaction race for file refs (fixes: #2600) Fix transaction race for file refs, fixes: #2600 Jun 20, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0771253af2

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "Codex (@codex) address that feedback".

Comment thread gix-ref/src/store/file/transaction/prepare.rs
@Byron

Sebastian Thiel (Byron) commented Jun 23, 2026

Copy link
Copy Markdown
Member

Tasks

  • add benchmark utility in remembrance of the refs writing performance issue, get baseline. Locking performance is unchanged
  • refackiew

`create::Iter` already distinguishes a path component that exists as a
non-directory from one that is a directory (via `dir.is_dir()`), but it
forwarded the raw `AlreadyExists` error in the former case. That made a
path collision indistinguishable from genuine lock contention once it
reached `gix-lock` (which maps `AlreadyExists` to `PermanentlyLocked`).

Surface the non-directory case as `NotADirectory` so callers can tell the
two apart from the failing operation itself, without a separate probe.

In gix-ref this lets lock-acquisition I/O errors surface as `Error::Io`
(reserving `Error::LockAcquire` for actual contention), which restores the
historical `Io(NotADirectory)` for path-prefix collisions in the
`Change::Update` branch and removes the pre-lock probe added previously.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Read the existing ref value after acquiring the lock rather than before,
so the PreviousValue check compares against a value that cannot be
changed concurrently.  Without this, two transactions can both read the
old value, serialize through the lock, and both pass the CAS check —
silently losing one update.

The TOCTOU fix moved the existing-ref read under the lock, which for the
Change::Update branch is now after lock acquisition. A path-prefix
collision (creating `a/b` while `a` is a ref file) therefore surfaced via
the lock's create_dir_all as `AlreadyExists`, reported as the misleading
`LockAcquire(PermanentlyLocked)` rather than the historical
`Io(NotADirectory)`.

Thanks to the `gix-fs` fix, lock-acquisition I/O errors surface as `Error::Io`
(reserving `Error::LockAcquire` for actual contention), which restores the
historical `Io(NotADirectory)` for path-prefix collisions in the
`Change::Update` branch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Move the prohibit_windows_device_names validation ahead of lock
acquisition in lock_ref_and_apply_change(). Previously the check
ran inside ref_contents(), which is called by read_existing_ref()
after the .lock file was already created. For a ref such as
refs/heads/CON the lock path CON.lock is itself a reserved Windows
device name, so prepare() would fail during lock acquisition (or
open the device) rather than returning the configured validation
error.

Extract the device-name check from ref_contents() into a new
check_windows_device_name() helper on the store and call it at the
top of lock_ref_and_apply_change(), before either the Delete or
Update branch tries to acquire a lock. ref_contents() still
delegates to the same helper for callers that read refs outside the
transaction path.
Add `gix free remote refs` to perform an upload-pack handshake, discover
the remote reference advertisement, and print the refs without negotiating
or receiving a pack.

The command can also write the advertised refs into a standalone ref store:

```sh
gix free remote refs --refs-directory out-refs <url>
```

Co-authored-by: Sebastian Thiel <sebastian.thiel@icloud.com>
@Byron

Copy link
Copy Markdown
Member

Thanks so much!

All I could do here is to clean up the commits a bit, and make just a minor refactor. Overall, I believe this is sound, and much better than it was before. After seeing all this complexity just to deal with the filesystem, and Windows, I really can't wait to just swap in RefTables and be done.

@Byron

Copy link
Copy Markdown
Member

There is one more note: the global lock is a performance optimization which works for gix, but won't work if Git would join the brawl. Probably that should change at some point, also given that soon RefTable will solve all problems 😅.

The global lock optimisation I will probably tackle in a follow-up.

@Byron Sebastian Thiel (Byron) merged commit 43dd683 into GitoxideLabs:main Jun 23, 2026
32 checks passed
@willstott101

Will Stott (willstott101) commented Jun 23, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for your tidy up and merge!

Are reftables an all or nothing replacement in a repo then? Not like individual file refs and packed refs that can live side by side? I can research this myself, ignore me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants