
gh-148820: Fix _PyRawMutex use-after-free on spurious semaphore wakeup #148847

Closed

colesbury wants to merge 1 commit into python:main from colesbury:gh-148820-raw-mutex-park-intr

Conversation

@colesbury
Contributor

_PyRawMutex_UnlockSlow CAS-removes the waiter from the list and then calls _PySemaphore_Wakeup, with no handshake (unlike parking_lot's wait_entry.is_unparking protocol). If the waiter's _PySemaphore_Wait returns for any reason other than the matching Wakeup — e.g. Py_PARK_INTR from sigint_event on the main thread on Windows, or EINTR on POSIX — the waiter can exit the wait, re-acquire the mutex (already CAS-unlocked by the unlocker), break out of the loop, and call _PySemaphore_Destroy on the stack-allocated semaphore before the unlocker's _PySemaphore_Wakeup runs. That Wakeup then hits a closed handle, producing Fatal Python error: _PySemaphore_Wakeup: parking_lot: ReleaseSemaphore failed (Win32 ERROR_INVALID_HANDLE).

Observed on free-threaded 3.14 on Windows with coverage run + trio + signal.raise_signal(SIGINT) from a non-main thread. parking_lot's _PyParkingLot_Park is not affected because its bucket-locked is_unparking handshake already ensures the parker waits for the unparker's Wakeup before destroying the semaphore.

Fix

Loop in _PyRawMutex_LockSlow until _PySemaphore_Wait returns Py_PARK_OK. Py_PARK_OK is only returned when a matching post was observed on all backends (WAIT_OBJECT_0 / sem_wait success / sema->counter > 0), so looping until OK guarantees the unlocker's pending Wakeup has fired before we destroy the semaphore. Deferring SIGINT until the unlock completes is acceptable in practice: _PyRawMutex protects only short critical sections (principally parking-lot bucket locks), so Ctrl-C is delivered near-immediately.

Also include GetLastError() and the HANDLE value in the Windows fatal messages in _PySemaphore_Init, _PySemaphore_Wait, and _PySemaphore_Wakeup — these were key in identifying the bug as a use-after-free (error 6 = ERROR_INVALID_HANDLE) rather than a double-release.

The _PySemaphore_Wakeup API signature is intentionally left unchanged to preserve the ability to backport this fix.

Based on analysis and patch by @colesbury in https://gist.github.com/colesbury/3e4c6180e3eb4b3b9fd07b26f3196e12 / https://gist.github.com/colesbury/f9f0c5cf1a00f2c946e80795fcc245d7.

Verification

Stress harness (16 workers × 100 iterations × coverage run of the trio SIGINT-from-thread reproducer) showed ~0.19% crash rate on the unpatched baseline and 0% with the loop.

@colesbury colesbury marked this pull request as draft April 21, 2026 18:40
@colesbury colesbury force-pushed the gh-148820-raw-mutex-park-intr branch 2 times, most recently from d659c6b to da9e042 on April 21, 2026 18:41
… wakeup

_PyRawMutex_UnlockSlow CAS-removes the waiter from the list and then
calls _PySemaphore_Wakeup, with no handshake. If _PySemaphore_Wait
returns Py_PARK_INTR, the waiter can destroy its stack-allocated
semaphore before the unlocker's Wakeup runs, causing a fatal error from
ReleaseSemaphore / sem_post.

Loop in _PyRawMutex_LockSlow until _PySemaphore_Wait returns Py_PARK_OK,
which is only signalled when a matching Wakeup has been observed.

Also include GetLastError() and the handle in the Windows fatal messages
in _PySemaphore_Init, _PySemaphore_Wait, and _PySemaphore_Wakeup to make
similar races easier to diagnose in the future.
@colesbury colesbury force-pushed the gh-148820-raw-mutex-park-intr branch from da9e042 to 52bf353 on April 21, 2026 18:42
@colesbury colesbury closed this Apr 21, 2026