Skip to content

Fix bg fill teardown crash in update_size_and_time_stats#13114

Open
bneradt wants to merge 1 commit intoapache:masterfrom
bneradt:fix_update_size_and_time_stats
Open

Fix bg fill teardown crash in update_size_and_time_stats#13114
bneradt wants to merge 1 commit intoapache:masterfrom
bneradt:fix_update_size_and_time_stats

Conversation

@bneradt
Copy link
Copy Markdown
Contributor

@bneradt bneradt commented Apr 21, 2026

This fixes a crash seen on docs.


Production crash logs show ink_assert(0) firing in
HttpTransact::update_size_and_time_stats from HttpSM::update_stats via
HttpSM::kill_this, triggered by an Http2Stream event re-entering the
state machine while background_fill was still in the STARTED state.
The background_fill state is normally driven to a terminal value
(COMPLETED or ABORTED) by tunnel_handler_server, but when the SM is
torn down before that handler runs the state stays STARTED and the
unreachable default branch in update_size_and_time_stats asserts. The
same path also leaks proxy.process.http.background_fill_current_count
because tunnel_handler_server is the only site that decrements the
gauge after tunnel_handler_ua incremented it.

This normalizes STARTED to ABORTED inside HttpSM::kill_this, before
the enable_http_stats gate that guards update_stats, and decrements
the gauge there. Doing the normalization in kill_this rather than in
update_stats keeps the gauge accounting balanced regardless of whether
http stats are enabled, and ensures the downstream stats helper sees a
terminal state. As a defensive backstop this also folds the STARTED
case into the ABORTED branch of update_size_and_time_stats so a future
caller that reaches this point mid-fill records the bytes against
background_fill_bytes_aborted instead of crashing the server.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a production crash in the HTTP transaction teardown path by ensuring background-fill accounting is left in a terminal state (and no longer triggers an ink_assert(0)) when the state machine is destroyed mid-fill.

Changes:

  • Treat BackgroundFill_t::STARTED as an ABORTED-equivalent in HttpTransact::update_size_and_time_stats() to avoid asserting and to record aborted bytes instead.
  • Normalize HttpSM::background_fill from STARTED to ABORTED during HttpSM::update_stats() teardown, and decrement proxy.process.http.background_fill_current_count there to prevent gauge leakage on early teardown.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/proxy/http/HttpTransact.cc Makes stats update resilient to STARTED by folding it into the aborted accounting path instead of asserting.
src/proxy/http/HttpSM.cc Normalizes leftover STARTED background-fill state during teardown and decrements the active-fill gauge to keep metrics balanced.

Comment thread src/proxy/http/HttpSM.cc Outdated
Production crash logs show ink_assert(0) firing in
HttpTransact::update_size_and_time_stats from HttpSM::update_stats via
HttpSM::kill_this, triggered by an Http2Stream event re-entering the
state machine while background_fill was still in the STARTED state.
The background_fill state is normally driven to a terminal value
(COMPLETED or ABORTED) by tunnel_handler_server, but when the SM is
torn down before that handler runs the state stays STARTED and the
unreachable default branch in update_size_and_time_stats asserts. The
same path also leaks proxy.process.http.background_fill_current_count
because tunnel_handler_server is the only site that decrements the
gauge after tunnel_handler_ua incremented it.

This normalizes STARTED to ABORTED inside HttpSM::kill_this, before
the enable_http_stats gate that guards update_stats, and decrements
the gauge there. Doing the normalization in kill_this rather than in
update_stats keeps the gauge accounting balanced regardless of whether
http stats are enabled, and ensures the downstream stats helper sees a
terminal state. As a defensive backstop this also folds the STARTED
case into the ABORTED branch of update_size_and_time_stats so a future
caller that reaches this point mid-fill records the bytes against
background_fill_bytes_aborted instead of crashing the server.
@bneradt bneradt force-pushed the fix_update_size_and_time_stats branch from e700ff2 to 82a1557 Compare April 21, 2026 23:43
@masaori335 masaori335 self-requested a review April 23, 2026 22:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

2 participants