test_runner: add flaky option to retry on failure#63661
Conversation
|
Review requested:
|
Han5991
left a comment
There was a problem hiding this comment.
A few self-review notes on the trickier parts, anchored to the exact lines.
| // (at first `.assert` access), preserving the historical counting behavior | ||
| // exactly - opting out of `flaky` changes nothing. | ||
| const capturedPlan = test.plan; | ||
| const currentPlan = () => (test.flakyRetries > 0 ? test.plan : capturedPlan); |
There was a problem hiding this comment.
This is the one change that touches the non-flaky path: the assertion plan is now read at use-time, but only for flaky tests (test.flakyRetries > 0); non-flaky tests keep the registration-time capture (capturedPlan). Please confirm non-flaky plan counting is unchanged — the regression guard is the assert-before-plan case in test/parallel/test-runner-flaky.js.
| // final attempt creates no subtest - leaving outputSubtestCount > 0 with | ||
| // an empty `subtests`, which crashed report() (see the guard there). | ||
| this.subtests = []; | ||
| this.outputSubtestCount = 0; |
There was a problem hiding this comment.
Subtest crash fix (1/2): on retry we clear subtests and reset outputSubtestCount. Without the reset the count stayed > 0 while subtests was emptied, which crashed the reporter (see the guard in 2/2).
| // `subtests` can be empty here when a flaky retry cleared it even though | ||
| // outputSubtestCount was non-zero; fall back to this test's own nesting | ||
| // rather than dereferencing subtests[0] (which would crash the run). | ||
| const subtestNesting = this.subtests.length ? |
There was a problem hiding this comment.
Subtest crash fix (2/2) + known limitation: this guards the empty-subtests dereference that aborted the run. Known limitation: intermediate-attempt subtests are emitted to the stream before the retry decision, so they cannot be recalled — the output can still contain leaked subtest lines and a slightly inflated # tests. Full fidelity would need buffering each attempt's events and replaying only the surviving one. Is the crash-fix-only version OK to land, with fidelity as a follow-up?
| cancelled: kCanceledTests.has(item.data.details?.error?.failureType), | ||
| // A retried timeout that exhausted (retryCount > 0) is a failure, not a | ||
| // cancellation only an un-retried timeout/abort stays cancelled. | ||
| cancelled: kCanceledTests.has(item.data.details?.error?.failureType) && |
There was a problem hiding this comment.
run() process-isolation parent re-counts the child's serialized events, so two things are handled here: an exhausted flaky timeout (retryCount > 0) is promoted out of cancelled into a failure, and any flaky-marked test (retryCount present, even 0) counts toward flaky. This cross-process re-derivation is easy to get subtly wrong — please sanity-check.
| publishError(err); | ||
| } | ||
| if (isTestFailureError(err)) { | ||
| if (err.failureType === kTestTimeoutFailure && this.flakyRetries === 0) { |
There was a problem hiding this comment.
Timeout handling: a non-flaky timeout stays cancelled (#cancel), but a flaky one goes through fail() so it can retry and, when the budget is exhausted, counts as a failure rather than a cancellation. The flakyRetries === 0 guard is what preserves the existing non-flaky behavior.
| this.expectedAssertions = plan; | ||
| this.cancelled = false; | ||
| if (flaky === undefined || flaky === null) { | ||
| this.flakyRetries = this.parent?.flakyRetries ?? 0; |
There was a problem hiding this comment.
Semantics per the proposal: nearest-wins inheritance (this parent fallback), flaky: true → 20 retries (kDefaultFlakyRetries, L106; set just below), an explicit positive integer is used as-is, and 0/negative/non-integer values throw via validateInteger. Confirming these defaults match the proposal's intent would help.
8ec070e to
b9fa345
Compare
b9fa345 to
454ccc0
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #63661 +/- ##
==========================================
- Coverage 91.95% 90.35% -1.61%
==========================================
Files 379 732 +353
Lines 166454 236910 +70456
Branches 25427 44631 +19204
==========================================
+ Hits 153058 214050 +60992
- Misses 13104 14574 +1470
- Partials 292 8286 +7994
🚀 New features to boost your workflow:
|
3ff2a6b to
a1989d8
Compare
JakobJingleheimer
left a comment
There was a problem hiding this comment.
Thanks for this!
A small thing (I think it wouldn't be much work to adjust for this): should it be tries instead of retries? Usually, a test is run N times inclusive of the initial. I think that would also simplify the implementation a bit too.
| #countCompleted() { | ||
| let ancestor = this.parent; | ||
| while (ancestor !== null) { | ||
| if (ancestor.isFlaky && ancestor.endTime === null) { | ||
| ancestor.pendingCountedSubtests ??= []; | ||
| ArrayPrototypePush(ancestor.pendingCountedSubtests, this); | ||
| return; | ||
| } | ||
| ancestor = ancestor.parent; | ||
| } |
There was a problem hiding this comment.
This doesn't do what it says, and I don't love that it creates side-effects (specifically via ArrayPrototypePush(ancestor.pendingCountedSubtests, this)).
I think a do...while is better suited here because we know the first time it's called that ancestor won't be empty.
| #countCompleted() { | |
| let ancestor = this.parent; | |
| while (ancestor !== null) { | |
| if (ancestor.isFlaky && ancestor.endTime === null) { | |
| ancestor.pendingCountedSubtests ??= []; | |
| ArrayPrototypePush(ancestor.pendingCountedSubtests, this); | |
| return; | |
| } | |
| ancestor = ancestor.parent; | |
| } | |
| /** | |
| * Climb up the tree, checking for pending tests. | |
| */ | |
| #collectPendingToRoot() { | |
| let ancestor = this.parent; | |
| do { | |
| if (ancestor.isFlaky && ancestor.endTime === null) { | |
| ancestor.pendingCountedSubtests ??= []; | |
| ArrayPrototypePush(ancestor.pendingCountedSubtests, this); | |
| return; | |
| } | |
| ancestor = ancestor.parent; | |
| } | |
| while (ancestor !== null); |
Should the return be a break? Otherwise countCompletedTest never gets called, which it seems it should.
There was a problem hiding this comment.
I'd lean toward keeping the current shape:
return(notbreak) is intentional: a subtest parked on a still-running flaky ancestor is re-counted when that ancestor reports (report()replays the parked list viapending[i].#countCompleted()).breakwould count it now and again on replay → double-counted# tests.while(notdo...while): the root test reaches#countCompleted()withparent === null, wherewhileskips the body and falls through tocountCompletedTest(this).do...whilewould run the body once withancestor === nulland throw onancestor.isFlaky.- The
ArrayPrototypePushis the mechanism (parking the count until the ancestor settles), not an incidental side effect.
If the comment reads as "doesn't do what it says," I'm happy to reword it to make the park-then-replay flow explicit — let me know which part was unclear.
JakobJingleheimer
left a comment
There was a problem hiding this comment.
Thanks for this!
A small thing (I think it wouldn't be much work to adjust for this): should it be tries instead of retries? Usually, a test is run N times inclusive of the initial. I think that would also simplify the implementation a bit too.
I went with retries-semantics (the number is additional attempts after the first run, so
Inclusive/total counts do exist, but they pair with names like On the implementation side, inclusive only removes the Happy to revisit if you feel strongly about inclusive counts. |
Add a `flaky` option that re-runs a failing test until it passes, intended for tests with unavoidable nondeterminism. Setting `flaky: true` retries up to 20 times; `flaky: <positive integer>` sets an explicit retry budget. The option is accepted on tests and suites and via the it.flaky/test.flaky/describe.flaky/suite.flaky shorthands; a test-case value overrides an inherited suite value (nearest wins), and `flaky: false` opts a test out. Only the final attempt is observable: intermediate failures emit no test:fail, no per-test diagnostics, and nothing on the node.test error channel. Each result carries a new `retryCount` field on the test:pass, test:fail, and test:complete events (the number of retries performed, `undefined` for non-flaky tests), reporters print a `# FLAKY` directive, and the run summary gains a `flaky` counter. beforeEach/afterEach re-run on every attempt while before/after run once, so per-attempt state is reset and retries do not leak state. An externally aborted test and an expectFailure are not retried. A flaky test whose timeout is exhausted is reported as a failure rather than cancelled. Co-authored-by: vespa7 <98526766+vespa7@users.noreply.github.com> Signed-off-by: sangwook <rewq5991@gmail.com>
Add failing regression tests for nine defects found while reviewing the flaky feature, each asserting the corrected behavior: - a flaky + expectFailure test that unexpectedly passes must not retry - a flaky parent must retry when only a subtest fails - node.test tracing start/end must stay balanced across retries - flaky must not propagate from a test to its own subtests - the MockTracker must be reset between retries - a test-level after hook must run after the final attempt - a retried attempt must abort the previous attempt's signal - an in-flight subtest from a discarded attempt must not corrupt the run - a flaky + expectFailure error must still reach the node.test channel Signed-off-by: sangwook <rewq5991@gmail.com>
Fix eight defects in the flaky retry loop, each covered by a regression test added in the previous commit: - do not retry a flaky expectFailure test that unexpectedly passes - retry a flaky parent when a failure surfaces only through a subtest - publish the tracing channel end per attempt so spans stay balanced - inherit flaky only from a suite parent, not a test to its subtests - reset the MockTracker between retries - run a test-level after hook only after the final attempt - abort the previous attempt's signal so its work is torn down - publish a flaky expectFailure throw's error to the node.test channel The remaining intermediate-attempt subtest output inflation across retries is the documented known limitation and is unchanged. Signed-off-by: sangwook <rewq5991@gmail.com>
Tidy the flaky retry support added earlier on this branch without changing its behavior: - Extract the per-attempt reset and the retry-eligibility check into #resetForRetry() and #retriesRemaining(), and recreate afterEach through a single factory, so run()'s retry loop reads as reset, attempt, then decide. - Derive an isFlaky flag and use it consistently in the constructor, the completed-test counter, and the run() process-isolation parent reconstruction instead of overloading flakyRetries. - Share the reporter "# FLAKY" directive through flakyDirective() and formatRetryCount() helpers used by the tap, spec, and junit reporters. Signed-off-by: sangwook <rewq5991@gmail.com>
Fix issues found while reviewing the flaky retry implementation: - park subtest completion counts while a flaky ancestor can still retry, so a discarded attempt's failing subtest no longer flips harness.success and fails the run after a successful retry - restart subtest reportOrder bookkeeping on retry so subtests created by a retry attempt are finalized and reported - keep the non-flaky cancellation semantics for a timed-out flaky + expectFailure test instead of counting it as a pass - keep the test-level after hook for the final attempt when a failed subtest triggers the retry, and run deferred final-attempt duties when an anticipated retry is called off - truncate per-attempt hooks registered by the test body so they do not accumulate across retries - scope the run() parent's cancelled-to-failed upgrade to exhausted timeouts only - filter the child process flaky summary diagnostic in the parent - keep an explicit numeric budget passed alongside the flaky shorthand - rebind tracing publishers from a stable base across attempts - emit the junit flaky property as a raw retry count Signed-off-by: sangwook <rewq5991@gmail.com>
Reflect reviewer feedback on the flaky retry change: add JSDoc block comments for IntelliSense, wrap long conditionals and calls across multiple lines, clarify comment wording, and simplify the report directive merge since spreading undefined is a no-op. Alphabetize the diagnostics filter list and de-mangle the TAP and JUnit reporter calls. No behavior change. Signed-off-by: sangwook <rewq5991@gmail.com>
51f6955 to
f96a879
Compare
Signed-off-by: sangwook <rewq5991@gmail.com>
This feature is based on the
flakyproposal in the nodejs/test-runner repo, which I used as the reference for the API surface and semantics. A test can be markedflakyso the runner re-runs it until it passes — useful for tests with unavoidable nondeterminism.Supersedes #61746 (original work by @vespa7, credited as co-author); this is a fresh implementation on current
main.Closes #61746
What it adds
flaky: true→ retry up to 20 times;flaky: <positive integer>→ explicit budget;flaky: false→ off. Invalid values throw at registration.it.flaky/test.flaky/describe.flaky/suite.flakyshorthands. Nearest value wins (a test-case overrides its suite).test:fail, no per-test diagnostics, and nothing on thenode.testerror channel.retryCountfield ontest:pass/test:fail/test:complete; reporters print a# FLAKY …directive; the run summary gains aflakycounter.beforeEach/afterEachre-run per attempt (state reset between retries);before/afterrun once. External abort andexpectFailureare not retried.Points I'd appreciate eyes on
I've left inline review comments on the specific lines that would most benefit from review — the flaky-scoped assertion-plan read, the subtest crash fix and its known limitation, and the
run()process-isolation parent re-counting.Testing
New
test/parallel/test-runner-flaky.jswith deterministic, counter-based fixtures, plus a reporter snapshot test. Fulltest_runnersuite passes; lint clean.