Skip to content

test_runner: add flaky option to retry on failure#63661

Open
Han5991 wants to merge 7 commits into
nodejs:mainfrom
Han5991:feat/test-runner-flaky
Open

test_runner: add flaky option to retry on failure#63661
Han5991 wants to merge 7 commits into
nodejs:mainfrom
Han5991:feat/test-runner-flaky

Conversation

@Han5991

@Han5991 Han5991 commented May 30, 2026

Copy link
Copy Markdown
Contributor

This feature is based on the flaky proposal in the nodejs/test-runner repo, which I used as the reference for the API surface and semantics. A test can be marked flaky so the runner re-runs it until it passes — useful for tests with unavoidable nondeterminism.

Supersedes #61746 (original work by @vespa7, credited as co-author); this is a fresh implementation on current main.

Closes #61746

What it adds

  • flaky: true → retry up to 20 times; flaky: <positive integer> → explicit budget; flaky: false → off. Invalid values throw at registration.
  • Works on tests and suites, plus the it.flaky / test.flaky / describe.flaky / suite.flaky shorthands. Nearest value wins (a test-case overrides its suite).
  • Only the final attempt is observable: intermediate failures emit no test:fail, no per-test diagnostics, and nothing on the node.test error channel.
  • New retryCount field on test:pass / test:fail / test:complete; reporters print a # FLAKY … directive; the run summary gains a flaky counter.
  • beforeEach/afterEach re-run per attempt (state reset between retries); before/after run once. External abort and expectFailure are not retried.

Points I'd appreciate eyes on

I've left inline review comments on the specific lines that would most benefit from review — the flaky-scoped assertion-plan read, the subtest crash fix and its known limitation, and the run() process-isolation parent re-counting.

Testing

New test/parallel/test-runner-flaky.js with deterministic, counter-based fixtures, plus a reporter snapshot test. Full test_runner suite passes; lint clean.

@nodejs-github-bot

Copy link
Copy Markdown
Collaborator

Review requested:

  • @nodejs/test_runner

@nodejs-github-bot nodejs-github-bot added lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels May 30, 2026

@Han5991 Han5991 left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few self-review notes on the trickier parts, anchored to the exact lines.

Comment thread lib/internal/test_runner/test.js Outdated
// (at first `.assert` access), preserving the historical counting behavior
// exactly - opting out of `flaky` changes nothing.
const capturedPlan = test.plan;
const currentPlan = () => (test.flakyRetries > 0 ? test.plan : capturedPlan);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the one change that touches the non-flaky path: the assertion plan is now read at use-time, but only for flaky tests (test.flakyRetries > 0); non-flaky tests keep the registration-time capture (capturedPlan). Please confirm non-flaky plan counting is unchanged — the regression guard is the assert-before-plan case in test/parallel/test-runner-flaky.js.

Comment thread lib/internal/test_runner/test.js Outdated
// final attempt creates no subtest - leaving outputSubtestCount > 0 with
// an empty `subtests`, which crashed report() (see the guard there).
this.subtests = [];
this.outputSubtestCount = 0;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Subtest crash fix (1/2): on retry we clear subtests and reset outputSubtestCount. Without the reset the count stayed > 0 while subtests was emptied, which crashed the reporter (see the guard in 2/2).

// `subtests` can be empty here when a flaky retry cleared it even though
// outputSubtestCount was non-zero; fall back to this test's own nesting
// rather than dereferencing subtests[0] (which would crash the run).
const subtestNesting = this.subtests.length ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Subtest crash fix (2/2) + known limitation: this guards the empty-subtests dereference that aborted the run. Known limitation: intermediate-attempt subtests are emitted to the stream before the retry decision, so they cannot be recalled — the output can still contain leaked subtest lines and a slightly inflated # tests. Full fidelity would need buffering each attempt's events and replaying only the surviving one. Is the crash-fix-only version OK to land, with fidelity as a follow-up?

cancelled: kCanceledTests.has(item.data.details?.error?.failureType),
// A retried timeout that exhausted (retryCount > 0) is a failure, not a
// cancellation only an un-retried timeout/abort stays cancelled.
cancelled: kCanceledTests.has(item.data.details?.error?.failureType) &&

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run() process-isolation parent re-counts the child's serialized events, so two things are handled here: an exhausted flaky timeout (retryCount > 0) is promoted out of cancelled into a failure, and any flaky-marked test (retryCount present, even 0) counts toward flaky. This cross-process re-derivation is easy to get subtly wrong — please sanity-check.

Comment thread lib/internal/test_runner/test.js Outdated
publishError(err);
}
if (isTestFailureError(err)) {
if (err.failureType === kTestTimeoutFailure && this.flakyRetries === 0) {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timeout handling: a non-flaky timeout stays cancelled (#cancel), but a flaky one goes through fail() so it can retry and, when the budget is exhausted, counts as a failure rather than a cancellation. The flakyRetries === 0 guard is what preserves the existing non-flaky behavior.

Comment thread lib/internal/test_runner/test.js Outdated
this.expectedAssertions = plan;
this.cancelled = false;
if (flaky === undefined || flaky === null) {
this.flakyRetries = this.parent?.flakyRetries ?? 0;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semantics per the proposal: nearest-wins inheritance (this parent fallback), flaky: true → 20 retries (kDefaultFlakyRetries, L106; set just below), an explicit positive integer is used as-is, and 0/negative/non-integer values throw via validateInteger. Confirming these defaults match the proposal's intent would help.

@Han5991 Han5991 force-pushed the feat/test-runner-flaky branch from 8ec070e to b9fa345 Compare May 30, 2026 15:19
@Han5991 Han5991 marked this pull request as ready for review May 30, 2026 15:19
@Han5991 Han5991 force-pushed the feat/test-runner-flaky branch from b9fa345 to 454ccc0 Compare May 30, 2026 15:20
@codecov

codecov Bot commented May 30, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 93.72937% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.35%. Comparing base (30a7e28) to head (901c387).
⚠️ Report is 41 commits behind head on main.

Files with missing lines Patch % Lines
lib/internal/test_runner/reporter/junit.js 9.09% 10 Missing ⚠️
lib/internal/test_runner/test.js 97.57% 4 Missing and 2 partials ⚠️
lib/internal/test_runner/reporter/utils.js 86.66% 1 Missing and 1 partial ⚠️
lib/internal/test_runner/reporter/dot.js 80.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #63661      +/-   ##
==========================================
- Coverage   91.95%   90.35%   -1.61%     
==========================================
  Files         379      732     +353     
  Lines      166454   236910   +70456     
  Branches    25427    44631   +19204     
==========================================
+ Hits       153058   214050   +60992     
- Misses      13104    14574    +1470     
- Partials      292     8286    +7994     
Files with missing lines Coverage Δ
lib/internal/test_runner/harness.js 88.15% <100.00%> (+0.02%) ⬆️
lib/internal/test_runner/reporter/tap.js 94.96% <100.00%> (+0.08%) ⬆️
lib/internal/test_runner/runner.js 94.51% <100.00%> (+0.39%) ⬆️
lib/internal/test_runner/tests_stream.js 92.64% <100.00%> (+0.14%) ⬆️
lib/internal/test_runner/utils.js 65.37% <100.00%> (+0.13%) ⬆️
lib/internal/test_runner/reporter/dot.js 97.72% <80.00%> (-2.28%) ⬇️
lib/internal/test_runner/reporter/utils.js 93.33% <86.66%> (-1.95%) ⬇️
lib/internal/test_runner/test.js 97.77% <97.57%> (-0.09%) ⬇️
lib/internal/test_runner/reporter/junit.js 89.08% <9.09%> (-5.40%) ⬇️

... and 494 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Han5991 Han5991 force-pushed the feat/test-runner-flaky branch from 3ff2a6b to a1989d8 Compare May 30, 2026 16:59
@JakobJingleheimer JakobJingleheimer added the test_runner Issues and PRs related to the test runner subsystem. label Jun 17, 2026
Comment thread lib/internal/test_runner/reporter/dot.js

@JakobJingleheimer JakobJingleheimer left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this!

A small thing (I think it wouldn't be much work to adjust for this): should it be tries instead of retries? Usually, a test is run N times inclusive of the initial. I think that would also simplify the implementation a bit too.

Comment thread lib/internal/test_runner/reporter/junit.js Outdated
Comment thread lib/internal/test_runner/reporter/junit.js Outdated
Comment thread lib/internal/test_runner/reporter/tap.js Outdated
Comment thread lib/internal/test_runner/reporter/tap.js Outdated
Comment thread lib/internal/test_runner/harness.js Outdated
Comment thread lib/internal/test_runner/test.js Outdated
Comment thread lib/internal/test_runner/test.js Outdated
Comment thread lib/internal/test_runner/test.js Outdated
Comment on lines +1854 to +1863
#countCompleted() {
let ancestor = this.parent;
while (ancestor !== null) {
if (ancestor.isFlaky && ancestor.endTime === null) {
ancestor.pendingCountedSubtests ??= [];
ArrayPrototypePush(ancestor.pendingCountedSubtests, this);
return;
}
ancestor = ancestor.parent;
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't do what it says, and I don't love that it creates side-effects (specifically via ArrayPrototypePush(ancestor.pendingCountedSubtests, this)).

I think a do...while is better suited here because we know the first time it's called that ancestor won't be empty.

Suggested change
#countCompleted() {
let ancestor = this.parent;
while (ancestor !== null) {
if (ancestor.isFlaky && ancestor.endTime === null) {
ancestor.pendingCountedSubtests ??= [];
ArrayPrototypePush(ancestor.pendingCountedSubtests, this);
return;
}
ancestor = ancestor.parent;
}
/**
* Climb up the tree, checking for pending tests.
*/
#collectPendingToRoot() {
let ancestor = this.parent;
do {
if (ancestor.isFlaky && ancestor.endTime === null) {
ancestor.pendingCountedSubtests ??= [];
ArrayPrototypePush(ancestor.pendingCountedSubtests, this);
return;
}
ancestor = ancestor.parent;
}
while (ancestor !== null);

Should the return be a break? Otherwise countCompletedTest never gets called, which it seems it should.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd lean toward keeping the current shape:

  • return (not break) is intentional: a subtest parked on a still-running flaky ancestor is re-counted when that ancestor reports (report() replays the parked list via pending[i].#countCompleted()). break would count it now and again on replay → double-counted # tests.
  • while (not do...while): the root test reaches #countCompleted() with parent === null, where while skips the body and falls through to countCompletedTest(this). do...while would run the body once with ancestor === null and throw on ancestor.isFlaky.
  • The ArrayPrototypePush is the mechanism (parking the count until the ancestor settles), not an incidental side effect.

If the comment reads as "doesn't do what it says," I'm happy to reword it to make the park-then-replay flow explicit — let me know which part was unclear.

Comment thread lib/internal/test_runner/test.js Outdated

@JakobJingleheimer JakobJingleheimer left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this!

A small thing (I think it wouldn't be much work to adjust for this): should it be tries instead of retries? Usually, a test is run N times inclusive of the initial. I think that would also simplify the implementation a bit too.

@Han5991

Han5991 commented Jun 17, 2026

Copy link
Copy Markdown
Contributor Author

should it be tries instead of retries? Usually, a test is run N times inclusive of the initial.

I went with retries-semantics (the number is additional attempts after the first run, so flaky: 3 → up to 4 runs) on purpose — it's what every major JS test runner does:

  • JestretryTimes(n): "Runs failed tests n-times until they pass"
  • Vitestretry: "Number of times to retry a test if it fails" (default 0)
  • Playwrightretries: 3: "Give failing tests 3 retry attempts"
  • Cypressretries: 2: "up to 2 additional times (for a total of 3 attempts)"
  • Mochathis.retries(4): retries a failed test up to 4 more times

Inclusive/total counts do exist, but they pair with names like attempts / invocationCount / @RepeatedTest, never with retry. Since this is a retry-on-failure feature (flakyRetries, retryCount in the reporter), retries-semantics matches the naming. Inclusive would also make flaky: 1 a valid no-op (1 run = 0 retries).

On the implementation side, inclusive only removes the + 1 in maxAttempts = flakyRetries + 1 — the loop and reporting are already attempt-based, so it isn't really a simplification. The docs already spell out the total ("1 initial run + 20 retries"), so the semantics is documented at the call site.

Happy to revisit if you feel strongly about inclusive counts.

Han5991 and others added 6 commits June 18, 2026 07:09
Add a `flaky` option that re-runs a failing test until it passes,
intended for tests with unavoidable nondeterminism. Setting
`flaky: true` retries up to 20 times; `flaky: <positive integer>`
sets an explicit retry budget. The option is accepted on tests and
suites and via the it.flaky/test.flaky/describe.flaky/suite.flaky
shorthands; a test-case value overrides an inherited suite value
(nearest wins), and `flaky: false` opts a test out.

Only the final attempt is observable: intermediate failures emit no
test:fail, no per-test diagnostics, and nothing on the node.test
error channel. Each result carries a new `retryCount` field on the
test:pass, test:fail, and test:complete events (the number of retries
performed, `undefined` for non-flaky tests), reporters print a
`# FLAKY` directive, and the run summary gains a `flaky` counter.

beforeEach/afterEach re-run on every attempt while before/after run
once, so per-attempt state is reset and retries do not leak state.
An externally aborted test and an expectFailure are not retried. A
flaky test whose timeout is exhausted is reported as a failure rather
than cancelled.

Co-authored-by: vespa7 <98526766+vespa7@users.noreply.github.com>
Signed-off-by: sangwook <rewq5991@gmail.com>
Add failing regression tests for nine defects found while reviewing the
flaky feature, each asserting the corrected behavior:

- a flaky + expectFailure test that unexpectedly passes must not retry
- a flaky parent must retry when only a subtest fails
- node.test tracing start/end must stay balanced across retries
- flaky must not propagate from a test to its own subtests
- the MockTracker must be reset between retries
- a test-level after hook must run after the final attempt
- a retried attempt must abort the previous attempt's signal
- an in-flight subtest from a discarded attempt must not corrupt the run
- a flaky + expectFailure error must still reach the node.test channel

Signed-off-by: sangwook <rewq5991@gmail.com>
Fix eight defects in the flaky retry loop, each covered by a regression
test added in the previous commit:

- do not retry a flaky expectFailure test that unexpectedly passes
- retry a flaky parent when a failure surfaces only through a subtest
- publish the tracing channel end per attempt so spans stay balanced
- inherit flaky only from a suite parent, not a test to its subtests
- reset the MockTracker between retries
- run a test-level after hook only after the final attempt
- abort the previous attempt's signal so its work is torn down
- publish a flaky expectFailure throw's error to the node.test channel

The remaining intermediate-attempt subtest output inflation across
retries is the documented known limitation and is unchanged.

Signed-off-by: sangwook <rewq5991@gmail.com>
Tidy the flaky retry support added earlier on this branch without
changing its behavior:

- Extract the per-attempt reset and the retry-eligibility check into
  #resetForRetry() and #retriesRemaining(), and recreate afterEach
  through a single factory, so run()'s retry loop reads as reset,
  attempt, then decide.
- Derive an isFlaky flag and use it consistently in the constructor,
  the completed-test counter, and the run() process-isolation parent
  reconstruction instead of overloading flakyRetries.
- Share the reporter "# FLAKY" directive through flakyDirective() and
  formatRetryCount() helpers used by the tap, spec, and junit reporters.

Signed-off-by: sangwook <rewq5991@gmail.com>
Fix issues found while reviewing the flaky retry implementation:

- park subtest completion counts while a flaky ancestor can still
  retry, so a discarded attempt's failing subtest no longer flips
  harness.success and fails the run after a successful retry
- restart subtest reportOrder bookkeeping on retry so subtests
  created by a retry attempt are finalized and reported
- keep the non-flaky cancellation semantics for a timed-out
  flaky + expectFailure test instead of counting it as a pass
- keep the test-level after hook for the final attempt when a
  failed subtest triggers the retry, and run deferred final-attempt
  duties when an anticipated retry is called off
- truncate per-attempt hooks registered by the test body so they do
  not accumulate across retries
- scope the run() parent's cancelled-to-failed upgrade to exhausted
  timeouts only
- filter the child process flaky summary diagnostic in the parent
- keep an explicit numeric budget passed alongside the flaky
  shorthand
- rebind tracing publishers from a stable base across attempts
- emit the junit flaky property as a raw retry count

Signed-off-by: sangwook <rewq5991@gmail.com>
Reflect reviewer feedback on the flaky retry change: add JSDoc block comments for IntelliSense, wrap long conditionals and calls across multiple lines, clarify comment wording, and simplify the report directive merge since spreading undefined is a no-op. Alphabetize the diagnostics filter list and de-mangle the TAP and JUnit reporter calls. No behavior change.

Signed-off-by: sangwook <rewq5991@gmail.com>
@Han5991 Han5991 force-pushed the feat/test-runner-flaky branch from 51f6955 to f96a879 Compare June 17, 2026 22:09
Signed-off-by: sangwook <rewq5991@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. test_runner Issues and PRs related to the test runner subsystem.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants