test_runner: add flaky option to retry on failure by Han5991 · Pull Request #63661 · nodejs/node

Han5991 · 2026-05-30T15:15:37Z

This feature is based on the flaky proposal in the nodejs/test-runner repo, which I used as the reference for the API surface and semantics. A test can be marked flaky so the runner re-runs it until it passes — useful for tests with unavoidable nondeterminism.

Supersedes #61746 (original work by @vespa7, credited as co-author); this is a fresh implementation on current main.

Closes #61746

What it adds

flaky: true → retry up to 20 times; flaky: <positive integer> → explicit budget; flaky: false → off. Invalid values throw at registration.
Works on tests and suites, plus the it.flaky / test.flaky / describe.flaky / suite.flaky shorthands. Nearest value wins (a test-case overrides its suite).
Only the final attempt is observable: intermediate failures emit no test:fail, no per-test diagnostics, and nothing on the node.test error channel.
New retryCount field on test:pass / test:fail / test:complete; reporters print a # FLAKY … directive; the run summary gains a flaky counter.
beforeEach/afterEach re-run per attempt (state reset between retries); before/after run once. External abort and expectFailure are not retried.

Points I'd appreciate eyes on

I've left inline review comments on the specific lines that would most benefit from review — the flaky-scoped assertion-plan read, the subtest crash fix and its known limitation, and the run() process-isolation parent re-counting.

Testing

New test/parallel/test-runner-flaky.js with deterministic, counter-based fixtures, plus a reporter snapshot test. Full test_runner suite passes; lint clean.

nodejs-github-bot · 2026-05-30T15:15:42Z

Review requested:

@nodejs/test_runner

Han5991

A few self-review notes on the trickier parts, anchored to the exact lines.

Han5991 · 2026-05-30T15:16:09Z

+      // (at first `.assert` access), preserving the historical counting behavior
+      // exactly - opting out of `flaky` changes nothing.
+      const capturedPlan = test.plan;
+      const currentPlan = () => (test.flakyRetries > 0 ? test.plan : capturedPlan);


This is the one change that touches the non-flaky path: the assertion plan is now read at use-time, but only for flaky tests (test.flakyRetries > 0); non-flaky tests keep the registration-time capture (capturedPlan). Please confirm non-flaky plan counting is unchanged — the regression guard is the assert-before-plan case in test/parallel/test-runner-flaky.js.

Han5991 · 2026-05-30T15:16:09Z

+        // final attempt creates no subtest - leaving outputSubtestCount > 0 with
+        // an empty `subtests`, which crashed report() (see the guard there).
+        this.subtests = [];
+        this.outputSubtestCount = 0;


Subtest crash fix (1/2): on retry we clear subtests and reset outputSubtestCount. Without the reset the count stayed > 0 while subtests was emptied, which crashed the reporter (see the guard in 2/2).

Han5991 · 2026-05-30T15:16:09Z

+      // `subtests` can be empty here when a flaky retry cleared it even though
+      // outputSubtestCount was non-zero; fall back to this test's own nesting
+      // rather than dereferencing subtests[0] (which would crash the run).
+      const subtestNesting = this.subtests.length ?


Subtest crash fix (2/2) + known limitation: this guards the empty-subtests dereference that aborted the run. Known limitation: intermediate-attempt subtests are emitted to the stream before the retry decision, so they cannot be recalled — the output can still contain leaked subtest lines and a slightly inflated # tests. Full fidelity would need buffering each attempt's events and replaying only the surviving one. Is the crash-fix-only version OK to land, with fidelity as a follow-up?

Han5991 · 2026-05-30T15:16:09Z

-        cancelled: kCanceledTests.has(item.data.details?.error?.failureType),
+        // A retried timeout that exhausted (retryCount > 0) is a failure, not a
+        // cancellation only an un-retried timeout/abort stays cancelled.
+        cancelled: kCanceledTests.has(item.data.details?.error?.failureType) &&


run() process-isolation parent re-counts the child's serialized events, so two things are handled here: an exhausted flaky timeout (retryCount > 0) is promoted out of cancelled into a failure, and any flaky-marked test (retryCount present, even 0) counts toward flaky. This cross-process re-derivation is easy to get subtly wrong — please sanity-check.

Han5991 · 2026-05-30T15:16:09Z

+          publishError(err);
+        }
+        if (isTestFailureError(err)) {
+          if (err.failureType === kTestTimeoutFailure && this.flakyRetries === 0) {


Timeout handling: a non-flaky timeout stays cancelled (#cancel), but a flaky one goes through fail() so it can retry and, when the budget is exhausted, counts as a failure rather than a cancellation. The flakyRetries === 0 guard is what preserves the existing non-flaky behavior.

Han5991 · 2026-05-30T15:16:09Z

    this.expectedAssertions = plan;
    this.cancelled = false;
+    if (flaky === undefined || flaky === null) {
+      this.flakyRetries = this.parent?.flakyRetries ?? 0;


Semantics per the proposal: nearest-wins inheritance (this parent fallback), flaky: true → 20 retries (kDefaultFlakyRetries, L106; set just below), an explicit positive integer is used as-is, and 0/negative/non-integer values throw via validateInteger. Confirming these defaults match the proposal's intent would help.

codecov · 2026-05-30T16:44:16Z

Codecov Report

❌ Patch coverage is 93.72937% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.35%. Comparing base (30a7e28) to head (901c387).
⚠️ Report is 41 commits behind head on main.

Files with missing lines	Patch %	Lines
lib/internal/test_runner/reporter/junit.js	9.09%	10 Missing ⚠️
lib/internal/test_runner/test.js	97.57%	4 Missing and 2 partials ⚠️
lib/internal/test_runner/reporter/utils.js	86.66%	1 Missing and 1 partial ⚠️
lib/internal/test_runner/reporter/dot.js	80.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #63661      +/-   ##
==========================================
- Coverage   91.95%   90.35%   -1.61%     
==========================================
  Files         379      732     +353     
  Lines      166454   236910   +70456     
  Branches    25427    44631   +19204     
==========================================
+ Hits       153058   214050   +60992     
- Misses      13104    14574    +1470     
- Partials      292     8286    +7994

Files with missing lines	Coverage Δ
lib/internal/test_runner/harness.js	`88.15% <100.00%> (+0.02%)`	⬆️
lib/internal/test_runner/reporter/tap.js	`94.96% <100.00%> (+0.08%)`	⬆️
lib/internal/test_runner/runner.js	`94.51% <100.00%> (+0.39%)`	⬆️
lib/internal/test_runner/tests_stream.js	`92.64% <100.00%> (+0.14%)`	⬆️
lib/internal/test_runner/utils.js	`65.37% <100.00%> (+0.13%)`	⬆️
lib/internal/test_runner/reporter/dot.js	`97.72% <80.00%> (-2.28%)`	⬇️
lib/internal/test_runner/reporter/utils.js	`93.33% <86.66%> (-1.95%)`	⬇️
lib/internal/test_runner/test.js	`97.77% <97.57%> (-0.09%)`	⬇️
lib/internal/test_runner/reporter/junit.js	`89.08% <9.09%> (-5.40%)`	⬇️

... and 494 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

JakobJingleheimer

Thanks for this!

A small thing (I think it wouldn't be much work to adjust for this): should it be tries instead of retries? Usually, a test is run N times inclusive of the initial. I think that would also simplify the implementation a bit too.

JakobJingleheimer · 2026-06-17T14:20:12Z

+  #countCompleted() {
+    let ancestor = this.parent;
+    while (ancestor !== null) {
+      if (ancestor.isFlaky && ancestor.endTime === null) {
+        ancestor.pendingCountedSubtests ??= [];
+        ArrayPrototypePush(ancestor.pendingCountedSubtests, this);
+        return;
+      }
+      ancestor = ancestor.parent;
+    }


This doesn't do what it says, and I don't love that it creates side-effects (specifically via ArrayPrototypePush(ancestor.pendingCountedSubtests, this)).

I think a do...while is better suited here because we know the first time it's called that ancestor won't be empty.

Suggested change

#countCompleted() {

let ancestor = this.parent;

while (ancestor !== null) {

if (ancestor.isFlaky && ancestor.endTime === null) {

ancestor.pendingCountedSubtests ??= [];

ArrayPrototypePush(ancestor.pendingCountedSubtests, this);

return;

}

ancestor = ancestor.parent;

}

/**

* Climb up the tree, checking for pending tests.

*/

#collectPendingToRoot() {

let ancestor = this.parent;

do {

if (ancestor.isFlaky && ancestor.endTime === null) {

ancestor.pendingCountedSubtests ??= [];

ArrayPrototypePush(ancestor.pendingCountedSubtests, this);

return;

}

ancestor = ancestor.parent;

}

while (ancestor !== null);

Should the return be a break? Otherwise countCompletedTest never gets called, which it seems it should.

I'd lean toward keeping the current shape:

return (not break) is intentional: a subtest parked on a still-running flaky ancestor is re-counted when that ancestor reports (report() replays the parked list via pending[i].#countCompleted()). break would count it now and again on replay → double-counted # tests.

while (not do...while): the root test reaches #countCompleted() with parent === null, where while skips the body and falls through to countCompletedTest(this). do...while would run the body once with ancestor === null and throw on ancestor.isFlaky.

The ArrayPrototypePush is the mechanism (parking the count until the ancestor settles), not an incidental side effect.

If the comment reads as "doesn't do what it says," I'm happy to reword it to make the park-then-replay flow explicit — let me know which part was unclear.

JakobJingleheimer

Thanks for this!

A small thing (I think it wouldn't be much work to adjust for this): should it be tries instead of retries? Usually, a test is run N times inclusive of the initial. I think that would also simplify the implementation a bit too.

Han5991 · 2026-06-17T16:32:41Z

should it be tries instead of retries? Usually, a test is run N times inclusive of the initial.

I went with retries-semantics (the number is additional attempts after the first run, so flaky: 3 → up to 4 runs) on purpose — it's what every major JS test runner does:

Jest — retryTimes(n): "Runs failed tests n-times until they pass"
Vitest — retry: "Number of times to retry a test if it fails" (default 0)
Playwright — retries: 3: "Give failing tests 3 retry attempts"
Cypress — retries: 2: "up to 2 additional times (for a total of 3 attempts)"
Mocha — this.retries(4): retries a failed test up to 4 more times

Inclusive/total counts do exist, but they pair with names like attempts / invocationCount / @RepeatedTest, never with retry. Since this is a retry-on-failure feature (flakyRetries, retryCount in the reporter), retries-semantics matches the naming. Inclusive would also make flaky: 1 a valid no-op (1 run = 0 retries).

On the implementation side, inclusive only removes the + 1 in maxAttempts = flakyRetries + 1 — the loop and reporting are already attempt-based, so it isn't really a simplification. The docs already spell out the total ("1 initial run + 20 retries"), so the semantics is documented at the call site.

Happy to revisit if you feel strongly about inclusive counts.

Add a `flaky` option that re-runs a failing test until it passes, intended for tests with unavoidable nondeterminism. Setting `flaky: true` retries up to 20 times; `flaky: <positive integer>` sets an explicit retry budget. The option is accepted on tests and suites and via the it.flaky/test.flaky/describe.flaky/suite.flaky shorthands; a test-case value overrides an inherited suite value (nearest wins), and `flaky: false` opts a test out. Only the final attempt is observable: intermediate failures emit no test:fail, no per-test diagnostics, and nothing on the node.test error channel. Each result carries a new `retryCount` field on the test:pass, test:fail, and test:complete events (the number of retries performed, `undefined` for non-flaky tests), reporters print a `# FLAKY` directive, and the run summary gains a `flaky` counter. beforeEach/afterEach re-run on every attempt while before/after run once, so per-attempt state is reset and retries do not leak state. An externally aborted test and an expectFailure are not retried. A flaky test whose timeout is exhausted is reported as a failure rather than cancelled. Co-authored-by: vespa7 <98526766+vespa7@users.noreply.github.com> Signed-off-by: sangwook <rewq5991@gmail.com>

Add failing regression tests for nine defects found while reviewing the flaky feature, each asserting the corrected behavior: - a flaky + expectFailure test that unexpectedly passes must not retry - a flaky parent must retry when only a subtest fails - node.test tracing start/end must stay balanced across retries - flaky must not propagate from a test to its own subtests - the MockTracker must be reset between retries - a test-level after hook must run after the final attempt - a retried attempt must abort the previous attempt's signal - an in-flight subtest from a discarded attempt must not corrupt the run - a flaky + expectFailure error must still reach the node.test channel Signed-off-by: sangwook <rewq5991@gmail.com>

Fix eight defects in the flaky retry loop, each covered by a regression test added in the previous commit: - do not retry a flaky expectFailure test that unexpectedly passes - retry a flaky parent when a failure surfaces only through a subtest - publish the tracing channel end per attempt so spans stay balanced - inherit flaky only from a suite parent, not a test to its subtests - reset the MockTracker between retries - run a test-level after hook only after the final attempt - abort the previous attempt's signal so its work is torn down - publish a flaky expectFailure throw's error to the node.test channel The remaining intermediate-attempt subtest output inflation across retries is the documented known limitation and is unchanged. Signed-off-by: sangwook <rewq5991@gmail.com>

Tidy the flaky retry support added earlier on this branch without changing its behavior: - Extract the per-attempt reset and the retry-eligibility check into #resetForRetry() and #retriesRemaining(), and recreate afterEach through a single factory, so run()'s retry loop reads as reset, attempt, then decide. - Derive an isFlaky flag and use it consistently in the constructor, the completed-test counter, and the run() process-isolation parent reconstruction instead of overloading flakyRetries. - Share the reporter "# FLAKY" directive through flakyDirective() and formatRetryCount() helpers used by the tap, spec, and junit reporters. Signed-off-by: sangwook <rewq5991@gmail.com>

Fix issues found while reviewing the flaky retry implementation: - park subtest completion counts while a flaky ancestor can still retry, so a discarded attempt's failing subtest no longer flips harness.success and fails the run after a successful retry - restart subtest reportOrder bookkeeping on retry so subtests created by a retry attempt are finalized and reported - keep the non-flaky cancellation semantics for a timed-out flaky + expectFailure test instead of counting it as a pass - keep the test-level after hook for the final attempt when a failed subtest triggers the retry, and run deferred final-attempt duties when an anticipated retry is called off - truncate per-attempt hooks registered by the test body so they do not accumulate across retries - scope the run() parent's cancelled-to-failed upgrade to exhausted timeouts only - filter the child process flaky summary diagnostic in the parent - keep an explicit numeric budget passed alongside the flaky shorthand - rebind tracing publishers from a stable base across attempts - emit the junit flaky property as a raw retry count Signed-off-by: sangwook <rewq5991@gmail.com>

Reflect reviewer feedback on the flaky retry change: add JSDoc block comments for IntelliSense, wrap long conditionals and calls across multiple lines, clarify comment wording, and simplify the report directive merge since spreading undefined is a no-op. Alphabetize the diagnostics filter list and de-mangle the TAP and JUnit reporter calls. No behavior change. Signed-off-by: sangwook <rewq5991@gmail.com>

Signed-off-by: sangwook <rewq5991@gmail.com>

nodejs-github-bot added lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels May 30, 2026

Han5991 commented May 30, 2026

View reviewed changes

Han5991 force-pushed the feat/test-runner-flaky branch from 8ec070e to b9fa345 Compare May 30, 2026 15:19

Han5991 marked this pull request as ready for review May 30, 2026 15:19

Han5991 force-pushed the feat/test-runner-flaky branch from b9fa345 to 454ccc0 Compare May 30, 2026 15:20

Han5991 force-pushed the feat/test-runner-flaky branch from 3ff2a6b to a1989d8 Compare May 30, 2026 16:59

Han5991 mentioned this pull request May 31, 2026

feat: first commit flaky #61746

Open

JakobJingleheimer added the test_runner Issues and PRs related to the test runner subsystem. label Jun 17, 2026

JakobJingleheimer reviewed Jun 17, 2026

View reviewed changes

Comment thread lib/internal/test_runner/reporter/dot.js

JakobJingleheimer reviewed Jun 17, 2026

View reviewed changes

Han5991 and others added 6 commits June 18, 2026 07:09

Han5991 force-pushed the feat/test-runner-flaky branch from 51f6955 to f96a879 Compare June 17, 2026 22:09

test_runner: re-run CI (unrelated flaky failures)

bf996a8

Signed-off-by: sangwook <rewq5991@gmail.com>

Uh oh!

Conversation

Han5991 commented May 30, 2026 • edited by JakobJingleheimer Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What it adds

Points I'd appreciate eyes on

Testing

Uh oh!

nodejs-github-bot commented May 30, 2026

Uh oh!

Han5991 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented May 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

JakobJingleheimer left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

JakobJingleheimer left a comment

Choose a reason for hiding this comment

Uh oh!

Han5991 commented Jun 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Han5991 commented May 30, 2026 •

edited by JakobJingleheimer

Loading

codecov Bot commented May 30, 2026 •

edited

Loading