
test: add fuzz testing infrastructure #720

Draft
raballew wants to merge 116 commits into jumpstarter-dev:main from raballew:512-hypothesis-fuzz-testing

raballew commented Jun 1, 2026

Summary

Adds property-based and robustness fuzz testing across the entire jumpstarter codebase using Hypothesis (Python) and Go's native fuzzing. Includes a unified fuzz runner, CI workflow, and regression injection pipeline.

Closes #512

What's included

Fuzz runner (scripts/fuzz.py)

  • Unified CLI dispatching Python (HypoFuzz + Hypothesis loop) and Go (go test -fuzz) targets within a configurable time budget
  • Automatic regression injection: discoveries are replayed and persisted as @example() decorators (Python) or f.Add() seed corpus entries (Go) in committed test source
  • Per-file test execution to isolate failures without blocking the full suite
  • HypoFuzz startup retry logic and robust Hypothesis DB handling

CI workflow (.github/workflows/fuzz.yaml)

  • Runs on push to main (6h budget), PRs touching Python/Go/protocol code (5m), and manual dispatch with configurable duration
  • Go targets run in a matrix with cached fuzz corpus
  • Crash artifacts uploaded on failure

Python test coverage (73 test files)

  • Hypothesis property tests (*_hypothesis_test.py): label selector parsing, OCI credentials, TLS config, CRD schema validation, gRPC protobuf serialization, serde roundtrips, stream encoding, driver decorators, enum roundtrips, condition handling
  • Robustness tests (*_robustness_test.py): every driver package, CLI commands (create/delete/get/update/shell/run/login/auth/config/completion), Kubernetes models, config parsing, protocol layer
  • Deep gap tests: YAML config injection, compression bombs, CLI execution paths, driver method dispatch, CRD CEL expressions, clean error output
  • API surface audit: programmatic public export verification

Go fuzz tests (7 targets)

  • FuzzParseLabelSelector, FuzzReconcileLeaseTimeFields, FuzzValidateLeaseTags
  • FuzzNormalizeOIDCUsername, FuzzBearerTokenExtraction
  • FuzzMatchLabels, FuzzLoadGrpcConfiguration

Bug fix

  • selector_contains did not treat a key=value label selector as satisfying expression-based requirements (in, !=, notin, exists); fixed in selectors.py

Findings

A 48-hour local fuzz run found 3 bugs, filed as:

Test plan

  • python -m pytest scripts/fuzz_test.py -- fuzz runner unit tests
  • make fuzz-python FUZZ_TIME=5m -- quick Python fuzz smoke test
  • make fuzz FUZZ_TIME=5m -- full fuzz suite (Python + Go)
  • CI workflow runs on this PR (5m budget)
  • Verify @example() injection works: run fuzz, check git diff for injected decorators

🤖 Generated with Claude Code

raballew and others added 30 commits May 29, 2026 11:01
Add hypothesis as dev dependency and create property-based tests
for TLSConfigV1Alpha1 roundtrip and HookInstanceConfigV1Alpha1
construction with arbitrary valid inputs.

Generated-By: Forge/20260529_105205_1305917_9a1b32f3
Property-based tests verify both-or-neither enforcement, whitespace
normalization, dict roundtrip, and frozen model hashability for
OciCredentials.

Generated-By: Forge/20260529_105205_1305917_9a1b32f3
Property-based tests verify labels preservation, UUID validity,
UUID uniqueness, and the name property lookup behavior for arbitrary
label dictionaries.

Generated-By: Forge/20260529_105205_1305917_9a1b32f3
Property-based tests verify parse roundtrip, selector_contains
reflexivity, superset-subset containment, empty requirement matching,
disjoint label rejection, and in-expression parsing.

Generated-By: Forge/20260529_105205_1305917_9a1b32f3
Property-based tests verify non-matching data returns None,
signature prefix detection for all formats, real compressed data
detection, and compress/decompress roundtrip for gzip, xz, bz2,
and zstd.

Generated-By: Forge/20260529_105205_1305917_9a1b32f3
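
The signature-prefix detection these tests exercise can be sketched as below. This is a minimal illustration, not the project's code: `SIGNATURES` and `detect_compression` are hypothetical names, and the magic bytes are the well-known format prefixes rather than values taken from this PR.

```python
import bz2
import gzip
import lzma
from typing import Optional

# Well-known magic-byte prefixes (assumed here; not copied from the project).
SIGNATURES = {
    "gzip": b"\x1f\x8b",
    "xz": b"\xfd\x37\x7a\x58\x5a\x00",
    "bz2": b"BZh",
    "zstd": b"\x28\xb5\x2f\xfd",
}


def detect_compression(data: bytes) -> Optional[str]:
    """Return the format whose signature prefixes `data`, or None."""
    for name, magic in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


if __name__ == "__main__":
    # Real compressed payloads from the standard library carry the prefixes.
    print(detect_compression(gzip.compress(b"payload")))
    print(detect_compression(bz2.compress(b"payload")))
    print(detect_compression(lzma.compress(b"payload")))
    print(detect_compression(b"plain text"))
```

A property test would then assert that any byte string not starting with one of these prefixes yields `None`.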
Property-based tests verify from_proto/to_proto roundtrip, integer
return type, string representation, and value uniqueness for both
ExporterStatus and LogSource enums.

Generated-By: Forge/20260529_105205_1305917_9a1b32f3
Verifies that all modules with __all__ are importable, exports match
expected symbols, all declared exports are resolvable, and all
submodules in common and config packages are importable.

Generated-By: Forge/20260529_105205_1305917_9a1b32f3
Fix import ordering and rename unused loop variables to follow
ruff B007 and I001 rules.

Generated-By: Forge/20260529_105205_1305917_9a1b32f3
Replace `if leaks: pass` no-op with an actual assertion that catches
undeclared public symbols. Use module-aware filtering to distinguish
locally-defined names from imported ones, with per-module allowlists
for intentionally public functions not in __all__.

Generated-By: Forge/20260529_105205_1305917_9a1b32f3
The test validates exports across jumpstarter.client, jumpstarter.driver,
jumpstarter.exporter, and jumpstarter.common, so it belongs at the
package root rather than inside common/.

Generated-By: Forge/20260529_105205_1305917_9a1b32f3
TestCompressionRoundtrip only tested gzip/bz2/lzma/zstd stdlib functions
without exercising project code. Replaced with TestCreateDecompressorRoundtrip
that verifies the project's create_decompressor function produces working
decompressor objects for all supported compression formats.

Generated-By: Forge/20260529_105205_1305917_9a1b32f3
Scan all packages for import sites of each exported symbol and flag
symbols with zero external imports. Symbols intentionally re-exported
for external consumers are tracked in an allowlist.

Generated-By: Forge/20260529_105205_1305917_9a1b32f3
- F005: add discovery test for modules defining __all__ not tracked
- F006: split models_hypothesis_test.py into tls_hypothesis_test.py
  and exporter_hypothesis_test.py to match {module}_test.py convention
- F010: add tests for !=, notin, exists, !exists selector operators
- F011: replace hypothesis with pytest.mark.parametrize for finite enums
- F012: replace tautological test_value_equals_proto with assertions
  against actual protobuf constants from common_pb2
- F013: add return type annotation to label_pairs_strategy
- F014: add type annotations to module-level strategy constants
- F015: use Literal type for on_failure parameter
- F017: add tests for extract_match_labels_filter function
- F018: assert hash consistency in test_frozen_model_is_hashable

Generated-By: Forge/20260529_105205_1305917_9a1b32f3
The function only handled ast.ImportFrom nodes, missing direct
import statements (e.g., import X.Y.Z). Now also tracks attribute
access on directly imported modules to avoid false positives in
the zero-usage report.

Generated-By: Forge/20260529_105205_1305917_9a1b32f3

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
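
The difference between the two import node types in the commit above can be sketched as follows. `collect_imported_names` is a hypothetical helper, not the function from this PR, and it deliberately omits the attribute-access tracking the commit also describes.

```python
import ast


def collect_imported_names(source: str) -> set:
    """Collect dotted names from both `import X.Y` and `from X import Y`."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module:
            # from X.Y import Z  ->  "X.Y.Z"
            for alias in node.names:
                names.add(f"{node.module}.{alias.name}")
        elif isinstance(node, ast.Import):
            # import X.Y.Z  ->  "X.Y.Z" (missed if only ImportFrom is handled)
            for alias in node.names:
                names.add(alias.name)
    return names


if __name__ == "__main__":
    sample = "import a.b.c\nfrom x.y import Z\n"
    print(sorted(collect_imported_names(sample)))
```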
The startswith(module_name) check would incorrectly classify
symbols from jumpstarter.commonx as local to jumpstarter.common.
Use exact match or dot-suffixed prefix check instead.

Generated-By: Forge/20260529_105205_1305917_9a1b32f3

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
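
A minimal sketch of the corrected check, assuming module names are plain dotted strings; `is_local_to` is a hypothetical name used only for illustration.

```python
def is_local_to(module_name: str, symbol_module: str) -> bool:
    """True if `symbol_module` is `module_name` itself or one of its submodules."""
    # A bare startswith(module_name) would also accept "jumpstarter.commonx".
    return symbol_module == module_name or symbol_module.startswith(module_name + ".")


if __name__ == "__main__":
    print(is_local_to("jumpstarter.common", "jumpstarter.common"))
    print(is_local_to("jumpstarter.common", "jumpstarter.common.utils"))
    print(is_local_to("jumpstarter.common", "jumpstarter.commonx"))
```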
Both selectors_hypothesis_test and metadata_hypothesis_test defined
independent label key/value strategies with different constraints.
Consolidate into a single testing_strategies module to ensure
consistent generation across test files.

Generated-By: Forge/20260529_105205_1305917_9a1b32f3

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add TestParseOciRegistry class with hypothesis-driven tests covering
explicit registry extraction, oci:// scheme stripping, port
preservation, and idempotency between plain and prefixed URLs.

Generated-By: Forge/20260529_105205_1305917_9a1b32f3

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add hypothesis-driven tests for selector_contains with in, notin,
exists, and !exists expressions, verifying reflexivity and
operator mismatch behavior.

Generated-By: Forge/20260529_105205_1305917_9a1b32f3

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Apply ruff isort fixes to selectors_hypothesis_test.py and
metadata_hypothesis_test.py after shared strategy extraction.

Generated-By: Forge/20260529_105205_1305917_9a1b32f3

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract _collect_imported_names_from_tree helper for testability and add
tests verifying that dotted imports, aliased imports, and from-imports
all resolve correctly through attribute chain walking.

Generated-By: Forge/20260529_105205_1305917_9a1b32f3
…ents

A key=value label selector now correctly satisfies expression-based
requirements like key in (value), key!=other, key notin (other), and
key exists. Extracted _label_satisfies_expression to keep complexity
within ruff C901 limits.

Generated-By: Forge/20260529_105205_1305917_9a1b32f3
…CKAGES

Replace hardcoded common and config submodule importability tests with a
single parametrized test covering all four tracked packages (common,
client, driver, exporter).

Generated-By: Forge/20260529_105205_1305917_9a1b32f3
…ted_names

Add test verifying that assignment-based import aliasing (e.g.
`m = jumpstarter.common; m.Metadata`) is a known untracked pattern,
documenting this as an accepted limitation given the codebase convention
of using `from X import Y` style imports.

Generated-By: Forge/20260529_105205_1305917_9a1b32f3
…tials

Cover all four public symbols in oci.py's __all__ with tests.
TestReadAuthFileCredentials verifies auth file reading, malformed file
handling, and registry mismatch. TestResolveOciCredentials verifies the
three-level credential precedence (explicit, env vars, auth file) and
partial credential rejection.

Generated-By: Forge/20260529_105205_1305917_9a1b32f3
Add hypothesis tests verifying that label selectors (key=value) satisfy
expression-based requirements: exists, !=, and notin operators. This
complements the existing in-operator cross-type test to cover all
branches in _label_satisfies_expression.

Generated-By: Forge/20260529_105205_1305917_9a1b32f3
Include jumpstarter.config in the tracked packages list so its
submodules are tested for importability and any __all__-defining modules
would be discovered by TestModulesWithAllDiscovery.

Generated-By: Forge/20260529_105205_1305917_9a1b32f3
…xists

Adds _collect_assignment_aliases helper and three tests that verify:
1. The helper detects assignment-based aliasing patterns
2. The helper ignores untracked modules
3. No production files use assignment-based aliasing of tracked modules

This converts the _collect_imported_names_from_tree limitation from a
theoretical concern into a provably non-impactful one, as the codebase
exclusively uses "from X import Y" style imports.

Generated-By: Forge/20260529_105205_1305917_9a1b32f3

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
raballew and others added 25 commits June 1, 2026 12:55
Require explicit h/m/s suffixes in the proper order and reject
empty input, preventing bare numbers or malformed durations from
being silently accepted.

Generated-By: Forge/20260601_123354_597814_b463be94
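
The stricter duration parsing described in the commit above might look like the sketch below. The name `parse_duration` comes from this PR, but the regex, the return unit (seconds), and the error message are assumptions made for illustration.

```python
import re

# Optional hours, minutes, seconds, each with its suffix, in that order only.
_DURATION_RE = re.compile(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?")


def parse_duration(text: str) -> int:
    """Parse strings like '1h30m' or '45s' into seconds; reject anything else."""
    match = _DURATION_RE.fullmatch(text)
    # fullmatch accepts the empty string, so also require at least one component.
    if match is None or all(group is None for group in match.groups()):
        raise ValueError(f"invalid duration {text!r}; use explicit units, e.g. 1h30m")
    hours, minutes, seconds = (int(group) if group else 0 for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds


if __name__ == "__main__":
    print(parse_duration("1h30m"))
    for bad in ("", "30", "5m30", "30m1h"):
        try:
            parse_duration(bad)
        except ValueError as exc:
            print(exc)
```

Using `fullmatch` means a trailing bare number such as `5m30` fails outright instead of being read as 330 seconds.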
These are fuzz runner dependencies used by scripts/fuzz.py, not
package-level test dependencies. Move them to the workspace root
dev group and standardize individual packages to use plain
hypothesis>=6.127.2.

Generated-By: Forge/20260601_123354_597814_b463be94
Run extended fuzzing on a nightly schedule (2h) instead of on every
push to main (6h), reducing CI resource usage while still catching
regressions. Lower timeout-minutes from 370 to 150 accordingly.

Generated-By: Forge/20260601_123354_597814_b463be94
Verify that _insert_example preserves existing comments and whitespace
during source file modification, and that unsafe AST nodes are rejected.

Generated-By: Forge/20260601_123354_597814_b463be94
Remove hypothesis from packages that have no hypothesis_test.py or
robustness_test.py files, reducing unnecessary dependency sprawl.

Generated-By: Forge/20260601_123354_597814_b463be94
… files

These files had only trailing-comma and bracket-style changes with no
hypothesis-related content, inflating the diff from ~97 meaningful
files to 122 and deviating from the project TOML style.

Generated-By: Forge/20260601_123354_597814_b463be94
The notin and != operators now return True when the key is absent from
labels, matching Kubernetes behavior where a missing key vacuously
satisfies negative constraints. The !exists and exists operators now
check both matchLabels and matchExpressions to correctly handle cases
where a key appears only in expressions.

Generated-By: Forge/20260601_142034_798905_c7a34b73
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
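
The negative-operator semantics described above can be illustrated against a plain label dictionary. This is a simplified sketch: `label_satisfies` is a hypothetical function, and the project's `selector_contains` compares selectors rather than a single dict.

```python
def label_satisfies(labels: dict, key: str, operator: str, values: tuple = ()) -> bool:
    """Evaluate one requirement against `labels`, Kubernetes-style."""
    present = key in labels
    if operator == "exists":
        return present
    if operator == "!exists":
        return not present
    if operator == "=":
        return present and labels[key] == values[0]
    if operator == "in":
        return present and labels[key] in values
    # Negative operators are vacuously satisfied when the key is absent.
    if operator == "!=":
        return not present or labels[key] != values[0]
    if operator == "notin":
        return not present or labels[key] not in values
    raise ValueError(f"unknown operator {operator!r}")


if __name__ == "__main__":
    print(label_satisfies({}, "env", "notin", ("prod",)))
    print(label_satisfies({"env": "prod"}, "env", "notin", ("prod",)))
    print(label_satisfies({}, "env", "in", ("prod",)))
```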
Add constructor robustness test entries for the 23 previously missing
driver packages including androidemulator, can, composite, corellium,
doip, dut_network, flashers, gpiod, http_power, mitmproxy, network,
noyito_relay, opendal, power, qemu, ridesx, someip, uboot, uds,
uds_can, uds_doip, ustreamer, and xcp.

Generated-By: Forge/20260601_142034_798905_c7a34b73
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…den falsifying examples regex

parse_duration now rejects inputs like "5m30" where a bare number
follows a unit suffix, preventing the ambiguous interpretation as
330 seconds. The _extract_falsifying_examples regex now allows
trailing whitespace after the closing paren, and skips examples
with empty cleaned args.

Generated-By: Forge/20260601_142034_798905_c7a34b73
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ly returns

Use early returns to prevent overlapping assertion conditions in
FuzzValidateLeaseTags. Keys with jumpstarter.dev/ also contain /,
so checking the more specific prefix first and returning prevents
the generic slash check from masking which rule triggered rejection.

Generated-By: Forge/20260601_142034_798905_c7a34b73
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Apply ruff import ordering fixes to move third-party imports after
relative imports in 4 robustness test files. Sync uv.lock to reflect
workspace dev dependency changes from prior commit.

Generated-By: Forge/20260601_142034_798905_c7a34b73
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three CRD test files import jsonschema at module level but the package
was not declared in dev dependencies, causing ModuleNotFoundError
during pytest collection and blocking the entire test suite.

Generated-By: Forge/20260601_142034_798905_c7a34b73

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This test imports from jumpstarter_cli.common and jumpstarter_cli_common.opt
but was placed in the core jumpstarter package which does not depend on
CLI packages. This caused import collection errors during pytest.

Generated-By: Forge/20260601_142034_798905_c7a34b73

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Centralizing these settings in the ci and fuzz profiles eliminates
the need for per-test @settings overrides and prevents flaky CI
failures on slow runners.

Generated-By: Forge/20260601_142034_798905_c7a34b73

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
With deadline=None and suppress_health_check set in the ci and fuzz
Hypothesis profiles, per-test @settings(deadline=None) decorators are
redundant. Removing them reduces boilerplate and ensures consistency.

Generated-By: Forge/20260601_142034_798905_c7a34b73

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All driver and CLI robustness tests duplicated an identical ARBITRARY
strategy definition. Replaced with import from testing_strategies.py
to prevent drift. Three files with domain-specific customizations
(crd_robustness_test, serde_robustness_test, kubernetes) are left
as-is since their constraints differ intentionally.

Generated-By: Forge/20260601_142034_798905_c7a34b73

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…g semantics

The function performs Kubernetes label satisfaction checks where negative
operators (!=, notin, !exists) are satisfied when the key is absent, but
the docstring incorrectly described it as a containment check.

Generated-By: Forge/20260601_152609_887943_15050c4c
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When --go-only is used and no Go fuzz targets exist, the division by
len(targets) would crash. Add an early return with a message instead.

Generated-By: Forge/20260601_152609_887943_15050c4c
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
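
A sketch of the guard, assuming the runner splits its time budget evenly across targets. The function name and the message text are hypothetical.

```python
def per_target_budget(total_seconds: int, targets: list) -> dict:
    """Split a time budget evenly; return an empty plan when there are no targets."""
    if not targets:
        # Early return avoids ZeroDivisionError in the division below.
        print("no Go fuzz targets found; nothing to run")
        return {}
    share = total_seconds // len(targets)
    return {target: share for target in targets}


if __name__ == "__main__":
    print(per_target_budget(300, ["FuzzParseLabelSelector", "FuzzMatchLabels"]))
    print(per_target_budget(300, []))
```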
…ching

Remove re.MULTILINE from the example regex so that closing-paren matches
require a true newline or end-of-string rather than any end-of-line
anchor, preventing premature matches on embedded closing parens in
multi-line falsifying examples.

Generated-By: Forge/20260601_152609_887943_15050c4c
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The hardcoded parents[4] path traversal breaks when the file moves to a
different directory depth. Walk upward to find the repository root by
looking for both controller/ and python/ directories instead.

Generated-By: Forge/20260601_152609_887943_15050c4c
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
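
The upward walk can be sketched with `pathlib`. `find_repo_root` is a hypothetical name; only the two marker directories, controller/ and python/, come from the commit message.

```python
from pathlib import Path


def find_repo_root(start: Path) -> Path:
    """Walk upward from `start` until a directory holds controller/ and python/."""
    start = start.resolve()
    for candidate in [start, *start.parents]:
        if (candidate / "controller").is_dir() and (candidate / "python").is_dir():
            return candidate
    raise FileNotFoundError(f"no repository root found above {start}")


if __name__ == "__main__":
    try:
        print(find_repo_root(Path(__file__).parent))
    except FileNotFoundError as exc:
        print(exc)
```

Unlike a fixed `parents[4]`, this keeps working when the calling file moves to a different directory depth.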
L1: Improve parse_duration error message to suggest explicit units.
L2: Extract startup_grace into HYPOFUZZ_STARTUP_GRACE_SECONDS constant.
L4: Use word-boundary regex for hypothesis example import detection.
L6: Tighten CI fuzz_time validation regex to require at least one component.
L9: Raise jsonschema lower bound from 4.0.0 to 4.17.0.

Generated-By: Forge/20260601_152609_887943_15050c4c
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add push trigger for main branch with 6h fuzz budget per FR-008.
Compute timeout-minutes dynamically from fuzz_time instead of
hardcoding 150 minutes, so workflow_dispatch with long durations
will not be killed prematurely.

Generated-By: Forge/20260601_152609_887943_15050c4c
Add explicit tests verifying that a selector without key K satisfies
!exists, notin, and != requirements for that key, matching Kubernetes
label selector semantics for negative operators on absent labels.

Generated-By: Forge/20260601_152609_887943_15050c4c
Add CLI flag to override the default limit of 1 regression example per
test function. The default preserves the existing conservative behavior
where only the first failure is kept per test, but users can now raise
the limit when investigating multiple distinct failure modes.

Generated-By: Forge/20260601_152609_887943_15050c4c
…lity

click.exceptions.BadParameter etc. are not resolvable by ty as
submodule attributes; use click.BadParameter which is the public API.
Also wrap a long line in selectors_hypothesis_test.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
raballew and others added 4 commits June 1, 2026 21:20
Previously run_go_all and the main dispatch loop would bail out on the
first Go fuzzer crash, skipping all remaining targets. Now all targets
run to completion and failures are reported at the end.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
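
The run-to-completion behaviour amounts to collecting failures instead of returning on the first one. In this sketch `run_all` and the `run_one` callback are hypothetical; the real runner invokes `go test -fuzz` for each target.

```python
from typing import Callable, List


def run_all(targets: List[str], run_one: Callable[[str], bool]) -> List[str]:
    """Run every target; return the names of those that failed."""
    failures = []
    for target in targets:
        if not run_one(target):
            failures.append(target)  # record the failure and keep going
    return failures


if __name__ == "__main__":
    failed = run_all(["FuzzA", "FuzzB", "FuzzC"], lambda name: name != "FuzzB")
    if failed:
        print("failed targets: " + ", ".join(failed))
```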
Same fix as the previous commit, applied to cli-admin, cli-common, and
cli-driver packages. Also cast arbitrary-typed args to Any to satisfy
ty's invalid-argument-type checks in cli-common robustness tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Satisfies staticcheck QF1001.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Robustness tests intentionally pass object-typed values to functions
with specific type signatures. Wrap these calls with cast(Any, ...) so
the ty type checker does not report invalid-argument-type errors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
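
The `cast(Any, ...)` pattern can be shown with a toy function. `shout` and `probe` are hypothetical stand-ins for a typed project function and a robustness test body.

```python
from typing import Any, cast


def shout(text: str) -> str:
    """A function with a specific type signature."""
    return text.upper()


def probe(value: object) -> str:
    """Call shout() with an arbitrary value, as a robustness test would."""
    try:
        # cast() has no runtime effect; it only tells the type checker
        # that passing a non-str value here is intentional.
        shout(cast(Any, value))
    except (AttributeError, TypeError):
        return "rejected"
    return "ok"


if __name__ == "__main__":
    print(probe("hello"))
    print(probe(42))
```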

Development

Successfully merging this pull request may close these issues.

Update Jumpstarter status when Ingress API is unavailable
