test: add fuzz testing infrastructure by raballew · Pull Request #720 · jumpstarter-dev/jumpstarter

raballew · 2026-06-01T07:44:18Z

Summary

Adds property-based and robustness fuzz testing across the entire jumpstarter codebase using Hypothesis (Python) and Go's native fuzzing. Includes a unified fuzz runner, CI workflow, and regression injection pipeline.

Closes #512

What's included

Fuzz runner (`scripts/fuzz.py`)

Unified CLI dispatching Python (HypoFuzz + Hypothesis loop) and Go (`go test -fuzz`) targets within a configurable time budget
Automatic regression injection: discoveries are replayed and persisted as `@example()` decorators (Python) or `f.Add()` seed corpus entries (Go) in committed test source
Per-file test execution to isolate failures without blocking the full suite
HypoFuzz startup retry logic and robust Hypothesis DB handling

CI workflow (`.github/workflows/fuzz.yaml`)

Runs on push to main (6h budget), PRs touching Python/Go/protocol code (5m), and manual dispatch with configurable duration
Go targets run in a matrix with cached fuzz corpus
Crash artifacts uploaded on failure

Python test coverage (73 test files)

Hypothesis property tests (`*_hypothesis_test.py`): label selector parsing, OCI credentials, TLS config, CRD schema validation, gRPC protobuf serialization, serde roundtrips, stream encoding, driver decorators, enum roundtrips, condition handling
Robustness tests (`*_robustness_test.py`): every driver package, CLI commands (create/delete/get/update/shell/run/login/auth/config/completion), Kubernetes models, config parsing, protocol layer
Deep gap tests: YAML config injection, compression bombs, CLI execution paths, driver method dispatch, CRD CEL expressions, clean error output
API surface audit: programmatic public export verification

Go fuzz tests (7 targets)

FuzzParseLabelSelector, FuzzReconcileLeaseTimeFields, FuzzValidateLeaseTags
FuzzNormalizeOIDCUsername, FuzzBearerTokenExtraction
FuzzMatchLabels, FuzzLoadGrpcConfiguration

Bug fix

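`selector_contains` did not treat a `key=value` label selector as satisfying expression-based requirements such as `key in (value)`, `key!=other`, `key notin (other)`, and `key exists` (fixed in `selectors.py`)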

Findings

A 48-hour local fuzz run found 3 bugs, filed as:

#717 -- DurationParamType.convert raises OverflowError on large numeric strings
#718 -- _label_satisfies_expression silently returns False for unknown operators
#719 -- V1Alpha1Lease.from_dict raises AttributeError when spec is not a dict

Test plan

`python -m pytest scripts/fuzz_test.py` -- fuzz runner unit tests
`make fuzz-python FUZZ_TIME=5m` -- quick Python fuzz smoke test
`make fuzz FUZZ_TIME=5m` -- full fuzz suite (Python + Go)
CI workflow runs on this PR (5m budget)
Verify `@example()` injection works: run fuzz, check `git diff` for injected decorators

🤖 Generated with Claude Code

Add hypothesis as dev dependency and create property-based tests for TLSConfigV1Alpha1 roundtrip and HookInstanceConfigV1Alpha1 construction with arbitrary valid inputs. Generated-By: Forge/20260529_105205_1305917_9a1b32f3

Property-based tests verify both-or-neither enforcement, whitespace normalization, dict roundtrip, and frozen model hashability for OciCredentials. Generated-By: Forge/20260529_105205_1305917_9a1b32f3

Property-based tests verify labels preservation, UUID validity, UUID uniqueness, and the name property lookup behavior for arbitrary label dictionaries. Generated-By: Forge/20260529_105205_1305917_9a1b32f3

Property-based tests verify parse roundtrip, selector_contains reflexivity, superset-subset containment, empty requirement matching, disjoint label rejection, and in-expression parsing. Generated-By: Forge/20260529_105205_1305917_9a1b32f3

Property-based tests verify non-matching data returns None, signature prefix detection for all formats, real compressed data detection, and compress/decompress roundtrip for gzip, xz, bz2, and zstd. Generated-By: Forge/20260529_105205_1305917_9a1b32f3
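The signature checks described above can be illustrated with a minimal, standard-library-only sketch. `detect_compression` and `SIGNATURES` are hypothetical names chosen for this example and are not the project's API; the magic bytes are the well-known format prefixes. The zstd prefix is listed but not exercised, since older Python versions ship no zstd module.

```python
import bz2
import gzip
import lzma
from typing import Optional

# Leading magic bytes of each container format.
SIGNATURES = {
    "gzip": b"\x1f\x8b",
    "bz2": b"BZh",
    "xz": b"\xfd7zXZ\x00",
    "zstd": b"\x28\xb5\x2f\xfd",
}


def detect_compression(data: bytes) -> Optional[str]:
    """Return the name of the format whose signature prefixes data, else None."""
    for name, magic in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


payload = b"jumpstarter" * 16
# Real compressed data is detected by its prefix.
assert detect_compression(gzip.compress(payload)) == "gzip"
assert detect_compression(bz2.compress(payload)) == "bz2"
assert detect_compression(lzma.compress(payload)) == "xz"
# Compress/decompress round-trips back to the original bytes.
assert gzip.decompress(gzip.compress(payload)) == payload
# Data without a known signature yields None.
assert detect_compression(payload) is None
```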

Property-based tests verify from_proto/to_proto roundtrip, integer return type, string representation, and value uniqueness for both ExporterStatus and LogSource enums. Generated-By: Forge/20260529_105205_1305917_9a1b32f3

Verifies that all modules with __all__ are importable, exports match expected symbols, all declared exports are resolvable, and all submodules in common and config packages are importable. Generated-By: Forge/20260529_105205_1305917_9a1b32f3

Fix import ordering and rename unused loop variables to follow ruff B007 and I001 rules. Generated-By: Forge/20260529_105205_1305917_9a1b32f3

Replace `if leaks: pass` no-op with an actual assertion that catches undeclared public symbols. Use module-aware filtering to distinguish locally-defined names from imported ones, with per-module allowlists for intentionally public functions not in __all__. Generated-By: Forge/20260529_105205_1305917_9a1b32f3

The test validates exports across jumpstarter.client, jumpstarter.driver, jumpstarter.exporter, and jumpstarter.common, so it belongs at the package root rather than inside common/. Generated-By: Forge/20260529_105205_1305917_9a1b32f3

TestCompressionRoundtrip only tested gzip/bz2/lzma/zstd stdlib functions without exercising project code. Replaced with TestCreateDecompressorRoundtrip that verifies the project's create_decompressor function produces working decompressor objects for all supported compression formats. Generated-By: Forge/20260529_105205_1305917_9a1b32f3

Scan all packages for import sites of each exported symbol and flag symbols with zero external imports. Symbols intentionally re-exported for external consumers are tracked in an allowlist. Generated-By: Forge/20260529_105205_1305917_9a1b32f3

- F005: add discovery test for modules defining __all__ not tracked
- F006: split models_hypothesis_test.py into tls_hypothesis_test.py and exporter_hypothesis_test.py to match {module}_test.py convention
- F010: add tests for !=, notin, exists, !exists selector operators
- F011: replace hypothesis with pytest.mark.parametrize for finite enums
- F012: replace tautological test_value_equals_proto with assertions against actual protobuf constants from common_pb2
- F013: add return type annotation to label_pairs_strategy
- F014: add type annotations to module-level strategy constants
- F015: use Literal type for on_failure parameter
- F017: add tests for extract_match_labels_filter function
- F018: assert hash consistency in test_frozen_model_is_hashable
Generated-By: Forge/20260529_105205_1305917_9a1b32f3

The function only handled ast.ImportFrom nodes, missing direct import statements (e.g., import X.Y.Z). Now also tracks attribute access on directly imported modules to avoid false positives in the zero-usage report. Generated-By: Forge/20260529_105205_1305917_9a1b32f3 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
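A rough sketch of the idea in this commit, assuming nothing about the real helper beyond what the message says: collect names from both `from X import Y` and plain `import X.Y.Z` statements, then resolve attribute chains on the directly imported modules. `collect_used_symbols` and `_dotted` are hypothetical names, and the sample source uses made-up modules.

```python
import ast
from typing import Optional


def _dotted(node: ast.AST) -> Optional[str]:
    """Flatten an attribute chain such as a.b.C into the string 'a.b.C'."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def collect_used_symbols(source: str) -> set:
    """Return fully qualified symbols referenced via either import style."""
    tree = ast.parse(source)
    used, modules = set(), set()
    # Pass 1: record from-imports as symbols and plain imports as modules.
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.module:
            used.update(f"{node.module}.{alias.name}" for alias in node.names)
        elif isinstance(node, ast.Import):
            modules.update(a.name for a in node.names if a.asname is None)
    # Pass 2: attribute access on a directly imported module counts as usage.
    for node in ast.walk(tree):
        if isinstance(node, ast.Attribute):
            name = _dotted(node)
            if name and any(name.startswith(m + ".") for m in modules):
                used.add(name)
    return used


sample = "import pkg.mod\nfrom other import Thing\nvalue = pkg.mod.Symbol\n"
assert collect_used_symbols(sample) == {"other.Thing", "pkg.mod.Symbol"}
```

Aliased imports (`import x as y`) are deliberately skipped in this sketch to keep it short.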

The startswith(module_name) check would incorrectly classify symbols from jumpstarter.commonx as local to jumpstarter.common. Use exact match or dot-suffixed prefix check instead. Generated-By: Forge/20260529_105205_1305917_9a1b32f3 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
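The difference between the two checks can be shown in a few lines. `is_local_to` is a hypothetical helper name, and `jumpstarter.common.sub` is a made-up submodule used only for illustration.

```python
def is_local_to(symbol_module: str, package: str) -> bool:
    """True if symbol_module is the package itself or one of its submodules."""
    return symbol_module == package or symbol_module.startswith(package + ".")


# The bare prefix check misclassifies a sibling package with a longer name.
assert "jumpstarter.commonx".startswith("jumpstarter.common")

# Exact match or dot-suffixed prefix avoids that false positive.
assert is_local_to("jumpstarter.common", "jumpstarter.common")
assert is_local_to("jumpstarter.common.sub", "jumpstarter.common")
assert not is_local_to("jumpstarter.commonx", "jumpstarter.common")
```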

Both selectors_hypothesis_test and metadata_hypothesis_test defined independent label key/value strategies with different constraints. Consolidate into a single testing_strategies module to ensure consistent generation across test files. Generated-By: Forge/20260529_105205_1305917_9a1b32f3 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add TestParseOciRegistry class with hypothesis-driven tests covering explicit registry extraction, oci:// scheme stripping, port preservation, and idempotency between plain and prefixed URLs. Generated-By: Forge/20260529_105205_1305917_9a1b32f3 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add hypothesis-driven tests for selector_contains with in, notin, exists, and !exists expressions, verifying reflexivity and operator mismatch behavior. Generated-By: Forge/20260529_105205_1305917_9a1b32f3 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Apply ruff isort fixes to selectors_hypothesis_test.py and metadata_hypothesis_test.py after shared strategy extraction. Generated-By: Forge/20260529_105205_1305917_9a1b32f3 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Extract _collect_imported_names_from_tree helper for testability and add tests verifying that dotted imports, aliased imports, and from-imports all resolve correctly through attribute chain walking. Generated-By: Forge/20260529_105205_1305917_9a1b32f3

…ents A key=value label selector now correctly satisfies expression-based requirements like key in (value), key!=other, key notin (other), and key exists. Extracted _label_satisfies_expression to keep complexity within ruff C901 limits. Generated-By: Forge/20260529_105205_1305917_9a1b32f3

…CKAGES Replace hardcoded common and config submodule importability tests with a single parametrized test covering all four tracked packages (common, client, driver, exporter). Generated-By: Forge/20260529_105205_1305917_9a1b32f3

…ted_names Add test verifying that assignment-based import aliasing (e.g. `m = jumpstarter.common; m.Metadata`) is a known untracked pattern, documenting this as an accepted limitation given the codebase convention of using `from X import Y` style imports. Generated-By: Forge/20260529_105205_1305917_9a1b32f3

…tials Cover all four public symbols in oci.py's __all__ with tests. TestReadAuthFileCredentials verifies auth file reading, malformed file handling, and registry mismatch. TestResolveOciCredentials verifies the three-level credential precedence (explicit, env vars, auth file) and partial credential rejection. Generated-By: Forge/20260529_105205_1305917_9a1b32f3

Add hypothesis tests verifying that label selectors (key=value) satisfy expression-based requirements: exists, !=, and notin operators. This complements the existing in-operator cross-type test to cover all branches in _label_satisfies_expression. Generated-By: Forge/20260529_105205_1305917_9a1b32f3

Include jumpstarter.config in the tracked packages list so its submodules are tested for importability and any __all__-defining modules would be discovered by TestModulesWithAllDiscovery. Generated-By: Forge/20260529_105205_1305917_9a1b32f3

…xists Adds _collect_assignment_aliases helper and three tests that verify:
1. The helper detects assignment-based aliasing patterns
2. The helper ignores untracked modules
3. No production files use assignment-based aliasing of tracked modules
This converts the _collect_imported_names_from_tree limitation from a theoretical concern into a provably non-impactful one, as the codebase exclusively uses "from X import Y" style imports. Generated-By: Forge/20260529_105205_1305917_9a1b32f3 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Require explicit h/m/s suffixes in the proper order and reject empty input, preventing bare numbers or malformed durations from being silently accepted. Generated-By: Forge/20260601_123354_597814_b463be94

These are fuzz runner dependencies used by scripts/fuzz.py, not package-level test dependencies. Move them to the workspace root dev group and standardize individual packages to use plain hypothesis>=6.127.2. Generated-By: Forge/20260601_123354_597814_b463be94

Run extended fuzzing on a nightly schedule (2h) instead of on every push to main (6h), reducing CI resource usage while still catching regressions. Lower timeout-minutes from 370 to 150 accordingly. Generated-By: Forge/20260601_123354_597814_b463be94

Verify that _insert_example preserves existing comments and whitespace during source file modification, and that unsafe AST nodes are rejected. Generated-By: Forge/20260601_123354_597814_b463be94
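One way such an injection step could work is sketched below. It is not the project's `_insert_example`: the function name `insert_example`, its signature, and the line-based strategy are assumptions made for illustration. Editing the text line by line leaves comments and whitespace untouched, and `ast.literal_eval` is used to accept only positional literal arguments (it raises `ValueError` for calls, names, and other non-literal nodes).

```python
import ast


def insert_example(source: str, func_name: str, args_repr: str) -> str:
    """Insert an @example(...) line directly above `def func_name(`."""
    # Reject anything that is not a tuple of plain Python literals.
    ast.literal_eval(f"({args_repr},)")
    lines = source.splitlines(keepends=True)
    for index, line in enumerate(lines):
        stripped = line.lstrip()
        if stripped.startswith(f"def {func_name}("):
            indent = line[: len(line) - len(stripped)]
            lines.insert(index, f"{indent}@example({args_repr})\n")
            return "".join(lines)
    raise LookupError(f"function {func_name!r} not found")


SOURCE = (
    "# regression seeds live here\n"
    "@given(st.text())\n"
    "def test_parse(s):\n"
    "    pass\n"
)

patched = insert_example(SOURCE, "test_parse", "'5m30'")
assert patched == (
    "# regression seeds live here\n"
    "@given(st.text())\n"
    "@example('5m30')\n"
    "def test_parse(s):\n"
    "    pass\n"
)
```

Hypothesis accepts `@example` either above or below `@given`, so placing the new line immediately above the `def` is one valid choice.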

Remove hypothesis from packages that have no hypothesis_test.py or robustness_test.py files, reducing unnecessary dependency sprawl. Generated-By: Forge/20260601_123354_597814_b463be94

… files These files had only trailing-comma and bracket-style changes with no hypothesis-related content, inflating the diff from ~97 meaningful files to 122 and deviating from the project TOML style. Generated-By: Forge/20260601_123354_597814_b463be94

The notin and != operators now return True when the key is absent from labels, matching Kubernetes behavior where a missing key vacuously satisfies negative constraints. The !exists and exists operators now check both matchLabels and matchExpressions to correctly handle cases where a key appears only in expressions. Generated-By: Forge/20260601_142034_798905_c7a34b73 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
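The semantics described here can be sketched as a small standalone function. `label_satisfies` is a hypothetical name, not the project's `_label_satisfies_expression`, and raising on unknown operators is a design choice of this sketch rather than something the commit states.

```python
def label_satisfies(labels: dict, key: str, op: str, values: tuple = ()) -> bool:
    """Evaluate one selector requirement against a label dictionary."""
    present = key in labels
    if op == "exists":
        return present
    if op == "!exists":
        return not present
    if op == "=":
        return present and labels[key] == values[0]
    if op == "in":
        return present and labels[key] in values
    # Negative operators are vacuously satisfied when the key is absent.
    if op == "!=":
        return not present or labels[key] != values[0]
    if op == "notin":
        return not present or labels[key] not in values
    raise ValueError(f"unknown operator: {op!r}")


labels = {"board": "rpi4"}
assert label_satisfies(labels, "env", "notin", ("prod",))   # key absent
assert label_satisfies(labels, "env", "!=", ("prod",))      # key absent
assert label_satisfies(labels, "env", "!exists")
assert not label_satisfies(labels, "board", "notin", ("rpi4",))
assert label_satisfies(labels, "board", "in", ("rpi4", "rpi5"))
```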

Add constructor robustness test entries for the 23 previously missing driver packages including androidemulator, can, composite, corellium, doip, dut_network, flashers, gpiod, http_power, mitmproxy, network, noyito_relay, opendal, power, qemu, ridesx, someip, uboot, uds, uds_can, uds_doip, ustreamer, and xcp. Generated-By: Forge/20260601_142034_798905_c7a34b73 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…den falsifying examples regex parse_duration now rejects inputs like "5m30" where a bare number follows a unit suffix, preventing the ambiguous interpretation as 330 seconds. The _extract_falsifying_examples regex now allows trailing whitespace after the closing paren, and skips examples with empty cleaned args. Generated-By: Forge/20260601_142034_798905_c7a34b73 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
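A duration parser with the stricter behaviour described above might look like the following. This is a sketch under stated assumptions, not the code in `scripts/fuzz.py`: the name `parse_duration_seconds`, the returned unit (seconds), and the exact grammar (optional `h`, `m`, `s` components in that order) are all assumed.

```python
import re

# Each component is optional, but every number must carry a unit suffix,
# and the components must appear in h, m, s order.
_DURATION = re.compile(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", re.ASCII)


def parse_duration_seconds(text: str) -> int:
    """Parse strings such as '5m', '1h30m' or '5m30s' into seconds."""
    match = _DURATION.fullmatch(text)
    if not text or match is None:
        raise ValueError(
            f"invalid duration {text!r}: use explicit units, e.g. '5m' or '1h30m'"
        )
    hours, minutes, seconds = (int(group) if group else 0 for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds


assert parse_duration_seconds("5m30s") == 330
assert parse_duration_seconds("1h30m") == 5400

# A bare number after a unit suffix is rejected instead of guessed at.
for bad in ("", "30", "5m30", "30s5m"):
    try:
        parse_duration_seconds(bad)
    except ValueError:
        pass
    else:
        raise AssertionError(f"{bad!r} should have been rejected")
```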

…ly returns Use early returns to prevent overlapping assertion conditions in FuzzValidateLeaseTags. Keys with jumpstarter.dev/ also contain /, so checking the more specific prefix first and returning prevents the generic slash check from masking which rule triggered rejection. Generated-By: Forge/20260601_142034_798905_c7a34b73 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Apply ruff import ordering fixes to move third-party imports after relative imports in 4 robustness test files. Sync uv.lock to reflect workspace dev dependency changes from prior commit. Generated-By: Forge/20260601_142034_798905_c7a34b73 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Three CRD test files import jsonschema at module level but the package was not declared in dev dependencies, causing ModuleNotFoundError during pytest collection and blocking the entire test suite. Generated-By: Forge/20260601_142034_798905_c7a34b73 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

This test imports from jumpstarter_cli.common and jumpstarter_cli_common.opt but was placed in the core jumpstarter package which does not depend on CLI packages. This caused import collection errors during pytest. Generated-By: Forge/20260601_142034_798905_c7a34b73 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Centralizing these settings in the ci and fuzz profiles eliminates the need for per-test @settings overrides and prevents flaky CI failures on slow runners. Generated-By: Forge/20260601_142034_798905_c7a34b73 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

With deadline=None and suppress_health_check set in the ci and fuzz Hypothesis profiles, per-test @settings(deadline=None) decorators are redundant. Removing them reduces boilerplate and ensures consistency. Generated-By: Forge/20260601_142034_798905_c7a34b73 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

All driver and CLI robustness tests duplicated an identical ARBITRARY strategy definition. Replaced with import from testing_strategies.py to prevent drift. Three files with domain-specific customizations (crd_robustness_test, serde_robustness_test, kubernetes) are left as-is since their constraints differ intentionally. Generated-By: Forge/20260601_142034_798905_c7a34b73 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…g semantics The function performs Kubernetes label satisfaction checks where negative operators (!=, notin, !exists) are satisfied when the key is absent, but the docstring incorrectly described it as a containment check. Generated-By: Forge/20260601_152609_887943_15050c4c Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When --go-only is used and no Go fuzz targets exist, the division by len(targets) would crash. Add an early return with a message instead. Generated-By: Forge/20260601_152609_887943_15050c4c Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
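The guard can be pictured as follows. The helper name `per_target_seconds` and the even split of the budget across targets are assumptions for illustration; the commit only states that a division by `len(targets)` needed an early return.

```python
from typing import Optional, Sequence


def per_target_seconds(total_seconds: int, targets: Sequence[str]) -> Optional[int]:
    """Split a time budget evenly, or return None when there is nothing to run."""
    if not targets:
        # Early return instead of dividing by zero.
        print("no Go fuzz targets found; nothing to run")
        return None
    return max(1, total_seconds // len(targets))


assert per_target_seconds(300, []) is None
assert per_target_seconds(300, ["FuzzMatchLabels", "FuzzValidateLeaseTags"]) == 150
```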

…ching Remove re.MULTILINE from the example regex so that closing-paren matches require a true newline or end-of-string rather than any end-of-line anchor, preventing premature matches on embedded closing parens in multi-line falsifying examples. Generated-By: Forge/20260601_152609_887943_15050c4c Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
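The effect of the flag can be demonstrated in isolation. The pattern and input below are invented for this illustration and are not the runner's actual regex; they only show how `$` behaves with and without `re.MULTILINE` when a closing paren sits at the end of an inner line.

```python
import re

# A multi-line "call" whose argument contains a ')' at the end of a line.
text = "f(\n    s='a)\nb',\n)"
pattern = r"^f\((.*?)\)$"

# With MULTILINE, '$' matches before every newline, so the lazy group
# stops at the embedded ')' on the second line.
loose = re.search(pattern, text, re.DOTALL | re.MULTILINE)
assert loose.group(1) == "\n    s='a"

# Without MULTILINE, '$' only matches at the end of the string, so the
# match runs to the real closing paren.
strict = re.search(pattern, text, re.DOTALL)
assert strict.group(1) == "\n    s='a)\nb',\n"
```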

The hardcoded parents[4] path traversal breaks when the file moves to a different directory depth. Walk upward to find the repository root by looking for both controller/ and python/ directories instead. Generated-By: Forge/20260601_152609_887943_15050c4c Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
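A marker-based root lookup along these lines could be written as below. `find_repo_root` is a hypothetical name, and the nested directory used in the demonstration is made up; only the `controller/` and `python/` markers come from the commit message.

```python
import tempfile
from pathlib import Path


def find_repo_root(start: Path) -> Path:
    """Walk upward from start until a directory holds both marker directories."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / "controller").is_dir() and (candidate / "python").is_dir():
            return candidate
    raise FileNotFoundError(f"no repository root found above {start}")


# Demonstration on a throwaway tree; depth no longer matters.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "controller").mkdir()
    deep = root / "python" / "packages" / "example" / "tests"
    deep.mkdir(parents=True)
    found = find_repo_root(deep)

assert found == root
```

In a test file the call would typically be `find_repo_root(Path(__file__))`, which keeps working if the file moves to a different directory depth.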

L1: Improve parse_duration error message to suggest explicit units.
L2: Extract startup_grace into HYPOFUZZ_STARTUP_GRACE_SECONDS constant.
L4: Use word-boundary regex for hypothesis example import detection.
L6: Tighten CI fuzz_time validation regex to require at least one component.
L9: Raise jsonschema lower bound from 4.0.0 to 4.17.0.
Generated-By: Forge/20260601_152609_887943_15050c4c Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add push trigger for main branch with 6h fuzz budget per FR-008. Compute timeout-minutes dynamically from fuzz_time instead of hardcoding 150 minutes, so workflow_dispatch with long durations will not be killed prematurely. Generated-By: Forge/20260601_152609_887943_15050c4c

Add explicit tests verifying that a selector without key K satisfies !exists, notin, and != requirements for that key, matching Kubernetes label selector semantics for negative operators on absent labels. Generated-By: Forge/20260601_152609_887943_15050c4c

Add CLI flag to override the default limit of 1 regression example per test function. The default preserves the existing conservative behavior where only the first failure is kept per test, but users can now raise the limit when investigating multiple distinct failure modes. Generated-By: Forge/20260601_152609_887943_15050c4c

…lity click.exceptions.BadParameter etc. are not resolvable by ty as submodule attributes; use click.BadParameter which is the public API. Also wrap a long line in selectors_hypothesis_test.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Previously run_go_all and the main dispatch loop would bail out on the first Go fuzzer crash, skipping all remaining targets. Now all targets run to completion and failures are reported at the end. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
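The change in control flow amounts to collecting failures instead of returning on the first one. The sketch below uses a hypothetical `run_all` helper and a fake runner callback; it does not reproduce `run_go_all`.

```python
from typing import Callable, Iterable, List


def run_all(targets: Iterable[str], run_one: Callable[[str], bool]) -> List[str]:
    """Run every target and return the names of those that failed."""
    failed = []
    for target in targets:
        if not run_one(target):
            failed.append(target)  # record the crash and keep going
    return failed


executed = []


def fake_runner(target: str) -> bool:
    """Stand-in for a fuzz invocation: one target is made to 'crash'."""
    executed.append(target)
    return target != "FuzzParseLabelSelector"


targets = ["FuzzParseLabelSelector", "FuzzMatchLabels", "FuzzValidateLeaseTags"]
failures = run_all(targets, fake_runner)

assert executed == targets                       # nothing was skipped
assert failures == ["FuzzParseLabelSelector"]    # reported at the end
```

A caller would then exit non-zero if `failures` is non-empty, after all targets have had their turn.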

Same fix as the previous commit, applied to cli-admin, cli-common, and cli-driver packages. Also cast arbitrary-typed args to Any to satisfy ty's invalid-argument-type checks in cli-common robustness tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Satisfies staticcheck QF1001. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Robustness tests intentionally pass object-typed values to functions with specific type signatures. Wrap these calls with cast(Any, ...) so the ty type checker does not report invalid-argument-type errors. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
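The pattern looks roughly like this. `set_timeout` is a hypothetical function standing in for any API with a narrow signature; `typing.cast` has no runtime effect, so the function under test still receives the wrong-typed value and must reject it cleanly.

```python
from typing import Any, cast


def set_timeout(seconds: int) -> int:
    """Hypothetical API under test: accepts only real integers."""
    if not isinstance(seconds, int) or isinstance(seconds, bool):
        raise TypeError("seconds must be an int")
    return seconds


bad_value: object = "not-a-number"

# cast(Any, ...) silences the static checker; the value itself is unchanged.
try:
    set_timeout(cast(Any, bad_value))
except TypeError as error:
    outcome = str(error)
else:
    outcome = "accepted"

assert outcome == "seconds must be an int"
```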

raballew and others added 30 commits May 29, 2026 11:01

test: add hypothesis property tests for OciCredentials validation

9045c2b

Property-based tests verify both-or-neither enforcement, whitespace normalization, dict roundtrip, and frozen model hashability for OciCredentials. Generated-By: Forge/20260529_105205_1305917_9a1b32f3

test: add hypothesis property tests for Metadata construction

19e26a5

Property-based tests verify labels preservation, UUID validity, UUID uniqueness, and the name property lookup behavior for arbitrary label dictionaries. Generated-By: Forge/20260529_105205_1305917_9a1b32f3

test: add hypothesis property tests for enum roundtrips

f5a44b8

Property-based tests verify from_proto/to_proto roundtrip, integer return type, string representation, and value uniqueness for both ExporterStatus and LogSource enums. Generated-By: Forge/20260529_105205_1305917_9a1b32f3

fix: resolve ruff lint issues in API surface audit test

e92fb5f

Fix import ordering and rename unused loop variables to follow ruff B007 and I001 rules. Generated-By: Forge/20260529_105205_1305917_9a1b32f3

refactor: move api_surface_test.py to package root

f5d0304

The test validates exports across jumpstarter.client, jumpstarter.driver, jumpstarter.exporter, and jumpstarter.common, so it belongs at the package root rather than inside common/. Generated-By: Forge/20260529_105205_1305917_9a1b32f3

test: add hypothesis fuzz tests for gRPC protobuf message serialization

e99e32a

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

test: add hypothesis fuzz tests for CRD schema validation

9f54bd7

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

test: add hypothesis fuzz tests for CLI argument parsing

086bb09

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

raballew and others added 25 commits June 1, 2026 12:55

fix: tighten CI fuzz duration validation regex

1467510

Require explicit h/m/s suffixes in the proper order and reject empty input, preventing bare numbers or malformed durations from being silently accepted. Generated-By: Forge/20260601_123354_597814_b463be94

test: add tests for example injection safety and comment preservation

e380c50

Verify that _insert_example preserves existing comments and whitespace during source file modification, and that unsafe AST nodes are rejected. Generated-By: Forge/20260601_123354_597814_b463be94

chore: remove unused hypothesis dev dependency from 26 packages

a70f034

Remove hypothesis from packages that have no hypothesis_test.py or robustness_test.py files, reducing unnecessary dependency sprawl. Generated-By: Forge/20260601_123354_597814_b463be94

raballew mentioned this pull request Jun 1, 2026

ParseLabelSelector round-trip instability with duplicate NotEquals values #728

Open

raballew and others added 4 commits June 1, 2026 21:20

fix: apply De Morgan's law in metadata fuzz test hex check

d9086a3

Satisfies staticcheck QF1001. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


raballew wants to merge 116 commits into jumpstarter-dev:main from raballew:512-hypothesis-fuzz-testing
