test: add fuzz testing infrastructure #720
Draft
raballew wants to merge 116 commits into
Add hypothesis as dev dependency and create property-based tests for TLSConfigV1Alpha1 roundtrip and HookInstanceConfigV1Alpha1 construction with arbitrary valid inputs. Generated-By: Forge/20260529_105205_1305917_9a1b32f3
Property-based tests verify both-or-neither enforcement, whitespace normalization, dict roundtrip, and frozen model hashability for OciCredentials. Generated-By: Forge/20260529_105205_1305917_9a1b32f3
Property-based tests verify labels preservation, UUID validity, UUID uniqueness, and the name property lookup behavior for arbitrary label dictionaries. Generated-By: Forge/20260529_105205_1305917_9a1b32f3
Property-based tests verify parse roundtrip, selector_contains reflexivity, superset-subset containment, empty requirement matching, disjoint label rejection, and in-expression parsing. Generated-By: Forge/20260529_105205_1305917_9a1b32f3
Property-based tests verify non-matching data returns None, signature prefix detection for all formats, real compressed data detection, and compress/decompress roundtrip for gzip, xz, bz2, and zstd. Generated-By: Forge/20260529_105205_1305917_9a1b32f3
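Signature prefix detection of the kind these tests exercise can be sketched as below. This is a minimal illustration, not the project's code: `SIGNATURES` and `detect_compression` are hypothetical names, and only the magic-number prefixes themselves are standard facts about the formats.

```python
import bz2
import gzip
import lzma
from typing import Dict, Optional

# Standard magic-number prefixes for each format.
# The table and function names are assumptions made for this sketch.
SIGNATURES: Dict[str, bytes] = {
    "gzip": b"\x1f\x8b",
    "xz": b"\xfd\x37\x7a\x58\x5a\x00",
    "bz2": b"BZh",
    "zstd": b"\x28\xb5\x2f\xfd",
}


def detect_compression(data: bytes) -> Optional[str]:
    """Return the format whose signature prefixes `data`, or None."""
    for name, magic in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


if __name__ == "__main__":
    # Real compressed payloads from the stdlib carry the expected prefixes.
    print(detect_compression(gzip.compress(b"payload")))
    print(detect_compression(bz2.compress(b"payload")))
    print(detect_compression(lzma.compress(b"payload")))
    print(detect_compression(b"plain text"))
```

A property-based test would then assert that any byte string starting with a known prefix is detected and that data matching no prefix yields `None`.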
Property-based tests verify from_proto/to_proto roundtrip, integer return type, string representation, and value uniqueness for both ExporterStatus and LogSource enums. Generated-By: Forge/20260529_105205_1305917_9a1b32f3
Verifies that all modules with __all__ are importable, exports match expected symbols, all declared exports are resolvable, and all submodules in common and config packages are importable. Generated-By: Forge/20260529_105205_1305917_9a1b32f3
Fix import ordering and rename unused loop variables to follow ruff B007 and I001 rules. Generated-By: Forge/20260529_105205_1305917_9a1b32f3
Replace `if leaks: pass` no-op with an actual assertion that catches undeclared public symbols. Use module-aware filtering to distinguish locally-defined names from imported ones, with per-module allowlists for intentionally public functions not in __all__. Generated-By: Forge/20260529_105205_1305917_9a1b32f3
The test validates exports across jumpstarter.client, jumpstarter.driver, jumpstarter.exporter, and jumpstarter.common, so it belongs at the package root rather than inside common/. Generated-By: Forge/20260529_105205_1305917_9a1b32f3
TestCompressionRoundtrip only tested gzip/bz2/lzma/zstd stdlib functions without exercising project code. Replaced with TestCreateDecompressorRoundtrip that verifies the project's create_decompressor function produces working decompressor objects for all supported compression formats. Generated-By: Forge/20260529_105205_1305917_9a1b32f3
Scan all packages for import sites of each exported symbol and flag symbols with zero external imports. Symbols intentionally re-exported for external consumers are tracked in an allowlist. Generated-By: Forge/20260529_105205_1305917_9a1b32f3
- F005: add discovery test for modules defining __all__ not tracked
- F006: split models_hypothesis_test.py into tls_hypothesis_test.py
and exporter_hypothesis_test.py to match {module}_test.py convention
- F010: add tests for !=, notin, exists, !exists selector operators
- F011: replace hypothesis with pytest.mark.parametrize for finite enums
- F012: replace tautological test_value_equals_proto with assertions
against actual protobuf constants from common_pb2
- F013: add return type annotation to label_pairs_strategy
- F014: add type annotations to module-level strategy constants
- F015: use Literal type for on_failure parameter
- F017: add tests for extract_match_labels_filter function
- F018: assert hash consistency in test_frozen_model_is_hashable
Generated-By: Forge/20260529_105205_1305917_9a1b32f3
The function only handled ast.ImportFrom nodes, missing direct import statements (e.g., import X.Y.Z). Now also tracks attribute access on directly imported modules to avoid false positives in the zero-usage report. Generated-By: Forge/20260529_105205_1305917_9a1b32f3 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
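The two import styles described above could be handled roughly as follows. This is a sketch under assumptions: `collect_imported_names`, `_in_package`, and `_dotted` are hypothetical helpers written for illustration, and the project's actual function may differ in name and behavior.

```python
import ast
from typing import Dict, Optional, Set


def _in_package(module: str, package: str) -> bool:
    """True if `module` is `package` itself or one of its submodules."""
    return module == package or module.startswith(package + ".")


def _dotted(node: ast.expr) -> Optional[str]:
    """Turn a Name/Attribute chain such as a.b.c into the string 'a.b.c'."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def collect_imported_names(source: str, package: str) -> Set[str]:
    """Collect symbols used from `package` via from-imports or attribute access."""
    tree = ast.parse(source)
    used: Set[str] = set()
    module_aliases: Dict[str, str] = {}  # local name -> imported module path

    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.module and _in_package(node.module, package):
            used.update(alias.name for alias in node.names)
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if _in_package(alias.name, package):
                    module_aliases[alias.asname or alias.name] = alias.name

    # Second pass: attribute access on a directly imported module counts as usage.
    for node in ast.walk(tree):
        if isinstance(node, ast.Attribute) and _dotted(node.value) in module_aliases:
            used.add(node.attr)
    return used
```

For example, a file containing `import pkg.common` followed by `pkg.common.Metadata` reports `Metadata` as used, which a from-import-only scan would miss.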
The startswith(module_name) check would incorrectly classify symbols from jumpstarter.commonx as local to jumpstarter.common. Use exact match or dot-suffixed prefix check instead. Generated-By: Forge/20260529_105205_1305917_9a1b32f3 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
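The difference between the two checks can be shown in a few lines. `is_local_to` is a hypothetical name used only for this sketch.

```python
def is_local_to(symbol_module: str, module_name: str) -> bool:
    """True only for the module itself or a genuine submodule of it."""
    return symbol_module == module_name or symbol_module.startswith(module_name + ".")


if __name__ == "__main__":
    # A bare prefix check misclassifies a sibling package with a longer name.
    print("jumpstarter.commonx".startswith("jumpstarter.common"))   # naive check
    print(is_local_to("jumpstarter.commonx", "jumpstarter.common"))  # stricter check
    print(is_local_to("jumpstarter.common.metadata", "jumpstarter.common"))
```

Appending the dot before the prefix comparison is what separates `jumpstarter.common.metadata` from `jumpstarter.commonx`.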
Both selectors_hypothesis_test and metadata_hypothesis_test defined independent label key/value strategies with different constraints. Consolidate into a single testing_strategies module to ensure consistent generation across test files. Generated-By: Forge/20260529_105205_1305917_9a1b32f3 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add TestParseOciRegistry class with hypothesis-driven tests covering explicit registry extraction, oci:// scheme stripping, port preservation, and idempotency between plain and prefixed URLs. Generated-By: Forge/20260529_105205_1305917_9a1b32f3 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add hypothesis-driven tests for selector_contains with in, notin, exists, and !exists expressions, verifying reflexivity and operator mismatch behavior. Generated-By: Forge/20260529_105205_1305917_9a1b32f3 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Apply ruff isort fixes to selectors_hypothesis_test.py and metadata_hypothesis_test.py after shared strategy extraction. Generated-By: Forge/20260529_105205_1305917_9a1b32f3 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract _collect_imported_names_from_tree helper for testability and add tests verifying that dotted imports, aliased imports, and from-imports all resolve correctly through attribute chain walking. Generated-By: Forge/20260529_105205_1305917_9a1b32f3
…ents A key=value label selector now correctly satisfies expression-based requirements like key in (value), key!=other, key notin (other), and key exists. Extracted _label_satisfies_expression to keep complexity within ruff C901 limits. Generated-By: Forge/20260529_105205_1305917_9a1b32f3
…CKAGES Replace hardcoded common and config submodule importability tests with a single parametrized test covering all four tracked packages (common, client, driver, exporter). Generated-By: Forge/20260529_105205_1305917_9a1b32f3
…ted_names Add test verifying that assignment-based import aliasing (e.g. `m = jumpstarter.common; m.Metadata`) is a known untracked pattern, documenting this as an accepted limitation given the codebase convention of using `from X import Y` style imports. Generated-By: Forge/20260529_105205_1305917_9a1b32f3
…tials Cover all four public symbols in oci.py's __all__ with tests. TestReadAuthFileCredentials verifies auth file reading, malformed file handling, and registry mismatch. TestResolveOciCredentials verifies the three-level credential precedence (explicit, env vars, auth file) and partial credential rejection. Generated-By: Forge/20260529_105205_1305917_9a1b32f3
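A three-level precedence with partial-credential rejection might look like the sketch below. Everything here is assumed for illustration: the function names, the environment variable names `OCI_USERNAME` and `OCI_PASSWORD`, and the choice of `ValueError` are not taken from `oci.py`.

```python
from typing import Mapping, Optional, Tuple

Credentials = Tuple[str, str]


def _pair(username: Optional[str], password: Optional[str], source: str) -> Optional[Credentials]:
    """Return a complete pair, None if both are missing, or raise on a partial pair."""
    if username and password:
        return (username, password)
    if username or password:
        raise ValueError(f"{source}: username and password must be provided together")
    return None


def resolve_credentials(
    username: Optional[str],
    password: Optional[str],
    env: Mapping[str, str],
    auth_file: Optional[Credentials],
) -> Optional[Credentials]:
    """Prefer explicit arguments, then environment variables, then the auth file."""
    return (
        _pair(username, password, "explicit")
        # Hypothetical variable names, chosen only for this example.
        or _pair(env.get("OCI_USERNAME"), env.get("OCI_PASSWORD"), "environment")
        or auth_file
    )
```

Passing the environment and auth-file result in as arguments keeps the sketch free of I/O, which also makes each precedence level easy to test in isolation.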
Add hypothesis tests verifying that label selectors (key=value) satisfy expression-based requirements: exists, !=, and notin operators. This complements the existing in-operator cross-type test to cover all branches in _label_satisfies_expression. Generated-By: Forge/20260529_105205_1305917_9a1b32f3
Include jumpstarter.config in the tracked packages list so its submodules are tested for importability and any __all__-defining modules would be discovered by TestModulesWithAllDiscovery. Generated-By: Forge/20260529_105205_1305917_9a1b32f3
…xists Adds _collect_assignment_aliases helper and three tests that verify:
1. The helper detects assignment-based aliasing patterns
2. The helper ignores untracked modules
3. No production files use assignment-based aliasing of tracked modules
This converts the _collect_imported_names_from_tree limitation from a theoretical concern into a provably non-impactful one, as the codebase exclusively uses "from X import Y" style imports. Generated-By: Forge/20260529_105205_1305917_9a1b32f3 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Require explicit h/m/s suffixes in the proper order and reject empty input, preventing bare numbers or malformed durations from being silently accepted. Generated-By: Forge/20260601_123354_597814_b463be94
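One way to enforce these rules is a single anchored regular expression. The sketch below is an assumption about shape only: the real `parse_duration` signature, return type, and error type are not shown in this conversation, so integer seconds and `ValueError` are illustrative choices.

```python
import re

# Optional hours, minutes, seconds, each with its suffix, in that order.
_DURATION_RE = re.compile(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?")


def parse_duration(text: str) -> int:
    """Parse strings like '2h', '1h30m', or '90s' into seconds."""
    match = _DURATION_RE.fullmatch(text)
    # fullmatch rejects trailing bare numbers ('5m30') and out-of-order units ('30m1h');
    # the any() check rejects the empty string, where every group is None.
    if match is None or not any(match.groups()):
        raise ValueError(f"invalid duration {text!r}: use explicit units such as 1h30m or 90s")
    hours, minutes, seconds = (int(group) if group else 0 for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds
```

With this shape, `parse_duration("1h30m")` yields 5400, while `""`, `"30"`, `"5m30"`, and `"30m1h"` all raise.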
These are fuzz runner dependencies used by scripts/fuzz.py, not package-level test dependencies. Move them to the workspace root dev group and standardize individual packages to use plain hypothesis>=6.127.2. Generated-By: Forge/20260601_123354_597814_b463be94
Run extended fuzzing on a nightly schedule (2h) instead of on every push to main (6h), reducing CI resource usage while still catching regressions. Lower timeout-minutes from 370 to 150 accordingly. Generated-By: Forge/20260601_123354_597814_b463be94
Verify that _insert_example preserves existing comments and whitespace during source file modification, and that unsafe AST nodes are rejected. Generated-By: Forge/20260601_123354_597814_b463be94
Remove hypothesis from packages that have no hypothesis_test.py or robustness_test.py files, reducing unnecessary dependency sprawl. Generated-By: Forge/20260601_123354_597814_b463be94
… files These files had only trailing-comma and bracket-style changes with no hypothesis-related content, inflating the diff from ~97 meaningful files to 122 and deviating from the project TOML style. Generated-By: Forge/20260601_123354_597814_b463be94
The notin and != operators now return True when the key is absent from labels, matching Kubernetes behavior where a missing key vacuously satisfies negative constraints. The !exists and exists operators now check both matchLabels and matchExpressions to correctly handle cases where a key appears only in expressions. Generated-By: Forge/20260601_142034_798905_c7a34b73 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
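The absent-key behavior can be illustrated with a small evaluator. This is a sketch rather than the code in `selectors.py`: `label_satisfies` and its operator strings are hypothetical, and raising on an unknown operator is a choice made for this example only.

```python
from typing import Mapping, Sequence


def label_satisfies(
    labels: Mapping[str, str],
    key: str,
    operator: str,
    values: Sequence[str] = (),
) -> bool:
    """Evaluate one requirement against a label set, Kubernetes style."""
    present = key in labels
    if operator == "exists":
        return present
    if operator == "!exists":
        return not present
    if operator == "in":
        return present and labels[key] in values
    if operator == "notin":
        # A missing key vacuously satisfies a negative constraint.
        return not present or labels[key] not in values
    if operator == "!=":
        return not present or labels[key] != values[0]
    raise ValueError(f"unknown operator: {operator!r}")
```

So an empty label set satisfies `env notin (prod)` and `env != prod`, but not `env in (prod)` or `env exists`.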
Add constructor robustness test entries for the 23 previously missing driver packages including androidemulator, can, composite, corellium, doip, dut_network, flashers, gpiod, http_power, mitmproxy, network, noyito_relay, opendal, power, qemu, ridesx, someip, uboot, uds, uds_can, uds_doip, ustreamer, and xcp. Generated-By: Forge/20260601_142034_798905_c7a34b73 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…den falsifying examples regex parse_duration now rejects inputs like "5m30" where a bare number follows a unit suffix, preventing the ambiguous interpretation as 330 seconds. The _extract_falsifying_examples regex now allows trailing whitespace after the closing paren, and skips examples with empty cleaned args. Generated-By: Forge/20260601_142034_798905_c7a34b73 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ly returns Use early returns to prevent overlapping assertion conditions in FuzzValidateLeaseTags. Keys with jumpstarter.dev/ also contain /, so checking the more specific prefix first and returning prevents the generic slash check from masking which rule triggered rejection. Generated-By: Forge/20260601_142034_798905_c7a34b73 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Apply ruff import ordering fixes to move third-party imports after relative imports in 4 robustness test files. Sync uv.lock to reflect workspace dev dependency changes from prior commit. Generated-By: Forge/20260601_142034_798905_c7a34b73 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three CRD test files import jsonschema at module level but the package was not declared in dev dependencies, causing ModuleNotFoundError during pytest collection and blocking the entire test suite. Generated-By: Forge/20260601_142034_798905_c7a34b73 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This test imports from jumpstarter_cli.common and jumpstarter_cli_common.opt but was placed in the core jumpstarter package which does not depend on CLI packages. This caused import collection errors during pytest. Generated-By: Forge/20260601_142034_798905_c7a34b73 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Centralizing these settings in the ci and fuzz profiles eliminates the need for per-test @settings overrides and prevents flaky CI failures on slow runners. Generated-By: Forge/20260601_142034_798905_c7a34b73 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
With deadline=None and suppress_health_check set in the ci and fuzz Hypothesis profiles, per-test @settings(deadline=None) decorators are redundant. Removing them reduces boilerplate and ensures consistency. Generated-By: Forge/20260601_142034_798905_c7a34b73 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All driver and CLI robustness tests duplicated an identical ARBITRARY strategy definition. Replaced with import from testing_strategies.py to prevent drift. Three files with domain-specific customizations (crd_robustness_test, serde_robustness_test, kubernetes) are left as-is since their constraints differ intentionally. Generated-By: Forge/20260601_142034_798905_c7a34b73 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…g semantics The function performs Kubernetes label satisfaction checks where negative operators (!=, notin, !exists) are satisfied when the key is absent, but the docstring incorrectly described it as a containment check. Generated-By: Forge/20260601_152609_887943_15050c4c Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When --go-only is used and no Go fuzz targets exist, the division by len(targets) would crash. Add an early return with a message instead. Generated-By: Forge/20260601_152609_887943_15050c4c Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
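The guard amounts to checking for an empty target list before dividing. The function below is a hypothetical stand-in for the relevant part of the runner, with an assumed message text.

```python
from typing import Optional, Sequence


def per_target_budget(total_seconds: int, targets: Sequence[str]) -> Optional[int]:
    """Split the time budget evenly, or return None when there is nothing to run."""
    if not targets:
        # Early return avoids ZeroDivisionError on len(targets) == 0.
        print("No fuzz targets found; nothing to run.")
        return None
    return total_seconds // len(targets)
```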
…ching Remove re.MULTILINE from the example regex so that closing-paren matches require a true newline or end-of-string rather than any end-of-line anchor, preventing premature matches on embedded closing parens in multi-line falsifying examples. Generated-By: Forge/20260601_152609_887943_15050c4c Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
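The effect of dropping `re.MULTILINE` can be seen with a simplified pattern. The pattern and sample text below are illustrative assumptions, not the regex used by `_extract_falsifying_examples`.

```python
import re

# A multi-line call with an embedded closing paren on an inner line.
text = "f(\n    a=g(1)\n)\n"
pattern = r"\)\s*$"

# With MULTILINE, `$` matches before every newline, so the inner paren wins.
multiline_match = re.search(pattern, text, re.MULTILINE)

# Without it, `$` matches only at the end of the string (or just before a
# final newline), so the search skips ahead to the outer closing paren.
default_match = re.search(pattern, text)

if __name__ == "__main__":
    print(multiline_match.start() == text.index(")"))
    print(default_match.start() == text.rindex(")"))
```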
The hardcoded parents[4] path traversal breaks when the file moves to a different directory depth. Walk upward to find the repository root by looking for both controller/ and python/ directories instead. Generated-By: Forge/20260601_152609_887943_15050c4c Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
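Walking upward for marker directories could be written as follows. `find_repo_root` is a hypothetical name, and raising `FileNotFoundError` when no root is found is an assumption of this sketch.

```python
from pathlib import Path
from typing import Sequence


def find_repo_root(start: Path, markers: Sequence[str] = ("controller", "python")) -> Path:
    """Return the nearest ancestor of `start` that contains every marker directory."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if all((candidate / marker).is_dir() for marker in markers):
            return candidate
    raise FileNotFoundError(f"no ancestor of {start} contains {', '.join(markers)}")
```

A test module would call it as `find_repo_root(Path(__file__).parent)`, which keeps working if the file later moves to a different directory depth.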
- L1: Improve parse_duration error message to suggest explicit units.
- L2: Extract startup_grace into HYPOFUZZ_STARTUP_GRACE_SECONDS constant.
- L4: Use word-boundary regex for hypothesis example import detection.
- L6: Tighten CI fuzz_time validation regex to require at least one component.
- L9: Raise jsonschema lower bound from 4.0.0 to 4.17.0.
Generated-By: Forge/20260601_152609_887943_15050c4c Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add push trigger for main branch with 6h fuzz budget per FR-008. Compute timeout-minutes dynamically from fuzz_time instead of hardcoding 150 minutes, so workflow_dispatch with long durations will not be killed prematurely. Generated-By: Forge/20260601_152609_887943_15050c4c
Add explicit tests verifying that a selector without key K satisfies !exists, notin, and != requirements for that key, matching Kubernetes label selector semantics for negative operators on absent labels. Generated-By: Forge/20260601_152609_887943_15050c4c
Add CLI flag to override the default limit of 1 regression example per test function. The default preserves the existing conservative behavior where only the first failure is kept per test, but users can now raise the limit when investigating multiple distinct failure modes. Generated-By: Forge/20260601_152609_887943_15050c4c
…lity click.exceptions.BadParameter etc. are not resolvable by ty as submodule attributes; use click.BadParameter which is the public API. Also wrap a long line in selectors_hypothesis_test.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously run_go_all and the main dispatch loop would bail out on the first Go fuzzer crash, skipping all remaining targets. Now all targets run to completion and failures are reported at the end. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
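The run-to-completion behavior reduces to collecting failures instead of returning on the first one. In this sketch `run_all_targets` and the `run_one` callback are hypothetical; the real `run_go_all` presumably shells out to `go test -fuzz`.

```python
from typing import Callable, List, Sequence


def run_all_targets(targets: Sequence[str], run_one: Callable[[str], bool]) -> List[str]:
    """Run every target and return the names of those that failed."""
    failures: List[str] = []
    for target in targets:
        if not run_one(target):
            # Record the failure and keep going rather than bailing out.
            failures.append(target)
    return failures


if __name__ == "__main__":
    failed = run_all_targets(["FuzzA", "FuzzB", "FuzzC"], lambda name: name != "FuzzB")
    if failed:
        print("Failed targets: " + ", ".join(failed))
```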
Same fix as the previous commit, applied to cli-admin, cli-common, and cli-driver packages. Also cast arbitrary-typed args to Any to satisfy ty's invalid-argument-type checks in cli-common robustness tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Satisfies staticcheck QF1001. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Robustness tests intentionally pass object-typed values to functions with specific type signatures. Wrap these calls with cast(Any, ...) so the ty type checker does not report invalid-argument-type errors. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
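The pattern looks roughly like this. `parse_port` is a made-up function standing in for any project function with a narrow type signature; only `typing.cast` and `typing.Any` are real APIs here.

```python
from typing import Any, cast


def parse_port(value: str) -> int:
    """Hypothetical function under test with a specific parameter type."""
    port = int(value)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


def rejects_cleanly(candidate: object) -> bool:
    """Feed an arbitrary object in and report whether it was rejected with a known error."""
    try:
        # cast(Any, ...) has no runtime effect; it only tells the type checker
        # that passing a non-str value here is intentional.
        parse_port(cast(Any, candidate))
    except (TypeError, ValueError):
        return True
    return False
```

Because `cast` is a no-op at runtime, the function under test still receives the original object and must reject it on its own.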
Summary
Adds property-based and robustness fuzz testing across the entire jumpstarter codebase using Hypothesis (Python) and Go's native fuzzing. Includes a unified fuzz runner, CI workflow, and regression injection pipeline.
Closes #512
What's included
Fuzz runner (`scripts/fuzz.py`)
- … (`go test -fuzz`) targets within a configurable time budget
- … `@example()` decorators (Python) or `f.Add()` seed corpus entries (Go) in committed test source
CI workflow (`.github/workflows/fuzz.yaml`)
Python test coverage (73 test files)
- … (`*_hypothesis_test.py`): label selector parsing, OCI credentials, TLS config, CRD schema validation, gRPC protobuf serialization, serde roundtrips, stream encoding, driver decorators, enum roundtrips, condition handling
- … (`*_robustness_test.py`): every driver package, CLI commands (create/delete/get/update/shell/run/login/auth/config/completion), Kubernetes models, config parsing, protocol layer
Go fuzz tests (7 targets)
- `FuzzParseLabelSelector`, `FuzzReconcileLeaseTimeFields`, `FuzzValidateLeaseTags`
- `FuzzNormalizeOIDCUsername`, `FuzzBearerTokenExtraction`
- `FuzzMatchLabels`, `FuzzLoadGrpcConfiguration`
Bug fix
- `selector_contains` was matching labels against requirements incorrectly (fixed in `selectors.py`)
Findings
A 48-hour local fuzz run found 3 bugs, filed as:
- `DurationParamType.convert` raises `OverflowError` on large numeric strings
- `_label_satisfies_expression` silently returns `False` for unknown operators
- `V1Alpha1Lease.from_dict` raises `AttributeError` when spec is not a dict
Test plan
- `python -m pytest scripts/fuzz_test.py` -- fuzz runner unit tests
- `make fuzz-python FUZZ_TIME=5m` -- quick Python fuzz smoke test
- `make fuzz FUZZ_TIME=5m` -- full fuzz suite (Python + Go)
- `@example()` injection works: run fuzz, check `git diff` for injected decorators
🤖 Generated with Claude Code