Skip to content

AFLplusplus/fuzz-reachability

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

40 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Static Fuzz-Reachability Analyzer (C / C++ / Rust)

Given a project's build, this tool computes the set of functions a fuzz entry point (LLVMFuzzerTestOneInput, a Rust cargo-fuzz target, or any entry you name) can statically reach. It works uniformly across C, C++, and Rust — including mixed-language projects — by analyzing merged LLVM bitcode.

The result is a sound-leaning over-approximation: it answers which functions can be reached, not which ones ran. No function that is actually reachable is ever reported unreachable. Over-reporting is expected and safe; under-reporting is a bug.

Feed the output to AFL++ or clang's SanitizerCoverage allow/ignore lists to instrument only reachable code — cheaper, more focused fuzzing:

  • AFL++: export AFL_LLVM_ALLOWLIST=$(pwd)/reached.txt -or- export AFL_LLVM_DENYLIST=$(pwd)/not_reached.txt
  • sancov based fuzzers (libfuzzer, honggfuzz, libafl, AFL++): -fsanitize-coverage-allowlist=$(pwd)/reached.txt -or- -fsanitize-coverage-ignorelist=$(pwd)/not_reached.txt

Recommendation: use the allow feature with reached.txt rather than the deny/ignore feature.

Additionally you can feed the output files into cov-analysis - the state-of-the-art coverage analysis tooling.

Deep dives:

  • Worked examples, step by step — a generic LLVMFuzzerTestOneInput harness for AFL++/libfuzzer (libxml2), a ziggy harness (the url crate), and cargo-afl harnesses (cpp_demangle and rustyknife) — docs/EXAMPLES.md
  • LLVM version support — docs/llvm-support.md

Author: Marc "vanHauser" Heuse

License: GNU AGPL v3 or newer

How it works

 driver (Python)              analyzer (C++ / LLVM)
 ───────────────              ─────────────────────
   acquire bitcode ─┐
   C/C++ : gllvm    │   llvm-link    load .bc → build call graph →
   Rust  : rustc    ├─► merge .bc ─► resolve indirect calls → BFS from
   --emit=llvm-bc  ─┘                entry → JSON report + sancov lists

Two components, joined by merged bitcode:

  • Driver (Python) — acquires bitcode per language, merges it with llvm-link, verifies the LLVM toolchain is version-coherent, and runs the analyzer.
  • Analyzer (C++ linking LLVM) — loads the merged .bc, builds the call graph, resolves indirect calls (C function pointers, C++ virtual dispatch, Rust dyn/fn pointers), treats function pointers that escape to code outside the bitcode (handed to an external/indirect call or returned — e.g. qsort/bsearch comparators, atexit/pthread/std::call_once callbacks) as reachable, computes reachability from the entry, and emits a JSON report plus the two sancov lists. It demangles C++ (Itanium) and Rust names.

Prerequisites

  • LLVM ≥ 21. One coherent toolchain: clang, clang++, llvm-link, opt, and the analyzer all share one major M ≥ 21, and rustc's LLVM is no newer than M. See docs/llvm-support.md. NOTE! especially as a Rust user, we recommend to install LLVM via https://apt.llvm.org/llvm.sh instead of the distribution, as those will be outdated!
  • Go (to install gllvm), Python ≥ 3.12, and a C++17 compiler. Rust targets also need rustc / cargo (nightly, but one using LLVM 21 or prior).
  • AFL++ compiled from commit 01a83a3d7098e605f0c7fd69381fcf4fc97144fe onwards (24 June 2026)
  • cov-analysis from commit 72c239038430477181df99f7a2cd0a556f2701dd onwards (23 June 2026)

Install

The analyzer builds with a plain Makefile driven by llvm-config — no CMake.

bash scripts/setup.sh        # gllvm + rust-src, create .venv, build the analyzer

Or piecemeal:

make venv                    # create .venv (driver, editable + pytest)
make build                   # build the analyzer on the auto-selected LLVM (≥ 21)
make build LLVM_MAJOR=23     # ...or pin a specific major
make test                    # run the full test suite
make matrix                  # build + test against every installed LLVM ≥ 21
make help                    # list all targets

To run the CLI, point it at the built analyzer and put gllvm on PATH:

export REACHABILITY_ANALYZER=$PWD/analyzer/build/reachability-analyzer
export PATH="$(go env GOPATH)/bin:$PATH"     # gclang / gclang++ / get-bc
source .venv/bin/activate                    # or call .venv/bin/reachability directly
reachability check-toolchain                 # verify LLVM version coherence first

Quick start

reachability run --lang <target> --project <dir> [--out <file>]

--out is optional; it defaults to reachability.json in the --project directory. If --out points at an existing directory, the report is written to reachability.json inside it.

<target> is a source language (c, cpp, rust, mixed) or a Rust fuzz harness (libfuzzer, ziggy, afl). Each sets a default entry point, so the common case needs no --entry. The build command and the artifact are auto-detected for C/C++; override them with --build-cmd / --artifact when needed.

Full options: Command-line reference.

Examples

Read about real-world target examples in docs/EXAMPLES.md

A C target

fixtures/c_direct is a small C fuzz target. Its build and artifact are auto-detected:

reachability run --lang c --project fixtures/c_direct --out c.json -v
reachable 3 / defined 4  (0 indirect-only, 1 unreachable)  [backend=type-based]

LLVMFuzzerTestOneInput → used_a → used_b are reachable; dead_fn lands in unreachable_defined.

A C++ target (CMake)

examples/cpp_cmake uses virtual dispatch. The driver detects the CMake build, wraps it with gllvm, and analyzes the resulting executable:

reachability run --lang cpp --project examples/cpp_cmake --out cpp.json -v

The virtual call Codec::decode over-approximates to both overrides, reached via indirect edges:

  Raw::decode(unsigned char const*, unsigned long) | via indirect
  Xor::decode(unsigned char const*, unsigned long) | via indirect

A Rust target

fixtures/rust_dyn is a Rust staticlib whose LLVMFuzzerTestOneInput dispatches through a dyn Trait. The driver builds it with RUSTFLAGS="--emit=llvm-bc …" and collects the per-crate bitcode:

reachability run --lang rust --project fixtures/rust_dyn --out rust.json -v

The trait-object call resolves to both implementations, via indirect edges:

  <rust_dyn::Inc as rust_dyn::Op>::run | via indirect
  <rust_dyn::Dbl as rust_dyn::Op>::run | via indirect

A mixed C + Rust target (cargo-fuzz shape)

fixtures/mixed_c_rust has C++ glue calling an extern "C" Rust entry. Use --lang mixed; the driver builds and merges both sides' bitcode (gllvm for the glue, cargo for Rust), and the cross-language edge resolves by C-ABI symbol name:

reachability run --lang mixed --project fixtures/mixed_c_rust \
  --artifact glue.o --out mixed.json -v

Point --artifact at the C/C++ object so it is picked out from the Rust build outputs.

A target that links a static library

A tool linked against a static library (say tools/thumbnail linking libtiff.a) embeds only the archive members the linker actually pulled in. To analyze the whole library — not just the slice the linker kept — point --artifact at the linked binary and keep the default --static-libs auto:

reachability run --lang c --project tiff-4.0.4 --artifact tools/thumbnail \
  --out tiff.json -v

The driver merges thumbnail's own objects with the full contents of libtiff.a. Functions in members the linker discarded (e.g. TIFFReadRGBAImage, TIFFPrintDirectory) now show up as unreachable instead of vanishing, while the reachable set is unchanged from the linker's view — adding the rest of the library can only add unreachable functions, never remove reachable ones. Use --static-libs none for the linker's view only, or all to include every bitcode archive in the tree.

A ziggy harness

A ziggy harness is a Rust binary whose fuzz loop lives in main rather than in LLVMFuzzerTestOneInput. --lang ziggy acquires the bitcode and roots at main automatically:

reachability run --lang ziggy --project <harness> --out z.json

For complete, start-to-finish walkthroughs on real targets — ziggy (the url crate), cargo-afl (cpp_demangle and rustyknife), and libFuzzer (libxml2) harnesses — see docs/EXAMPLES.md.

Command-line reference

The reachability CLI has two subcommands.

reachability check-toolchain

Resolves and validates the LLVM toolchain (analyzer, clang/clang++, llvm-link, opt, rustc) for version coherence and prints what it found. Run it first; it exits non-zero on any incoherence. See docs/llvm-support.md for the policy.

reachability run

Builds a project, merges its bitcode, and computes the reachable set from the entry point(s).

Option Default Meaning
--project DIR (required) Project directory to build and analyze.
--lang TARGET (required) Target type (see the table below): sets how bitcode is acquired and the default entry.
--out FILE reachability.json in --project Path for the JSON report. A directory writes reachability.json into it. The two sancov lists default to reached.txt / not_reached.txt beside it.
--entry NAME per --lang Entry to root reachability at. Repeatable; overrides the target default. See Entry resolution.
--backend NAME (none) Deprecated and ignored; the type-based backend is always used. Accepted for backward compatibility — passing it prints a warning.
--artifact PATH auto-detect C/C++ only: the built binary/object/archive to extract bitcode from (relative to --project). Auto-detected otherwise, preferring an executable over a shared library, archive, then object.
--build-cmd CMD auto-detect C/C++ only: shell build command, run with gllvm injected. E.g. "cmake -S . -B build && cmake --build build". Auto-detected from the project files otherwise (configureMakefileCMakeLists.txtbuild.ninjameson.build, else make).
--static-libs {auto,none,all} auto C/C++ only: how to treat static archives (.a) the target links. auto also analyzes each linked archive in full, so members the linker dropped are reported rather than silently absent. none keeps only the linker's view. all includes every bitcode archive in the tree, skipping any whose members another archive already covers and resolving residual overlaps at link time (llvm-link --override).
--profile {debug,release} debug Rust only: cargo profile for the bitcode build. Match the fuzz binary's profile. See Matching the fuzz binary's build.
--codegen-units N 1 Rust only: rustc -Ccodegen-units for the bitcode build (positive integer). Match the fuzz binary's value. See Matching the fuzz binary's build.
--build-std off Rust only: build the standard library from source (-Zbuild-std) so std functions appear in the graph instead of as external declarations.
--dot FILE (none) Also write the reachable subgraph as Graphviz DOT (indirect edges dashed/red).
--reached FILE beside --out Path for the sancov allowlist of reachable functions.
--not-reached FILE beside --out Path for the sancov ignorelist of unreachable functions.
-v, --verbose off Narrate each pipeline stage (toolchain → build → merge → analyze): echoes the tool commands run, streams the build output live, and lists the collected bitcode modules.

Target types (--lang)

--lang acquires via default entry
c gllvm (gclang) main + LLVMFuzzerTestOneInput
cpp gllvm (gclang++) main + LLVMFuzzerTestOneInput
rust cargo + --emit=llvm-bc main
mixed gllvm and cargo (merged) LLVMFuzzerTestOneInput
libfuzzer cargo (Rust) fuzz_target!
ziggy cargo (Rust) main
afl cargo (Rust) main

The C/C++ targets root at both main and LLVMFuzzerTestOneInput, so one --lang c/cpp covers a normal program and an LLVMFUzzerTestOneInput harness alike. A default entry that matches nothing is a harmless warning, because roots are unioned.

Entry resolution

--entry never requires a mangled symbol. A token matches — unioned across all of:

  • an exact mangled symbol (e.g. _Z3foov),
  • an exact demangled name (e.g. foo()),
  • a demangled ::name suffix (so main finds crate::main), and
  • the alias fuzz_target! (→ LLVMFuzzerTestOneInput + rust_fuzzer_test_input).

Matching more than one function only adds roots, which stays sound. For a Rust binary, just root at main: the token matches the real Rust main, so you never need to type a mangled symbol.

Matching the fuzz binary's build

For Rust targets the driver builds its own bitcode (cargo build --emit=llvm-bc) and computes reachability from that. For the resulting reached.txt / not_reached.txt to line up with the binary you actually instrument, that bitcode build should match the fuzz binary's build. Two Rust-only options control this; both default to the most common fuzzer setup and are ignored for C/C++.

  • --profile {debug,release} (default debug) — the cargo profile. The optimization level drives generic sharing (rustc's -Zshare-generics is on when unoptimized, off when optimized): it decides which crate instantiates each generic, and so which monomorphizations exist and how they are mangled. A debug snapshot against a release fuzz binary (or vice versa) therefore produces a different function set. Pass --profile release for an optimized fuzz build.

  • --codegen-units N (default 1) — passed through verbatim as rustc -Ccodegen-units. The unit count sets inlining boundaries, hence which monomorphizations survive as standalone functions rather than being inlined away. N is any positive integer (rustc rejects 0/negative). Useful values:

    • 1 — a single unit per crate: maximum inlining and exactly one .bc per crate. Many fuzzing profiles pin codegen-units = 1 for better optimization, so the default already matches them.
    • 16 — the rustc default for a cargo release build (incremental off).
    • 256 — the rustc default for a cargo dev/debug build (incremental on).

    With N > 1 rustc splits each crate into several target/<profile>/deps/<crate>-<hash>.<cgu>.rcgu.bc files; the driver collects all of them.

How to choose. Use whatever your fuzz build uses. That is the cargo/rustc default for its profile (release → 16, dev → 256) unless a [profile.*] codegen-units in Cargo.toml or a -Ccodegen-units in RUSTFLAGS overrides it. If unsure, build the fuzz target with codegen-units = 1 and keep the defaults here — the two then agree.

The fun: patterns in the lists already tolerate the Rust mangling disambiguator (17h<hash>) drifting between builds (see Output), but that only covers the naming of a given instance. Matching --profile / --codegen-units is what aligns the set of emitted functions; a wildcard cannot recover a function that one build inlined away and the other did not.

Environment variables

Variable Purpose
REACHABILITY_ANALYZER Path to the analyzer binary (default analyzer/build/reachability-analyzer).
CLANG / CLANGXX / LLVM_LINK / OPT Override individual tool paths (otherwise resolved by major from the analyzer's LLVM).
PATH Must contain gclang / gclang++ / get-bc (gllvm) for C/C++/mixed targets.

Output

reachability run writes three files:

  • <out>.jsonsummary counts, a reachable array (mangled and demangled name, source file/line when debug info is present, via = direct/indirect/both, and an indirect_only flag), and an unreachable_defined array. With --dot FILE, also the reachable subgraph.
  • reached.txt — a SanitizerCoverage allowlist of reachable functions.
  • not_reached.txt — a SanitizerCoverage ignorelist of unreachable functions.

Both lists use each function's mangled (LLVM symbol) name — what clang and AFL++ match fun: against — so they cover C, C++, and Rust. Feed either to clang or AFL++ to instrument only reachable code:

# instrument ONLY reachable functions:
clang -fsanitize-coverage=trace-pc-guard -fsanitize-coverage-allowlist=reached.txt ...
# OR: instrument everything EXCEPT unreachable functions:
clang -fsanitize-coverage=trace-pc-guard -fsanitize-coverage-ignorelist=not_reached.txt ...

A coverage allowlist instruments a function only when both a src: and a fun: entry match, so reached.txt opens with a src:* line. An ignorelist has no such requirement, so not_reached.txt is pure fun: lines. (Verified against clang in driver/tests/test_covlists.py.)

Rust mangling disambiguator. A Rust generic instance is mangled with a trailing 17h<hash> disambiguator whose value depends on the build (opt level, codegen units, instantiating crate). The exact value differs between this bitcode snapshot and the instrumented fuzz binary, so an exact-name entry would miss. Each fun: entry therefore replaces that disambiguator with a * glob, which both clang sancov and AFL++ honour, so an entry matches the instance in any build. An ignorelist pattern that would also match a reachable instance is dropped, so excluding unreachable code never excludes reachable code. For best fidelity still build the snapshot with the same --profile and --codegen-units as the fuzz binary, so the set of emitted monomorphizations matches; the * only tolerates the disambiguator, not a different function set.

Indirect-call resolution

Indirect calls (C function pointers, C++ virtual dispatch, Rust dyn/fn pointers) are resolved by the type-based resolver: an indirect call of function type T may reach any address-taken function whose LLVM function type is T. It is language-agnostic, always available, and sound — a deliberate over-approximation. The --indirect-any debug flag widens this further, to any address-taken function regardless of type.

See docs/llvm-support.md for the LLVM compatibility matrix.

Historical note: the removed SVF backend

Earlier versions shipped an optional second backend, --backend=svf, built on SVF's Andersen points-to analysis, meant to narrow the type-based over-approximation per call site. It was removed: in practice it produced essentially the same reachable sets as the default type-based backend while costing far more. It built only against a pinned LLVM 21 (it failed on 22/23), required a separately vendored SVF + Z3 build with a local source patch, ran substantially slower, and was more fragile to operate — so it offered no practical benefit over the type-based backend, which is sound, language-agnostic, and works on every supported LLVM. The --backend flag is retained only for backward compatibility: it is accepted but ignored (with a warning).

Testing

make test       # full pytest suite (analyzer .ll goldens + per-language soundness)
make matrix     # LLVM version-compatibility matrix (catches future-LLVM breakage)

Each fixture in fixtures/ carries a must_reach / must_not_reach set; every backend must satisfy the soundness invariant on each.

Project layout

analyzer/   C++ analyzer + Makefile (src/, built via llvm-config)
driver/     Python driver (toolchain, acquire_*, link, analyze, cli)
fixtures/   per-language test targets with expected reachable sets
examples/   worked examples (cpp_cmake/)
scripts/    setup.sh, test_matrix.sh, select_llvm.sh
docs/       worked examples (EXAMPLES.md), LLVM support

Limitations

This is a static over-approximation, not dynamic coverage. Its precision is bounded by indirect-call resolution and by any missing bitcode — precompiled libraries, or the Rust standard library without --build-std.

About

Function reachability analysis for harnesses für C/C++/Rust

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors