Given a project's build, this tool computes the set of functions a fuzz entry
point (LLVMFuzzerTestOneInput, a Rust cargo-fuzz target, or any entry you name)
can statically reach. It works uniformly across C, C++, and Rust — including
mixed-language projects — by analyzing merged LLVM bitcode.
The result is a sound-leaning over-approximation: it answers which functions can be reached, not which ones ran. No function that is actually reachable is ever reported unreachable. Over-reporting is expected and safe; under-reporting is a bug.
Feed the output to AFL++ or clang's SanitizerCoverage allow/ignore lists to instrument only reachable code — cheaper, more focused fuzzing:
- AFL++:
  export AFL_LLVM_ALLOWLIST=$(pwd)/reached.txt
  or
  export AFL_LLVM_DENYLIST=$(pwd)/not_reached.txt
- sancov-based fuzzers (libFuzzer, honggfuzz, LibAFL, AFL++):
  -fsanitize-coverage-allowlist=$(pwd)/reached.txt
  or
  -fsanitize-coverage-ignorelist=$(pwd)/not_reached.txt
Recommendation: use the allow feature with reached.txt rather than the deny/ignore feature.
Additionally, you can feed the output files into cov-analysis, the state-of-the-art coverage analysis tooling.
Deep dives:
- Worked examples, step by step: a generic LLVMFuzzerTestOneInput harness for AFL++/libFuzzer (libxml2), a ziggy harness (the url crate), and cargo-afl harnesses (cpp_demangle and rustyknife): docs/EXAMPLES.md
- LLVM version support: docs/llvm-support.md
Author: Marc "vanHauser" Heuse
License: GNU AGPL v3 or newer
driver (Python)                              analyzer (C++ / LLVM)
───────────────                              ─────────────────────
acquire bitcode ─┐
 C/C++ : gllvm   │   llvm-link               load .bc → build call graph →
 Rust  : rustc   ├─► merge .bc ─►            resolve indirect calls → BFS from
 --emit=llvm-bc ─┘                           entry → JSON report + sancov lists
Two components, joined by merged bitcode:
- Driver (Python): acquires bitcode per language, merges it with llvm-link, verifies that the LLVM toolchain is version-coherent, and runs the analyzer.
- Analyzer (C++ linking LLVM): loads the merged .bc, builds the call graph, and resolves indirect calls (C function pointers, C++ virtual dispatch, Rust dyn/fn pointers). Function pointers that escape to code outside the bitcode (handed to an external or indirect call, or returned; e.g. qsort/bsearch comparators and atexit/pthread/std::call_once callbacks) are treated as reachable. It then computes reachability from the entry and emits a JSON report plus the two sancov lists. It demangles C++ (Itanium) and Rust names.
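The reachability step itself is a plain graph search. The Python sketch below only illustrates the idea under stated assumptions: the real analyzer is C++ working on LLVM IR, and the dict-based call graph and the `reachable` helper are hypothetical. Indirect targets are assumed to be already resolved into the graph, and escaped function pointers are added as extra roots. The example graph mirrors the fixtures/c_direct fixture described further down.

```python
from collections import deque


def reachable(call_graph, entries, escaped=()):
    """Breadth-first search from the entry points (hypothetical sketch).

    call_graph: defined function -> functions it may call (direct callees
                plus already-resolved indirect targets).
    entries:    root names; a root that is not defined is simply skipped,
                like a default entry that matches nothing.
    escaped:    function pointers handed to code outside the bitcode
                (qsort comparators, atexit callbacks, ...), treated as roots
                because external code may call them.
    """
    seen = set()
    queue = deque()
    for root in (*entries, *escaped):
        if root in call_graph and root not in seen:
            seen.add(root)
            queue.append(root)
    while queue:
        fn = queue.popleft()
        for callee in call_graph.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


# Call graph shaped like fixtures/c_direct: one dead function.
graph = {
    "LLVMFuzzerTestOneInput": ["used_a"],
    "used_a": ["used_b"],
    "used_b": [],
    "dead_fn": [],
}
live = reachable(graph, ["main", "LLVMFuzzerTestOneInput"])
unreachable_defined = sorted(set(graph) - live)
```

Here `live` holds the three functions on the LLVMFuzzerTestOneInput → used_a → used_b chain and `unreachable_defined` is `["dead_fn"]`; passing `escaped=["dead_fn"]` would make it reachable too, which is the conservative choice for a pointer that leaves the analyzed code.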
- LLVM ≥ 21, as one coherent toolchain: clang, clang++, llvm-link, opt, and the analyzer all share one major version M ≥ 21, and rustc's LLVM is no newer than M. See docs/llvm-support.md. NOTE: especially as a Rust user, we recommend installing LLVM via https://apt.llvm.org/llvm.sh instead of from your distribution, whose packages are likely to be outdated!
- Go (to install gllvm), Python ≥ 3.12, and a C++17 compiler. Rust targets also need rustc / cargo (nightly, but one using LLVM 21 or prior).
- AFL++ compiled from commit 01a83a3d7098e605f0c7fd69381fcf4fc97144fe onwards (24 June 2026)
- cov-analysis from commit 72c239038430477181df99f7a2cd0a556f2701dd onwards (23 June 2026)
The analyzer builds with a plain Makefile driven by llvm-config — no CMake.
bash scripts/setup.sh # gllvm + rust-src, create .venv, build the analyzer
Or piecemeal:
make venv # create .venv (driver, editable + pytest)
make build # build the analyzer on the auto-selected LLVM (≥ 21)
make build LLVM_MAJOR=23 # ...or pin a specific major
make test # run the full test suite
make matrix # build + test against every installed LLVM ≥ 21
make help # list all targets
To run the CLI, point it at the built analyzer and put gllvm on PATH:
export REACHABILITY_ANALYZER=$PWD/analyzer/build/reachability-analyzer
export PATH="$(go env GOPATH)/bin:$PATH" # gclang / gclang++ / get-bc
source .venv/bin/activate # or call .venv/bin/reachability directly
reachability check-toolchain # verify LLVM version coherence first
reachability run --lang <target> --project <dir> [--out <file>]
--out is optional; it defaults to reachability.json in the --project directory. If --out points at an existing directory, the report is written to reachability.json inside it.
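The --out rules amount to a small path-resolution step. This Python sketch restates them; `resolve_report_path` and `default_lists` are hypothetical names, not the driver's API, and the list paths follow the documented reached.txt / not_reached.txt defaults beside the report.

```python
from pathlib import Path


def resolve_report_path(project, out=None):
    """Where the JSON report goes, per the documented --out rules (sketch)."""
    if out is None:
        # No --out: reachability.json inside the --project directory.
        return Path(project) / "reachability.json"
    out = Path(out)
    if out.is_dir():
        # --out names an existing directory: write reachability.json into it.
        return out / "reachability.json"
    # Otherwise --out is the report file itself.
    return out


def default_lists(report):
    """Default sancov list paths: reached.txt / not_reached.txt beside the report."""
    report = Path(report)
    return report.parent / "reached.txt", report.parent / "not_reached.txt"
```

For instance, `--out c.json` keeps the report at c.json and puts both lists in the same directory.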
<target> is a source language (c, cpp, rust, mixed) or a Rust fuzz
harness (libfuzzer, ziggy, afl). Each sets a default entry point, so the
common case needs no --entry. The build command and the artifact are
auto-detected for C/C++; override them with --build-cmd / --artifact when
needed.
Full options: Command-line reference.
Read about real-world target examples in docs/EXAMPLES.md.
fixtures/c_direct is a small C fuzz target. Its build and artifact are
auto-detected:
reachability run --lang c --project fixtures/c_direct --out c.json -v
reachable 3 / defined 4 (0 indirect-only, 1 unreachable) [backend=type-based]
LLVMFuzzerTestOneInput → used_a → used_b are reachable; dead_fn lands in
unreachable_defined.
examples/cpp_cmake uses virtual dispatch. The driver detects the CMake build,
wraps it with gllvm, and analyzes the resulting executable:
reachability run --lang cpp --project examples/cpp_cmake --out cpp.json -v
The virtual call Codec::decode over-approximates to both overrides, reached via indirect edges:
Raw::decode(unsigned char const*, unsigned long) | via indirect
Xor::decode(unsigned char const*, unsigned long) | via indirect
fixtures/rust_dyn is a Rust staticlib whose LLVMFuzzerTestOneInput
dispatches through a dyn Trait. The driver builds it with
RUSTFLAGS="--emit=llvm-bc …" and collects the per-crate bitcode:
reachability run --lang rust --project fixtures/rust_dyn --out rust.json -v
The trait-object call resolves to both implementations, via indirect edges:
<rust_dyn::Inc as rust_dyn::Op>::run | via indirect
<rust_dyn::Dbl as rust_dyn::Op>::run | via indirect
fixtures/mixed_c_rust has C++ glue calling an extern "C" Rust entry. Use
--lang mixed; the driver builds and merges both sides' bitcode (gllvm for the
glue, cargo for Rust), and the cross-language edge resolves by C-ABI symbol name:
reachability run --lang mixed --project fixtures/mixed_c_rust \
--artifact glue.o --out mixed.json -v
Point --artifact at the C/C++ object so it is picked out from the Rust build outputs.
A tool linked against a static library (say tools/thumbnail linking
libtiff.a) embeds only the archive members the linker actually pulled in. To
analyze the whole library — not just the slice the linker kept — point
--artifact at the linked binary and keep the default --static-libs auto:
reachability run --lang c --project tiff-4.0.4 --artifact tools/thumbnail \
--out tiff.json -v
The driver merges thumbnail's own objects with the full contents of
libtiff.a. Functions in members the linker discarded (e.g. TIFFReadRGBAImage,
TIFFPrintDirectory) now show up as unreachable instead of vanishing, while the
reachable set is unchanged from the linker's view — adding the rest of the
library can only add unreachable functions, never remove reachable ones. Use
--static-libs none for the linker's view only, or all to include every
bitcode archive in the tree.
A ziggy harness is a Rust binary whose fuzz
loop lives in main rather than in LLVMFuzzerTestOneInput. --lang ziggy
acquires the bitcode and roots at main automatically:
reachability run --lang ziggy --project <harness> --out z.json
For complete, start-to-finish walkthroughs on real targets, see docs/EXAMPLES.md: ziggy (the url crate), cargo-afl (cpp_demangle and rustyknife), and libFuzzer (libxml2) harnesses.
The reachability CLI has two subcommands.
reachability check-toolchain resolves and validates the LLVM toolchain (analyzer, clang/clang++, llvm-link, opt, rustc) for version coherence and prints what it found. Run it first; it exits non-zero on any incoherence. See docs/llvm-support.md for the policy.
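The coherence policy (one shared LLVM major M ≥ 21 for clang, clang++, llvm-link, opt and the analyzer, with rustc's LLVM no newer than M) can be written as a small check. This Python sketch is not the driver's code; `llvm_major` and `check_coherence` are hypothetical, and the parsing assumes the usual "clang version X.Y.Z", "LLVM version X.Y.Z" and `rustc -vV` "LLVM version: X.Y.Z" lines.

```python
import re

# Matches "clang version 21.1.0", "LLVM version 21.1.0" (llvm-link/opt --version)
# and "LLVM version: 21.1.2" (rustc --version --verbose).
_VERSION = re.compile(r"(?:LLVM|clang) version:?\s*(\d+)")


def llvm_major(version_text):
    """Return the LLVM major version found in a tool's --version output."""
    match = _VERSION.search(version_text)
    if match is None:
        raise ValueError(f"no LLVM version in: {version_text!r}")
    return int(match.group(1))


def check_coherence(tool_versions, rustc_version=None, minimum=21):
    """Return a list of problems; an empty list means the toolchain is coherent.

    tool_versions maps a tool name (clang, clang++, llvm-link, opt, analyzer)
    to its version text; rustc_version is `rustc -vV` output, if Rust is used.
    """
    majors = {tool: llvm_major(text) for tool, text in tool_versions.items()}
    distinct = set(majors.values())
    if not distinct:
        return ["no LLVM tools found"]
    if len(distinct) > 1:
        return [f"LLVM tools disagree on the major version: {majors}"]
    (major,) = distinct
    problems = []
    if major < minimum:
        problems.append(f"LLVM {major} is older than the required {minimum}")
    if rustc_version is not None:
        rust_major = llvm_major(rustc_version)
        if rust_major > major:
            problems.append(f"rustc uses LLVM {rust_major}, newer than the toolchain's {major}")
    return problems
```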
reachability run builds a project, merges its bitcode, and computes the reachable set from the entry point(s).
| Option | Default | Meaning |
|---|---|---|
| --project DIR | (required) | Project directory to build and analyze. |
| --lang TARGET | (required) | Target type (see the table below): sets how bitcode is acquired and the default entry. |
| --out FILE | reachability.json in --project | Path for the JSON report. A directory writes reachability.json into it. The two sancov lists default to reached.txt / not_reached.txt beside it. |
| --entry NAME | per --lang | Entry to root reachability at. Repeatable; overrides the target default. See Entry resolution. |
| --backend NAME | (none) | Deprecated and ignored; the type-based backend is always used. Accepted for backward compatibility; passing it prints a warning. |
| --artifact PATH | auto-detect | C/C++ only: the built binary/object/archive to extract bitcode from (relative to --project). Auto-detected otherwise, preferring an executable over a shared library, archive, then object. |
| --build-cmd CMD | auto-detect | C/C++ only: shell build command, run with gllvm injected. E.g. "cmake -S . -B build && cmake --build build". Auto-detected from the project files otherwise (configure → Makefile → CMakeLists.txt → build.ninja → meson.build, else make). |
| --static-libs {auto,none,all} | auto | C/C++ only: how to treat static archives (.a) the target links. auto also analyzes each linked archive in full, so members the linker dropped are reported rather than silently absent. none keeps only the linker's view. all includes every bitcode archive in the tree, skipping any whose members another archive already covers and resolving residual overlaps at link time (llvm-link --override). |
| --profile {debug,release} | debug | Rust only: cargo profile for the bitcode build. Match the fuzz binary's profile. See Matching the fuzz binary's build. |
| --codegen-units N | 1 | Rust only: rustc -Ccodegen-units for the bitcode build (positive integer). Match the fuzz binary's value. See Matching the fuzz binary's build. |
| --build-std | off | Rust only: build the standard library from source (-Zbuild-std) so std functions appear in the graph instead of as external declarations. |
| --dot FILE | (none) | Also write the reachable subgraph as Graphviz DOT (indirect edges dashed/red). |
| --reached FILE | beside --out | Path for the sancov allowlist of reachable functions. |
| --not-reached FILE | beside --out | Path for the sancov ignorelist of unreachable functions. |
| -v, --verbose | off | Narrate each pipeline stage (toolchain → build → merge → analyze): echoes the tool commands run, streams the build output live, and lists the collected bitcode modules. |
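The auto-detection orders in the table (configure → Makefile → CMakeLists.txt → build.ninja → meson.build, else make; an executable before a shared library, archive, then object) can be sketched as two small ranking functions. This is a hypothetical Python illustration, not the driver's implementation: it only reports which build file was found, not the command the driver derives from it, and it takes file names plus an executable flag instead of inspecting a real build tree.

```python
from pathlib import Path

# Documented --build-cmd detection order, checked at the project root.
BUILD_MARKERS = ("configure", "Makefile", "CMakeLists.txt", "build.ninja", "meson.build")


def detect_build_system(root_files):
    """root_files: names of the files at the project root."""
    present = set(root_files)
    for marker in BUILD_MARKERS:
        if marker in present:
            return marker
    return "make"  # documented fallback


def artifact_rank(name, executable):
    """Lower is preferred: executable < shared library < archive < object."""
    suffixes = Path(name).suffixes
    if suffixes[-1:] == [".o"]:
        return 3
    if suffixes[-1:] == [".a"]:
        return 2
    if ".so" in suffixes or suffixes[-1:] == [".dylib"]:
        return 1  # also catches versioned names such as libtiff.so.5
    if executable:
        return 0
    return None  # not something bitcode is extracted from


def pick_artifact(candidates):
    """candidates: (file name, is_executable) pairs; returns the preferred name."""
    ranked = []
    for name, executable in candidates:
        rank = artifact_rank(name, executable)
        if rank is not None:
            ranked.append((rank, name))
    return min(ranked)[1] if ranked else None
```

Shared libraries are ranked before the executable check on purpose, since a .so usually carries the executable bit too.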
| --lang | acquires via | default entry |
|---|---|---|
| c | gllvm (gclang) | main + LLVMFuzzerTestOneInput |
| cpp | gllvm (gclang++) | main + LLVMFuzzerTestOneInput |
| rust | cargo + --emit=llvm-bc | main |
| mixed | gllvm and cargo (merged) | LLVMFuzzerTestOneInput |
| libfuzzer | cargo (Rust) | fuzz_target! |
| ziggy | cargo (Rust) | main |
| afl | cargo (Rust) | main |
The C/C++ targets root at both main and LLVMFuzzerTestOneInput, so one
--lang c/cpp covers a normal program and an LLVMFuzzerTestOneInput harness alike. A default
entry that matches nothing only produces a harmless warning, because roots are unioned.
--entry never requires a mangled symbol. A token matches any of the following, and all matches are unioned:
- an exact mangled symbol (e.g. _Z3foov),
- an exact demangled name (e.g. foo()),
- a demangled ::name suffix (so main finds crate::main), and
- the alias fuzz_target! (→ LLVMFuzzerTestOneInput + rust_fuzzer_test_input).
Matching more than one function only adds roots, which stays sound. For a Rust
binary, just root at main: the token matches the real Rust main, so you never
need to type a mangled symbol.
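The four matching rules can be sketched in a few lines of Python. `token_matches` and `resolve_entries` are hypothetical helpers, and the demangled Rust names are assumed to be in the hash-free form (demo::main, not one carrying the trailing hash).

```python
# Documented alias: fuzz_target! roots at both of these C-ABI symbols.
ALIASES = {"fuzz_target!": {"LLVMFuzzerTestOneInput", "rust_fuzzer_test_input"}}


def token_matches(token, mangled, demangled):
    """True if an --entry token selects this function under any of the rules."""
    if token == mangled or token == demangled:
        return True  # exact mangled or exact demangled name
    if demangled.endswith("::" + token):
        return True  # "main" also finds "crate::main"
    return mangled in ALIASES.get(token, ())


def resolve_entries(tokens, functions):
    """functions: list of (mangled, demangled) pairs for defined functions.

    Every match of every token becomes a root; extra roots only enlarge the
    reachable set, so over-matching stays sound.
    """
    return {
        mangled
        for token in tokens
        for mangled, demangled in functions
        if token_matches(token, mangled, demangled)
    }


functions = [
    ("main", "main"),
    ("_ZN4demo4main17h0123456789abcdefE", "demo::main"),
    ("_Z3foov", "foo()"),
    ("LLVMFuzzerTestOneInput", "LLVMFuzzerTestOneInput"),
    ("rust_fuzzer_test_input", "rust_fuzzer_test_input"),
]
```

With this list, the token main selects both the C-level main and demo::main, which is exactly the "only adds roots" behavior described above.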
For Rust targets the driver builds its own bitcode (cargo build --emit=llvm-bc)
and computes reachability from that. For the resulting reached.txt /
not_reached.txt to line up with the binary you actually instrument, that
bitcode build should match the fuzz binary's build. Two Rust-only options control
this; both default to the most common fuzzer setup and are ignored for C/C++.
- --profile {debug,release} (default debug): the cargo profile. The optimization level drives generic sharing (rustc's -Zshare-generics is on when unoptimized, off when optimized): it decides which crate instantiates each generic, and so which monomorphizations exist and how they are mangled. A debug snapshot against a release fuzz binary (or vice versa) therefore produces a different function set. Pass --profile release for an optimized fuzz build.
- --codegen-units N (default 1): passed through verbatim as rustc -Ccodegen-units. The unit count sets inlining boundaries, hence which monomorphizations survive as standalone functions rather than being inlined away. N is any positive integer (rustc rejects 0/negative). Useful values:
  - 1: a single unit per crate, giving maximum inlining and exactly one .bc per crate. Many fuzzing profiles pin codegen-units = 1 for better optimization, so the default already matches them.
  - 16: the rustc default for a cargo release build (incremental off).
  - 256: the rustc default for a cargo dev/debug build (incremental on).
  With N > 1, rustc splits each crate into several target/<profile>/deps/<crate>-<hash>.<cgu>.rcgu.bc files; the driver collects all of them.
How to choose. Use whatever your fuzz build uses. That is the cargo/rustc
default for its profile (release → 16, dev → 256) unless a [profile.*] codegen-units in Cargo.toml or a -Ccodegen-units in RUSTFLAGS overrides
it. If unsure, build the fuzz target with codegen-units = 1 and keep the
defaults here — the two then agree.
The fun: patterns in the lists already tolerate the Rust mangling
disambiguator (17h<hash>) drifting between builds (see Output), but
that only covers the naming of a given instance. Matching --profile /
--codegen-units is what aligns the set of emitted functions; a wildcard
cannot recover a function that one build inlined away and the other did not.
| Variable | Purpose |
|---|---|
| REACHABILITY_ANALYZER | Path to the analyzer binary (default analyzer/build/reachability-analyzer). |
| CLANG / CLANGXX / LLVM_LINK / OPT | Override individual tool paths (otherwise resolved by major from the analyzer's LLVM). |
| PATH | Must contain gclang / gclang++ / get-bc (gllvm) for C/C++/mixed targets. |
reachability run writes three files:
- <out>.json: summary counts, a reachable array (mangled and demangled name, source file/line when debug info is present, via = direct/indirect/both, and an indirect_only flag), and an unreachable_defined array. With --dot FILE, the reachable subgraph is additionally written to FILE as Graphviz DOT.
- reached.txt: a SanitizerCoverage allowlist of reachable functions.
- not_reached.txt: a SanitizerCoverage ignorelist of unreachable functions.
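For a quick look at a report, a few lines of Python suffice. Only the reachable / unreachable_defined arrays and the per-entry via and indirect_only fields come from the description above; the rest of the JSON layout, and the `summarize` helper itself, are assumptions.

```python
from collections import Counter


def summarize(report):
    """report: a parsed reachability.json (hypothetical helper)."""
    reachable = report["reachable"]
    return {
        "reachable": len(reachable),
        "unreachable": len(report["unreachable_defined"]),
        # Functions reached only through resolved indirect calls.
        "indirect_only": sum(1 for entry in reachable if entry.get("indirect_only")),
        # How each reachable function was reached: direct / indirect / both.
        "via": dict(Counter(entry["via"] for entry in reachable)),
    }
```

Usage, e.g. on the c_direct report: `summarize(json.load(open("c.json")))`.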
Both lists use each function's mangled (LLVM symbol) name — what clang and
AFL++ match fun: against — so they cover C, C++, and Rust. Feed either to clang
or AFL++ to instrument only reachable code:
# instrument ONLY reachable functions:
clang -fsanitize-coverage=trace-pc-guard -fsanitize-coverage-allowlist=reached.txt ...
# OR: instrument everything EXCEPT unreachable functions:
clang -fsanitize-coverage=trace-pc-guard -fsanitize-coverage-ignorelist=not_reached.txt ...
A coverage allowlist instruments a function only when both a src: and a fun: entry match, so reached.txt opens with a src:* line. An ignorelist has no such requirement, so not_reached.txt is pure fun: lines. (Verified against clang in driver/tests/test_covlists.py.)
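The two list formats differ only in that src:* header. A minimal Python sketch of writing them from mangled names; the helper names are hypothetical, and it leaves out the Rust disambiguator globbing the tool also applies:

```python
def allowlist(mangled_names):
    """reached.txt body: allowlists need a src: and a fun: match, so open with src:*."""
    lines = ["src:*"] + [f"fun:{name}" for name in sorted(mangled_names)]
    return "\n".join(lines) + "\n"


def ignorelist(mangled_names):
    """not_reached.txt body: ignorelists need no src: line, so fun: lines only."""
    return "".join(f"fun:{name}\n" for name in sorted(mangled_names))
```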
Rust mangling disambiguator. A Rust generic instance is mangled with a trailing 17h<hash> disambiguator whose value depends on the build (opt level, codegen units, instantiating crate). The exact value differs between this bitcode snapshot and the instrumented fuzz binary, so an exact-name entry would miss. Each fun: entry therefore replaces that disambiguator with a * glob, which both clang sancov and AFL++ honour, so an entry matches the instance in any build. An ignorelist pattern that would also match a reachable instance is dropped, so excluding unreachable code never excludes reachable code. For best fidelity, still build the snapshot with the same --profile and --codegen-units as the fuzz binary, so that the set of emitted monomorphizations matches; the * only tolerates the disambiguator, not a different function set.
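A sketch of the globbing and the ignorelist filter in Python. It assumes legacy Rust mangling (17h plus 16 hex digits before the closing E) and keeps the 17h prefix in the pattern; the exact pattern the tool emits, and the helper names, are assumptions. fnmatch stands in for the sancov matcher, since its * has the same any-sequence meaning.

```python
import re
from fnmatch import fnmatchcase

# Legacy Rust mangling ends in 17h + 16 hex digits + E, e.g.
# _ZN4core3ptr13drop_in_place17h0123456789abcdefE
_RUST_HASH = re.compile(r"17h[0-9a-f]{16}E$")


def to_pattern(mangled):
    """Replace the build-dependent disambiguator with a * glob; other names pass through."""
    return _RUST_HASH.sub("17h*E", mangled)


def build_lists(reachable, unreachable):
    """Return (allow_patterns, ignore_patterns) from mangled names."""
    allow = sorted({to_pattern(name) for name in reachable})
    ignore = []
    for pattern in sorted({to_pattern(name) for name in unreachable}):
        # Two instances of one generic can share a path and differ only in
        # the hash; if a pattern also covers a reachable one, drop it so the
        # ignorelist never excludes reachable code.
        if not any(fnmatchcase(name, pattern) for name in reachable):
            ignore.append(pattern)
    return allow, ignore


reached = ["_ZN4demo3foo17h0123456789abcdefE", "main"]
not_reached = ["_ZN4demo3foo17hfedcba9876543210E", "_ZN4demo3bar17h00000000deadbeefE"]
allow_patterns, ignore_patterns = build_lists(reached, not_reached)
```

Here the unreachable foo instance shares its glob with the reachable one, so only the bar pattern survives in the ignorelist.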
Indirect calls (C function pointers, C++ virtual dispatch, Rust dyn/fn
pointers) are resolved by the type-based resolver: an indirect call of
function type T may reach any address-taken function whose LLVM function type
is T. It is language-agnostic, always available, and sound — a deliberate
over-approximation. The --indirect-any debug flag widens this further, to any
address-taken function regardless of type.
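The resolution rule reduces to a filter over address-taken functions. A minimal Python sketch; the type strings are illustrative LLVM function types (with opaque pointers every pointer parameter is just ptr), and the function and parameter names are hypothetical:

```python
def resolve_indirect(call_type, functions, indirect_any=False):
    """Possible targets of an indirect call whose LLVM function type is call_type.

    functions maps a name to (llvm_function_type, address_taken).
    indirect_any mimics the wider --indirect-any mode: any address-taken
    function is a candidate, whatever its type.
    """
    return {
        name
        for name, (fn_type, address_taken) in functions.items()
        if address_taken and (indirect_any or fn_type == call_type)
    }


functions = {
    "Raw::decode": ("void (ptr, ptr, i64)", True),   # address taken via the vtable
    "Xor::decode": ("void (ptr, ptr, i64)", True),
    "helper": ("void (ptr, ptr, i64)", False),        # same type, never address-taken
    "compare": ("i32 (ptr, ptr)", True),
}
```

A virtual call of type `void (ptr, ptr, i64)` thus reaches both decode overrides but not `helper`, mirroring the Codec::decode example above.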
See docs/llvm-support.md for the LLVM compatibility
matrix.
Earlier versions shipped an optional second backend, --backend=svf, built on
SVF's Andersen points-to analysis, meant to
narrow the type-based over-approximation per call site. It was removed: in
practice it produced essentially the same reachable sets as the default
type-based backend while costing far more. It built only against a pinned LLVM
21 (it failed on 22/23), required a separately vendored SVF + Z3 build with a
local source patch, ran substantially slower, and was more fragile to operate —
so it offered no practical benefit over the type-based backend, which is sound,
language-agnostic, and works on every supported LLVM. The --backend flag is
retained only for backward compatibility: it is accepted but ignored (with a
warning).
make test # full pytest suite (analyzer .ll goldens + per-language soundness)
make matrix # LLVM version-compatibility matrix (catches future-LLVM breakage)
Each fixture in fixtures/ carries a must_reach / must_not_reach set; the analyzer must satisfy the soundness invariant on each.
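What satisfying the soundness invariant means for one fixture can be sketched as a set comparison. This Python helper is hypothetical, not the test suite's code: a missed must_reach function is unsound, and, as an assumption about how the suite uses the other set, a reached must_not_reach function is reported as a precision failure.

```python
def check_fixture(reached, must_reach, must_not_reach):
    """Return a list of problems for one fixture; empty means it passes."""
    problems = []
    missed = sorted(set(must_reach) - set(reached))
    if missed:
        problems.append(f"unsound: reachable functions reported unreachable: {missed}")
    extra = sorted(set(must_not_reach) & set(reached))
    if extra:
        problems.append(f"imprecise: expected-unreachable functions reported reachable: {extra}")
    return problems
```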
analyzer/ C++ analyzer + Makefile (src/, built via llvm-config)
driver/ Python driver (toolchain, acquire_*, link, analyze, cli)
fixtures/ per-language test targets with expected reachable sets
examples/ worked examples (cpp_cmake/)
scripts/ setup.sh, test_matrix.sh, select_llvm.sh
docs/ worked examples (EXAMPLES.md), LLVM support
This is a static over-approximation, not dynamic coverage. Its precision is
bounded by indirect-call resolution and by any missing bitcode — precompiled
libraries, or the Rust standard library without --build-std.