[BlackboxBenchmarking] Index the Fuzzer.builtin field#5263

Merged
ViniciustCosta merged 1 commit into master from dylanj/builtin-index
May 6, 2026

@dylanjew (Collaborator)

We use this field to determine whether a fuzzer is a blackbox fuzzer or an engine-guided fuzzer. To aggregate stats for blackbox fuzzers, we need to be able to query by the builtin field, which requires it to be indexed.

https://docs.cloud.google.com/datastore/docs/concepts/indexes#unindexed_properties

The migration to index the field for existing fuzzers only needs to rewrite each Fuzzer entity in Datastore without changing it. Datastore then creates the index entries for the field as part of each write.
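To see why rewriting unchanged entities is enough, here is a toy in-memory model of how Datastore maintains built-in single-property indexes: index entries are written when an entity is put, so marking a property as indexed does nothing for entities stored earlier until they are written again. `ToyDatastore`, `reindex_all`, and the fuzzer names are hypothetical illustrations, not the google-cloud-ndb API or ClusterFuzz code.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDatastore:
    """Hypothetical in-memory model of Datastore's built-in
    single-property indexes. This is NOT the google-cloud-ndb API."""

    indexed_properties: set
    entities: dict = field(default_factory=dict)
    # property name -> {entity key: value}. Entries exist only for
    # entities written while that property was indexed.
    indexes: dict = field(default_factory=dict)

    def put(self, key, props):
        """Store an entity and (re)write its index entries."""
        self.entities[key] = dict(props)
        # Drop this entity's stale index entries, then write fresh ones
        # for every property that is indexed at write time.
        for entries in self.indexes.values():
            entries.pop(key, None)
        for name in self.indexed_properties:
            if name in props:
                self.indexes.setdefault(name, {})[key] = props[name]

    def filter_eq(self, name, value):
        """Equality filter served from the index, like a Datastore query."""
        # Entities with no index entry for `name` are invisible here.
        entries = self.indexes.get(name, {})
        return sorted(k for k, v in entries.items() if v == value)


def reindex_all(store):
    """Rewrite every entity unchanged so its index entries get created."""
    for key, props in list(store.entities.items()):
        store.put(key, props)


# Hypothetical fuzzer names, for illustration only.
store = ToyDatastore(indexed_properties={"name"})
store.put("engine_fuzzer", {"name": "engine_fuzzer", "builtin": True})
store.put("blackbox_fuzzer", {"name": "blackbox_fuzzer", "builtin": False})

store.indexed_properties.add("builtin")   # mark the field as indexed
print(store.filter_eq("builtin", False))  # [] (old entities lack entries)
reindex_all(store)                        # the "rewrite unchanged" migration
print(store.filter_eq("builtin", False))  # ['blackbox_fuzzer']
```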

Testing
I ran this in dev and verified that I can filter by builtin == False.

dylanjew requested a review from a team as a code owner April 30, 2026 15:54
dylanjew requested a review from aakallam April 30, 2026 15:55
dylanjew changed the title from "Index the Fuzzer.builtin field" to "[BlackboxBenchmarking] Index the Fuzzer.builtin field" Apr 30, 2026
@aakallam (Collaborator) left a comment

LGTM!

dylanjew force-pushed the dylanj/builtin-index branch from 865a705 to e58449a April 30, 2026 15:59
@dylanjew (Collaborator, Author)

Fixed the lint error for the unused import.

@ViniciustCosta (Collaborator) left a comment

lgtm

ViniciustCosta merged commit 7444e54 into master May 6, 2026
9 of 11 checks passed
ViniciustCosta deleted the dylanj/builtin-index branch May 6, 2026 13:00
ViniciustCosta pushed a commit that referenced this pull request May 11, 2026
…5265)

Adds a cron job that aggregates fuzzer stats into a daily BigQuery
table, `fuzzer_stats.daily_stats`.

## Context

We will use this to benchmark our blackbox fuzzers. Previously, we
couldn't easily join the fuzzing hours from BigQuery with the bugs filed
by ClusterFuzz in our dashboards. We need a separate aggregated table
because the `fuzzer_stats` `JobRun` tables live in a separate dataset
for each fuzzer, and we can't simply query across all of those datasets
in BigQuery or Plx.
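BigQuery's wildcard table syntax only matches table names inside a single dataset, so no single table expression spans every per-fuzzer dataset. One possible approach, sketched here under stated assumptions, is to enumerate the fuzzers and generate a `UNION ALL` query. The dataset naming (`dataset_for`), the `JobRun` column names (`fuzzing_duration`, `testcases_executed`, `timestamp`), and the `@target_date` query parameter are hypothetical and not taken from the PR.

```python
import re

# Hypothetical: fuzzer names are restricted to this character set so they
# can be embedded safely as SQL string literals and dataset IDs.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def dataset_for(fuzzer_name):
    """Hypothetical fuzzer -> dataset mapping. BigQuery dataset IDs may
    contain only letters, digits and underscores, so '-' is replaced."""
    return fuzzer_name.replace("-", "_") + "_stats"


def build_daily_union_query(project, fuzzer_names):
    """Build one query reading every per-fuzzer JobRun table for the day
    given by the @target_date query parameter (assumed column names)."""
    if not fuzzer_names:
        raise ValueError("need at least one fuzzer")
    selects = []
    for name in sorted(fuzzer_names):
        if not _SAFE_NAME.fullmatch(name):
            raise ValueError(f"unexpected fuzzer name: {name!r}")
        table = f"`{project}.{dataset_for(name)}.JobRun`"
        selects.append(
            f"SELECT '{name}' AS fuzzer_name,\n"
            f"  SUM(fuzzing_duration) AS fuzzing_duration,\n"
            f"  SUM(testcases_executed) AS testcases_executed\n"
            f"FROM {table}\n"
            f"WHERE DATE(timestamp) = @target_date"
        )
    return "\nUNION ALL\n".join(selects)


print(build_daily_union_query("your-project", ["fuzzer-a", "fuzzer_b"]))
```

Sorting the names keeps the generated SQL stable between runs, and rejecting unexpected characters avoids building SQL from untrusted strings.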

The cron job defaults to yesterday's stats, so it can run after the
stats are loaded into BigQuery, but it takes a date flag so we can
backfill days as necessary.
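A minimal sketch of the "default to yesterday, allow a backfill date" behavior. The `--date` flag name, the ISO `YYYY-MM-DD` format, and passing `today` in explicitly (for example, the current UTC date) are assumptions; the job's actual flag may differ.

```python
import argparse
import datetime
from typing import Optional


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Aggregate daily fuzzer stats.")
    # Hypothetical flag name and format; the real job's flag may differ.
    parser.add_argument(
        "--date",
        help="Day to aggregate as YYYY-MM-DD (default: yesterday).",
    )
    return parser.parse_args(argv)


def resolve_target_date(date_arg: Optional[str], today: datetime.date) -> datetime.date:
    """Return the --date value if given (for backfills), else yesterday."""
    if date_arg:
        return datetime.date.fromisoformat(date_arg)
    return today - datetime.timedelta(days=1)
```

Taking `today` as a parameter rather than reading the clock inside the function keeps the date logic easy to test. A run with no flag aggregates the previous day, and `--date 2026-05-01` backfills May 1.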

### Idempotency
Whenever a date is inserted, the job writes to that date's partition
with `WRITE_TRUNCATE`, overwriting all of the rows for that date. So if
the job runs multiple times for the same day, it does not add rows; it
replaces the previous rows for that date.

This simplifies the edge cases where the job fails or runs multiple
times: as long as the last run for a date succeeds, the data for that
date is correct, because each run pulls in the latest data from the
fuzzers' JobRun tables.
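The sketch below models why a per-partition `WRITE_TRUNCATE` makes reruns safe. `partition_destination` builds a BigQuery partition decorator (`table$YYYYMMDD`), one way to address a single day's partition of a day-partitioned table; whether the job uses a decorator is an assumption. `ToyPartitionedTable` is a hypothetical in-memory stand-in, not the BigQuery client, and contrasts truncate with append.

```python
import datetime
from collections import defaultdict


def partition_destination(table_id, day):
    """Return a BigQuery partition decorator (`table$YYYYMMDD`) that
    addresses a single day's partition of a day-partitioned table."""
    return f"{table_id}${day:%Y%m%d}"


class ToyPartitionedTable:
    """Hypothetical in-memory stand-in for a day-partitioned table,
    used only to contrast WRITE_TRUNCATE with WRITE_APPEND."""

    def __init__(self):
        self.partitions = defaultdict(list)  # day -> list of row dicts

    def write_truncate(self, day, rows):
        # Replace that day's partition; other days are left untouched.
        self.partitions[day] = list(rows)

    def write_append(self, day, rows):
        # For contrast: appending duplicates rows when a run is retried.
        self.partitions[day].extend(rows)

    def row_count(self):
        return sum(len(rows) for rows in self.partitions.values())


day = datetime.date(2026, 5, 10)
rows = [{"fuzzer_name": "example_fuzzer", "testcases_executed": 42}]
table = ToyPartitionedTable()
table.write_truncate(day, rows)
table.write_truncate(day, rows)  # a rerun for the same day
print(partition_destination("your-project.fuzzer_stats.daily_stats", day))
# your-project.fuzzer_stats.daily_stats$20260510
print(table.row_count())  # 1
```

Running `write_truncate` twice for the same day leaves a single copy of that day's rows, whereas `write_append` would double-count a retried run.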


### Example query:
```sql
select fuzzer_name,
  SUM(fuzzing_duration) as fuzzing_duration,
  SUM(testcases_executed) as testcases_executed
from `your-project.fuzzer_stats.daily_stats`
group by fuzzer_name
order by fuzzing_duration desc
limit 1000;
```

The remaining work here is to set up the cron job configuration. This PR
only adds the logic for the job.
[crbug.com/501066151](https://crbug.com/501066151)

### Related PRs:
These migrate the BigQuery and Datastore schemas to support the new
fields:
#5264
#5263


### Testing
Ran this against the dev data and verified that the fuzzer stats
BigQuery table is populated.
Logs from dev: https://paste.googleplex.com/4884361662038016

After the job inserted the aggregated rows into BigQuery, I was able to
compare the aggregated testcase stats and fuzzing hours between fuzzers
for a given date range.