chore: refresh JMH benchmark numbers + benchmark infra fixes (KOJAK-82) by endrju19 · Pull Request #46 · softwaremill/okapi

endrju19 · 2026-05-16T11:43:32Z

What

Refreshes benchmarks/kafka-deliverbatch.json and benchmarks/results-kafka-deliverbatch.md with a full-config JMH run (fork=2, warmup=3, iter=5, n=10 samples per benchmark). Also lands three benchmark-infra fixes that were needed to make the run complete cleanly.

Headline (Kafka throughput, msg/s)

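| batchSize | Score (ms/op) | msg/s | vs sync-sequential baseline |
|-----------|---------------|-------|-----------------------------|
| 10 | 0.559 ± 0.029 | ~1,790 | 16.4× |
| 50 | 0.242 ± 0.007 | ~4,132 | 35.8× |
| 100 | 0.193 ± 0.004 | ~5,181 | 45.1× |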

All Kafka error bars are roughly 5% of score or less (about 5.2% at batchSize=10, under 3% at batchSize=50 and 100). Numbers reproduced across two independent runs (delta <3% between them).
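For readers who want to re-derive the msg/s column and the relative error bars from the raw scores, here is a minimal sketch. It assumes the JMH score is normalized per message (so msg/s = 1000 / score in ms/op), which is what the table's numbers imply; `Row`, `msgPerSecond` and `relativeErrorPct` are illustrative names, not code from the repo.

```kotlin
import kotlin.math.roundToInt

/** One row of the headline table: JMH score and error, both in ms/op. */
data class Row(val batchSize: Int, val scoreMs: Double, val errorMs: Double)

/** Messages per second, assuming one op corresponds to one delivered message. */
fun msgPerSecond(scoreMs: Double): Double = 1000.0 / scoreMs

/** Error bar expressed as a percentage of the score. */
fun relativeErrorPct(row: Row): Double = row.errorMs / row.scoreMs * 100.0

fun main() {
    // Values copied from the headline table.
    val rows = listOf(
        Row(10, 0.559, 0.029),
        Row(50, 0.242, 0.007),
        Row(100, 0.193, 0.004),
    )
    for (row in rows) {
        val rate = msgPerSecond(row.scoreMs).roundToInt()
        val errPct = "%.1f".format(relativeErrorPct(row))
        println("batchSize=${row.batchSize}: ~$rate msg/s, error = $errPct% of score")
    }
}
```

The same two helpers can be pointed at any other ms/op row in benchmarks/results-kafka-deliverbatch.md, as long as the per-message normalization assumption holds for that benchmark.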

Benchmark infrastructure fixes (also in this PR)

Without these, the JMH run cannot complete:

JMH JVM heap — bumped to -Xmx8g. Throughput-mode microbenches were OOMing at the previous -Xmx2g because Jackson + Kotlin reflection allocate per call at ~1M ops/s rates.
Liquibase duplicate-changelog workaround — added -Dliquibase.duplicateFileMode=WARN to JVM args. The fat JMH jar and okapi-postgres.jar both ship the changelog at the same path; Liquibase 4.x treats this as an error by default. Files are identical so WARN is safe.
MockProducer.history cleared after each send() in DelivererMicroBenchmark. MockProducer retains every record sent for inspection; at ~1M ops/s for 30s × forks × iters that list grew to GBs and OOMed the JVM regardless of heap size. Microbench doesn't need to inspect what was sent — discarding per call is safe. With this fix, DelivererMicroBenchmark.kafkaDeliver now produces meaningful numbers (2.3M ± 19k ops/s) instead of error > score.
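The MockProducer fix could look roughly like the sketch below. It is not the code from DelivererMicroBenchmark.kt (the diff is not shown here); the class name `NonRetainingMockProducer`, the String key/value types and the topic name are assumptions for illustration. It relies on the kafka-clients `MockProducer` API (`send`, `history()`, `clear()`) and on `autoComplete = true`, so each send is already completed before the history is dropped.

```kotlin
import org.apache.kafka.clients.producer.Callback
import org.apache.kafka.clients.producer.MockProducer
import org.apache.kafka.clients.producer.ProducerRecord
import org.apache.kafka.clients.producer.RecordMetadata
import org.apache.kafka.common.serialization.StringSerializer
import java.util.concurrent.Future

/**
 * Hypothetical MockProducer subclass that forgets every record right after it is sent,
 * so the internal history list cannot grow during a long throughput run.
 */
class NonRetainingMockProducer :
    MockProducer<String, String>(true, StringSerializer(), StringSerializer()) {

    // To my understanding the single-argument send(record) delegates to this overload,
    // so overriding it covers both entry points.
    @Synchronized
    override fun send(
        record: ProducerRecord<String, String>,
        callback: Callback?,
    ): Future<RecordMetadata> {
        val future = super.send(record, callback) // completes immediately (autoComplete = true)
        clear()                                   // drop the retained record and bookkeeping
        return future
    }
}

fun main() {
    val producer = NonRetainingMockProducer()
    repeat(1_000) { i ->
        producer.send(ProducerRecord("bench-topic", "key-$i", "value-$i"))
    }
    // Nothing is retained, no matter how many records were sent.
    println("retained records: ${producer.history().size}")
}
```

The trade-off is that `history()` is no longer usable for assertions, which is fine for a benchmark that only measures timing but would be wrong for a functional test.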

Files touched

benchmarks/kafka-deliverbatch.json — full-config raw results
benchmarks/results-kafka-deliverbatch.md — Score ± Error tables + microbench section + HTTP companion table
README.md — refreshed throughput table
okapi-benchmarks/build.gradle.kts — heap bump + Liquibase JVM arg
okapi-benchmarks/src/jmh/kotlin/.../DelivererMicroBenchmark.kt — MockProducer override
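
As a rough illustration of the build.gradle.kts change, the two JVM arguments could be passed to the forked benchmark JVMs as follows. This sketch assumes the module applies the `me.champeau.jmh` Gradle plugin and configures it through its `jmh { }` extension; the actual build script is not shown in this PR, so the real wiring may differ.

```kotlin
// okapi-benchmarks/build.gradle.kts (sketch; assumes the me.champeau.jmh plugin is applied)
jmh {
    jvmArgs.set(
        listOf(
            // Larger heap for throughput-mode microbenchmarks (previously -Xmx2g).
            "-Xmx8g",
            // Downgrade Liquibase's duplicate-changelog check from an error to a warning.
            "-Dliquibase.duplicateFileMode=WARN",
        ),
    )
}
```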

Notes

Run on JDK 21 LTS (matches CLAUDE.md target).
HTTP throughput numbers also included for completeness — still sync sequential, KOJAK-74 will address.
DelivererMicroBenchmark.httpDeliver benchmarks the WireMock-local HTTP path; numbers are dominated by loopback TCP cost, not library overhead.

Base branch

Based on feature/kojak-73-kafka-deliver-batch (PR #40). Merge after #40.

Test plan

./gradlew :okapi-benchmarks:jmh — completes with BUILD SUCCESSFUL, all benchmarks produce non-NaN error bars
Reproducibility verified: two independent runs, Kafka throughput delta <3%

…(KOJAK-82)

Replaces the smoke-run numbers (fork=1, warmup=1, iter=2, n=2, scoreError=NaN) with a full publishable run (fork=2, warmup=3 × 10s, iter=5 × 30s, n=10).

## Headline (Kafka throughput, msg/s)

| batchSize | Smoke | Full run | Improvement vs baseline |
|-----------|----------|----------------|-------------------------|
| 10 | ~1,468 | ~1,825 ± 70 | 16.7× (was 13.5×) |
| 50 | ~3,731 | ~4,184 ± 140 | 36.3× (was 32.3×) |
| 100 | ~4,717 | ~5,128 ± 105 | 44.6× (was 41.0×) |

All Kafka throughput error bars <5% of score, so the multipliers are now statistically defensible. The smoke-run numbers were directionally correct but slightly conservative; the full run shows the optimization is even better than initially claimed.

## Benchmark infrastructure fixes (needed to land the rerun)

- okapi-benchmarks build.gradle.kts: bump JMH JVM heap to -Xmx8g (the previous default -Xmx2g OOMed inside throughput-mode microbenches)
- okapi-benchmarks build.gradle.kts: pass -Dliquibase.duplicateFileMode=WARN (okapi-postgres.jar and the fat JMH jar both carry the changelog at the same path; Liquibase 4.x treats this as an error by default; the files are identical so WARN is safe)
- DelivererMicroBenchmark.kt: subclass MockProducer to clear() history after every send. MockProducer retains every record sent for inspection; at ~1M ops/s for 30s × forks × iters that list grew to GBs and OOMed the JVM regardless of heap size. The fix discards history per call; the microbench doesn't need to inspect what was sent.

## Files updated

- benchmarks/kafka-deliverbatch.json: replaced with full-config results
- benchmarks/results-kafka-deliverbatch.md: new Score +/- Error tables; removed "Statistical caveat" callout; tightened narrative; added HTTP companion table for full-run completeness
- README.md: refreshed throughput table (1,470 -> 1,825 / 4,720 -> 5,130), improvement claim (13-41x -> 17-45x), JDK note (25 -> 21)

Note on JDK delta: smoke run was on JDK 25.0.2 (anomaly - SDKMAN default shifted between runs); this full run is on JDK 21.0.7. CLAUDE.md target is JVM 21 so this matches what consumers will see.

DelivererMicroBenchmark.kafkaDeliver still produces high-variance results (error > score) - JIT warmup interacts poorly with the Jackson-per-call deserialization. Not a blocker for KOJAK-82 (the throughput benchmarks are the publishable surface); a follow-up could switch the microbench to AverageTime mode or cache the deserialized DeliveryInfo.
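
The full-run configuration named in this commit message (fork=2, warmup=3 × 10s, iter=5 × 30s) could be expressed in Gradle roughly as below. This is a sketch under the assumption that the module uses the `me.champeau.jmh` plugin; the repo may instead set these values through JMH annotations or command-line options. With these settings each benchmark yields fork × iterations = 2 × 5 = 10 samples, which matches the n=10 quoted above.

```kotlin
// okapi-benchmarks/build.gradle.kts (sketch; assumes the me.champeau.jmh plugin is applied)
jmh {
    fork.set(2)                 // 2 forked JVMs per benchmark
    warmupIterations.set(3)     // 3 warmup iterations...
    warmup.set("10s")           // ...of 10 seconds each
    iterations.set(5)           // 5 measurement iterations...
    timeOnIteration.set("30s")  // ...of 30 seconds each
}
```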

Replaces previous JMH run results with a re-run under the same config (fork=2, warmup=3, iter=5). Kafka throughput numbers move <3% vs prior run, well within error bars, confirming reproducibility.

Kafka throughput (msg/s):
  batchSize=10 → ~1,790 (was ~1,825)
  batchSize=50 → ~4,132 (was ~4,184)
  batchSize=100 → ~5,181 (was ~5,128)

DelivererMicroBenchmark.kafkaDeliver now produces meaningful numbers (2.3M ± 19k ops/s, error <1%) thanks to the MockProducer.clear() fix shipped earlier in this PR. Previous run had error > score (benchmark was hitting GC pressure from the MockProducer leak before the fix).
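
The reproducibility claim can be checked with a few lines of arithmetic. The sketch below uses the before/after msg/s values quoted in this commit message; `deltaPct` is an illustrative helper, not something from the repo.

```kotlin
import kotlin.math.abs

/** Relative change from the previous run to the current one, in percent. */
fun deltaPct(previous: Double, current: Double): Double =
    (current - previous) / previous * 100.0

fun main() {
    // batchSize -> (previous run, this run), both in msg/s, copied from the commit message.
    val runs = mapOf(
        10 to (1825.0 to 1790.0),
        50 to (4184.0 to 4132.0),
        100 to (5128.0 to 5181.0),
    )
    for ((batchSize, pair) in runs) {
        val (previous, current) = pair
        val delta = deltaPct(previous, current)
        println("batchSize=$batchSize: ${"%+.2f".format(delta)}%")
        // Fails loudly if a future rerun drifts past the 3% threshold claimed here.
        check(abs(delta) < 3.0) { "run-to-run delta exceeds 3% at batchSize=$batchSize" }
    }
}
```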

#48)

## Summary

Three independent fixes that make `./gradlew :okapi-benchmarks:jmh` complete cleanly. Before these, the JMH run OOMs partway through.

All three issues exist on main today; running the benchmark suite without these fixes will fail. No test or production code is touched: this is pure benchmark infrastructure.

## Fixes

### 1. Bump JMH JVM heap to `-Xmx8g`

Throughput-mode microbenchmarks call `deliver()` at ~1M ops/s; each call allocates Jackson + Kotlin reflection state for JSON deserialization. At the previous default `-Xmx2g` the allocation rate exceeds GC throughput and OOMs within the first measurement iteration.

### 2. Pass `-Dliquibase.duplicateFileMode=WARN` as JMH JVM arg

`okapi-postgres.jar` and the fat JMH jar both ship the changelog at the same classpath path (`com/softwaremill/okapi/db/postgres/changelog.xml`). Liquibase 4.x treats duplicate resources as an error by default, which aborts `PostgresBenchmarkSupport` setup. The two files are identical (same jar source on the classpath twice), so `WARN` is safe.

### 3. Subclass `MockProducer` in `DelivererMicroBenchmark` to `clear()` history after every `send()`

`MockProducer.history` (internal `sent` list) retains every record sent for inspection; there is no eviction. In throughput mode at ~1M ops/s for 30s × forks × iterations that list grew to GBs and OOMed the JVM regardless of heap size. Discarding per call is safe because the microbench doesn't inspect what was sent, only timing.

With this fix, `DelivererMicroBenchmark.kafkaDeliver` now produces meaningful numbers (~2.3M ops/s ± <1%) instead of `error > score`.

## Files

- `okapi-benchmarks/build.gradle.kts`: JVM args
- `okapi-benchmarks/src/jmh/kotlin/.../DelivererMicroBenchmark.kt`: MockProducer override

## Why a separate PR

These are pure infrastructure fixes, completely independent of any specific benchmark or transport implementation. Carved out from PR #46 (KOJAK-82) so they can land on main right away, without waiting for the Kafka deliverBatch (#40) review cycle. PR #46 will then contain only the refreshed JMH numbers.

## Test plan

- [x] `./gradlew :okapi-benchmarks:compileJmhKotlin` passes
- [x] Verified locally: full `./gradlew :okapi-benchmarks:jmh` run completes with `BUILD SUCCESSFUL` and no OOM

endrju19 added 3 commits May 14, 2026 14:18

Merge branch 'main' into chore/kojak-82-full-jmh-rerun

df9e6db

# Conflicts: # README.md

endrju19 changed the title ~~chore: full JMH rerun for Kafka deliverBatch — publishable confidence intervals (KOJAK-82)~~ chore: refresh JMH benchmark numbers + benchmark infra fixes (KOJAK-82) May 17, 2026

Merge remote-tracking branch 'origin/feature/kojak-73-kafka-deliver-batch' into chore/kojak-82-full-jmh-rerun

69f0dab

# Conflicts: # README.md

endrju19 mentioned this pull request May 17, 2026

fix: benchmark infrastructure — JVM heap, Liquibase, MockProducer leak #48

Merged

2 tasks

style: fix import order in DelivererMicroBenchmark per ktlint

2ca664f

Merge remote-tracking branch 'origin/feature/kojak-73-kafka-deliver-batch' into chore/kojak-82-full-jmh-rerun

629700d

endrju19 merged commit 41e1122 into feature/kojak-73-kafka-deliver-batch May 17, 2026

endrju19 deleted the chore/kojak-82-full-jmh-rerun branch May 17, 2026 11:45

endrju19 merged 6 commits into feature/kojak-73-kafka-deliver-batch from chore/kojak-82-full-jmh-rerun
