chore: refresh JMH benchmark numbers + benchmark infra fixes (KOJAK-82) #46

Merged: endrju19 merged 6 commits into feature/kojak-73-kafka-deliver-batch from chore/kojak-82-full-jmh-rerun on May 17, 2026
endrju19 (Collaborator) commented May 16, 2026

What

Refreshes benchmarks/kafka-deliverbatch.json and benchmarks/results-kafka-deliverbatch.md with a full-config JMH run (fork=2, warmup=3, iter=5, n=10 samples per benchmark). Also lands three benchmark-infra fixes that were needed to make the run complete cleanly.

Headline (Kafka throughput, msg/s)

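| batchSize | Score (ms/op) | msg/s  | vs sync-sequential baseline |
|-----------|---------------|--------|-----------------------------|
| 10        | 0.559 ± 0.029 | ~1,790 | 16.4×                       |
| 50        | 0.242 ± 0.007 | ~4,132 | 35.8×                       |
| 100       | 0.193 ± 0.004 | ~5,181 | 45.1×                       |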

Kafka error bars are about 5% of score or less (roughly 5.2% at batchSize=10, under 3% at batchSize=50 and 100). Numbers reproduced across two independent runs (delta <3% between them).
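
The msg/s column appears to follow from the score as 1000 / (ms/op), which suggests the score is normalized per message. That reading is inferred from the numbers and is not stated in the benchmark source. A minimal Kotlin sketch of the conversion and of the error-to-score ratio, using the table values (the `Row` type and the function names are illustrative only):

```kotlin
import kotlin.math.roundToInt

// One row of the headline table: batch size, score and error in ms/op.
data class Row(val batchSize: Int, val scoreMs: Double, val errorMs: Double)

// Values copied from the headline table above.
val rows = listOf(
    Row(10, 0.559, 0.029),
    Row(50, 0.242, 0.007),
    Row(100, 0.193, 0.004),
)

// Assumption: the score is milliseconds per message, so throughput is 1000 / score.
fun msgPerSecond(scoreMs: Double): Double = 1000.0 / scoreMs

// Error bar expressed as a percentage of the score.
fun relativeErrorPct(row: Row): Double = 100.0 * row.errorMs / row.scoreMs

fun main() {
    for (row in rows) {
        val throughput = msgPerSecond(row.scoreMs).roundToInt()
        val errorPct = (relativeErrorPct(row) * 10).roundToInt() / 10.0
        println("batchSize=${row.batchSize}: $throughput msg/s, error $errorPct% of score")
    }
}
```

Running the same ratio over a fresh JMH result is a quick way to check an error-bar claim for each batch size before publishing it.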

Benchmark infrastructure fixes (also in this PR)

Without these, the JMH run cannot complete:

  1. JMH JVM heap — bumped to -Xmx8g. Throughput-mode microbenches were OOMing at the previous -Xmx2g because Jackson + Kotlin reflection allocate per call at ~1M ops/s rates.
  2. Liquibase duplicate-changelog workaround — added -Dliquibase.duplicateFileMode=WARN to JVM args. The fat JMH jar and okapi-postgres.jar both ship the changelog at the same path; Liquibase 4.x treats this as an error by default. Files are identical so WARN is safe.
  3. MockProducer.history cleared after each send() in DelivererMicroBenchmark. MockProducer retains every record sent for inspection; at ~1M ops/s for 30s × forks × iters that list grew to GBs and OOMed the JVM regardless of heap size. Microbench doesn't need to inspect what was sent — discarding per call is safe. With this fix, DelivererMicroBenchmark.kafkaDeliver now produces meaningful numbers (2.3M ± 19k ops/s) instead of error > score.
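
The third fix can be sketched as a small `MockProducer` subclass. This is an illustration of the idea, not the code from `DelivererMicroBenchmark.kt`: the class name `HistoryDiscardingMockProducer` and the choice of `String` keys and values are assumptions, and it relies on the `kafka-clients` test utilities being on the classpath.

```kotlin
import org.apache.kafka.clients.producer.Callback
import org.apache.kafka.clients.producer.MockProducer
import org.apache.kafka.clients.producer.ProducerRecord
import org.apache.kafka.clients.producer.RecordMetadata
import org.apache.kafka.common.serialization.StringSerializer
import java.util.concurrent.Future

// Hypothetical class name; the override in the PR may be shaped differently.
// autoComplete = true makes every send() complete immediately.
class HistoryDiscardingMockProducer :
    MockProducer<String, String>(true, StringSerializer(), StringSerializer()) {

    override fun send(record: ProducerRecord<String, String>): Future<RecordMetadata> {
        val result = super.send(record)
        clear() // drop the retained record so history cannot grow without bound
        return result
    }

    override fun send(
        record: ProducerRecord<String, String>,
        callback: Callback?,
    ): Future<RecordMetadata> {
        val result = super.send(record, callback)
        clear() // same cleanup for the callback variant
        return result
    }
}

fun main() {
    val producer = HistoryDiscardingMockProducer()
    repeat(10_000) { producer.send(ProducerRecord("bench-topic", "key", "value")) }
    println("records retained: ${producer.history().size}")
}
```

Both `send` overloads are overridden so the cleanup does not depend on how one delegates to the other inside `MockProducer`. Calling `clear()` twice on the same send is harmless.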

Files touched

  • benchmarks/kafka-deliverbatch.json — full-config raw results
  • benchmarks/results-kafka-deliverbatch.md: Score ± Error tables + microbench section + HTTP companion table
  • README.md — refreshed throughput table
  • okapi-benchmarks/build.gradle.kts — heap bump + Liquibase JVM arg
  • okapi-benchmarks/src/jmh/kotlin/.../DelivererMicroBenchmark.kt — MockProducer override
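
The heap bump and the Liquibase flag both come down to JVM arguments for the forked JMH processes. A minimal sketch of how that might look in `okapi-benchmarks/build.gradle.kts`, assuming the module applies the `me.champeau.jmh` Gradle plugin (the PR does not say which plugin is in use, so the `jmh { }` block is an assumption):

```kotlin
// Sketch only: assumes the me.champeau.jmh plugin is applied to this module.
jmh {
    jvmArgs.set(
        listOf(
            "-Xmx8g", // previous -Xmx2g OOMed in throughput-mode microbenchmarks
            "-Dliquibase.duplicateFileMode=WARN", // identical changelog ships in two jars
        ),
    )
}
```

Keeping both flags in one list makes it easy to see everything the forked benchmark JVM is started with.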

Notes

  • Run on JDK 21 LTS (matches CLAUDE.md target).
  • HTTP throughput numbers are also included for completeness. HTTP delivery is still sync-sequential; KOJAK-74 will address that.
  • DelivererMicroBenchmark.httpDeliver benchmarks the WireMock-local HTTP path; numbers are dominated by loopback TCP cost, not library overhead.

Base branch

Based on feature/kojak-73-kafka-deliver-batch (PR #40). Merge after #40.

Test plan

  • ./gradlew :okapi-benchmarks:jmh — completes with BUILD SUCCESSFUL, all benchmarks produce non-NaN error bars
  • Reproducibility verified: two independent runs, Kafka throughput delta <3%

endrju19 added 3 commits May 14, 2026 14:18
…(KOJAK-82)

Replaces the smoke-run numbers (fork=1, warmup=1, iter=2, n=2, scoreError=NaN)
with a full publishable run (fork=2, warmup=3 × 10s, iter=5 × 30s, n=10).

## Headline (Kafka throughput, msg/s)

| batchSize | Smoke    | Full run       | Improvement vs baseline |
|-----------|----------|----------------|-------------------------|
| 10        | ~1,468   | ~1,825 ± 70    | 16.7× (was 13.5×)       |
| 50        | ~3,731   | ~4,184 ± 140   | 36.3× (was 32.3×)       |
| 100       | ~4,717   | ~5,128 ± 105   | 44.6× (was 41.0×)       |

All Kafka throughput error bars <5% of score — multipliers now statistically
defensible. The smoke-run numbers were directionally correct but slightly
conservative; full-run shows the optimization is even better than initially
claimed.

## Benchmark infrastructure fixes (needed to land the rerun)

- okapi-benchmarks build.gradle.kts: bump JMH JVM heap to -Xmx8g
  (the previous default -Xmx2g OOMed inside throughput-mode microbenches)
- okapi-benchmarks build.gradle.kts: pass -Dliquibase.duplicateFileMode=WARN
  (okapi-postgres.jar and the fat JMH jar both carry the changelog at the
  same path; Liquibase 4.x treats this as an error by default; the files
  are identical so WARN is safe)
- DelivererMicroBenchmark.kt: subclass MockProducer to clear() history
  after every send. MockProducer retains every record sent for inspection;
  at ~1M ops/s for 30s × forks × iters that list grew to GBs and OOMed
  the JVM regardless of heap size. The fix discards history per call —
  microbench doesn't need to inspect what was sent.

## Files updated

- benchmarks/kafka-deliverbatch.json: replaced with full-config results
- benchmarks/results-kafka-deliverbatch.md: new Score +/- Error tables;
  removed "Statistical caveat" callout; tightened narrative; added HTTP
  companion table for full-run completeness
- README.md: refreshed throughput table (1,470 -> 1,825 / 4,720 -> 5,130),
  improvement claim (13-41x -> 17-45x), JDK note (25 -> 21)

Note on JDK delta: smoke run was on JDK 25.0.2 (anomaly - SDKMAN default
shifted between runs); this full run is on JDK 21.0.7. CLAUDE.md target
is JVM 21 so this matches what consumers will see.

DelivererMicroBenchmark.kafkaDeliver still produces high-variance results
(error > score) - JIT warmup interacts poorly with the Jackson-per-call
deserialization. Not a blocker for KOJAK-82 (the throughput benchmarks
are the publishable surface); a follow-up could switch the microbench
to AverageTime mode or cache the deserialized DeliveryInfo.

Replaces previous JMH run results with a re-run under the same config
(fork=2, warmup=3, iter=5). Kafka throughput numbers move <3% vs prior
run — well within error bars — confirming reproducibility.

Kafka throughput (msg/s):
  batchSize=10  → ~1,790 (was ~1,825)
  batchSize=50  → ~4,132 (was ~4,184)
  batchSize=100 → ~5,181 (was ~5,128)

DelivererMicroBenchmark.kafkaDeliver now produces meaningful numbers
(2.3M ± 19k ops/s — error <1%) thanks to the MockProducer.clear() fix
shipped earlier in this PR. Previous run had error > score (benchmark
was hitting GC pressure from the MockProducer leak before the fix).
endrju19 changed the title from "chore: full JMH rerun for Kafka deliverBatch — publishable confidence intervals (KOJAK-82)" to "chore: refresh JMH benchmark numbers + benchmark infra fixes (KOJAK-82)" on May 17, 2026
…atch' into chore/kojak-82-full-jmh-rerun

# Conflicts:
#	README.md
endrju19 added a commit that referenced this pull request May 17, 2026
#48)

## Summary

Three independent fixes that make `./gradlew :okapi-benchmarks:jmh`
complete cleanly. Before these, the JMH run OOMs partway through. All
three issues exist on main today; running the benchmark suite without
these fixes will fail.

No test or production code is touched — pure benchmark infrastructure.

## Fixes

### 1. Bump JMH JVM heap to `-Xmx8g`

Throughput-mode microbenchmarks call `deliver()` at ~1M ops/s; each
call allocates Jackson + Kotlin reflection state for JSON
deserialization. At the previous default `-Xmx2g` the allocation rate
exceeds GC throughput and OOMs within the first measurement iteration.

### 2. Pass `-Dliquibase.duplicateFileMode=WARN` as JMH JVM arg

`okapi-postgres.jar` and the fat JMH jar both ship the changelog at
the same classpath path
(`com/softwaremill/okapi/db/postgres/changelog.xml`). Liquibase 4.x
treats duplicate resources as an error by default, which aborts
`PostgresBenchmarkSupport` setup. The two files are identical (same
jar source on the classpath twice), so `WARN` is safe.

### 3. Subclass `MockProducer` in `DelivererMicroBenchmark` to `clear()` history after every `send()`

`MockProducer.history` (internal `sent` list) retains every record
sent for inspection — there is no eviction. In throughput mode at ~1M
ops/s for 30s × forks × iterations that list grew to GBs and OOMed the
JVM regardless of heap size. Discarding per call is safe because
microbench doesn't inspect what was sent — only timing.

With this fix, `DelivererMicroBenchmark.kafkaDeliver` now produces
meaningful numbers (~2.3M ops/s ± <1%) instead of `error > score`.

## Files

- `okapi-benchmarks/build.gradle.kts`: JVM args
- `okapi-benchmarks/src/jmh/kotlin/.../DelivererMicroBenchmark.kt`: MockProducer override

## Why a separate PR

These are pure infrastructure fixes — completely independent of any
specific benchmark or transport implementation. Carved out from PR #46
(KOJAK-82) so they can land on main right away, without waiting for the
Kafka deliverBatch (#40) review cycle. PR #46 will then contain only the
refreshed JMH numbers.

## Test plan
- [x] `./gradlew :okapi-benchmarks:compileJmhKotlin` passes
- [x] Verified locally: full `./gradlew :okapi-benchmarks:jmh` run
completes with `BUILD SUCCESSFUL` and no OOM
@endrju19 endrju19 merged commit 41e1122 into feature/kojak-73-kafka-deliver-batch May 17, 2026
@endrju19 endrju19 deleted the chore/kojak-82-full-jmh-rerun branch May 17, 2026 11:45