
ParparVM: index native-symbol and class lookups in the dead-code cull (O(N^2) -> ~O(N))#5236

Open
shai-almog wants to merge 3 commits into master from perf/parparvm-native-symbol-index

shai-almog commented Jun 12, 2026

Collaborator

Problem

The ParparVM dead-code optimizer ("unused Method cull") had two quadratic "scan the same thing repeatedly" costs that made large / native-heavy iOS apps slow to translate (and on the build server, slow enough to hit the translator timeout):

  1. Native-symbol scan: for every Java method/class it asked "is this referenced by native code?" by substring-scanning the entire native source corpus per query (isMethodUsedByNative -> for each native source: s.contains(symbol)), costing O(methods × native_bytes). Apps pulling in many cn1libs with native code (camera, scanners, maps, ...) pay heavily.
  2. Class lookup: getClassObject / getClassByName / ByteCodeClass.findClass each linear-scanned the whole class list and were called per dependency per class during markDependencies/updateAllDependencies (run ~5× per build), costing O(N²).
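For contrast with the fix, here is a minimal Java sketch of the two pre-PR lookup shapes described in the list above. `NaiveLookups`, `usedByNative` and `findClassLinear` are hypothetical names, not the translator's actual methods (those are isMethodUsedByNative, getClassObject, getClassByName and ByteCodeClass.findClass); only the cost structure is meant to match.

```java
import java.util.List;

/** Hypothetical sketch of the pre-PR lookup shapes; names are illustrative only. */
final class NaiveLookups {
    /** Native check: rescans every native source for each query, O(native_bytes) per call. */
    static boolean usedByNative(List<String> nativeSources, String symbol) {
        for (String s : nativeSources) {
            if (s.contains(symbol)) {
                return true;
            }
        }
        return false;
    }

    /** Class lookup: linear scan of the class-name list, O(N) per call. */
    static int findClassLinear(List<String> classNames, String name) {
        for (int i = 0; i < classNames.size(); i++) {
            if (classNames.get(i).equals(name)) {
                return i; // first match wins
            }
        }
        return -1;
    }
}
```

Called once per method, `usedByNative` costs O(methods × native_bytes) in total; called once per dependency per class across several passes, `findClassLinear` costs roughly O(N²).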

Found while investigating a real iOS cloud build that hit the translator timeout. (Heap was ruled out: the cost is deterministic and algorithmic, not GC.)

Fix

Two indexes, each built once and queried in ~O(1):

  • NativeSymbolIndex: a suffix automaton over the distinct native identifier tokens. Native symbols are CN1 mangled identifiers, and a query matches iff it is a substring of some token, so this is semantically identical to the old String.contains. Tokenizing into the distinct set dedups symbols across files, which bounds the structure's size. Cost drops from O(methods × native_bytes) to O(native_bytes) once plus O(|symbol|) per query.
  • Name → class HashMap: replaces the three linear scans. It is rebuilt lazily when `classes` changes, tracked by the (reference, size) pair; because cleanup() clears `classes` in place, it also nulls the index so that a same-size refill in a later run cannot reuse a stale index. First-match semantics are preserved.
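A minimal Java sketch of how a NativeSymbolIndex-style lookup could work, assuming the distinct tokens have already been collected. This is illustrative, not the PR's implementation: `SuffixAutomatonIndex`, `containsSubstring` and the space separator are assumptions. Rather than a generalized multi-string automaton, it builds one textbook suffix automaton over the tokens joined by a character that never appears in an identifier, so a separator-free query can only match inside a single token.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical NativeSymbolIndex-style structure (illustrative, not the PR's code):
 * a suffix automaton over the distinct identifier tokens joined by a separator that
 * never occurs in an identifier. A separator-free query is a substring of the joined
 * text iff it is a substring of some single token.
 */
final class SuffixAutomatonIndex {
    private static final char SEPARATOR = ' '; // assumed never part of an identifier

    private final List<Map<Character, Integer>> next = new ArrayList<>();
    private final int[] len;  // length of the longest string in each state
    private final int[] link; // suffix link of each state (-1 for the root)
    private int size;         // number of states allocated so far
    private int last;         // state reached by the whole text consumed so far

    SuffixAutomatonIndex(Set<String> distinctTokens) {
        int total = 0;
        for (String t : distinctTokens) {
            total += t.length() + 1; // token plus its trailing separator
        }
        // A suffix automaton over n characters has at most 2n - 1 states (n >= 2).
        len = new int[2 * total + 2];
        link = new int[2 * total + 2];
        last = newState(0, -1); // root
        for (String t : distinctTokens) {
            for (int i = 0; i < t.length(); i++) {
                extend(t.charAt(i));
            }
            extend(SEPARATOR);
        }
    }

    private int newState(int length, int suffixLink) {
        len[size] = length;
        link[size] = suffixLink;
        next.add(new HashMap<>());
        return size++;
    }

    /** One step of the standard online suffix-automaton construction. */
    private void extend(char c) {
        int cur = newState(len[last] + 1, -1);
        int p = last;
        while (p != -1 && !next.get(p).containsKey(c)) {
            next.get(p).put(c, cur);
            p = link[p];
        }
        if (p == -1) {
            link[cur] = 0;
        } else {
            int q = next.get(p).get(c);
            if (len[p] + 1 == len[q]) {
                link[cur] = q;
            } else {
                // Split q: the clone keeps q's transitions but only its strings up to len[p] + 1.
                int clone = newState(len[p] + 1, link[q]);
                next.get(clone).putAll(next.get(q));
                while (p != -1) {
                    Integer target = next.get(p).get(c);
                    if (target == null || target != q) {
                        break;
                    }
                    next.get(p).put(c, clone);
                    p = link[p];
                }
                link[q] = clone;
                link[cur] = clone;
            }
        }
        last = cur;
    }

    /** O(|query|): true iff query occurs inside some indexed token. */
    boolean containsSubstring(String query) {
        int state = 0;
        for (int i = 0; i < query.length(); i++) {
            Integer to = next.get(state).get(query.charAt(i));
            if (to == null) {
                return false;
            }
            state = to;
        }
        return true;
    }
}
```

With one hash map per state, construction is roughly linear in the total length of the distinct tokens, and each `containsSubstring` call follows at most |symbol| transitions.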

Both indexes are used single-threaded within one translation run, so the lazily built static caches are excluded from SpotBugs' LI_LAZY_INIT_* checks in spotbugs-exclude.xml, matching the file's existing convention.

Measurements

Classes from the real failing iOS build (5476, held fixed); native corpus scaled by duplicating .m files:

  native files   before   native index only   + class index   methods removed
  62             15 s     14 s                7 s             6968 / 6968
  462            24 s     15.5 s              7 s             6968 / 6968
  1362           48 s     16 s                7 s             6968 / 6968
  • Correctness gate: the cull removes the exact same 6968 methods at every size — the indexes change what's scanned, never what's eliminated.
  • Result: cull time is now flat in both class count and native-file count.

🤖 Generated with Claude Code

…e cull

The dead-code optimizer ("unused Method cull") asks, for every Java method
and class, "is this referenced by native code?" by substring-scanning the
entire native source corpus per query (isMethodUsedByNative ->
for each native source: s.contains(symbol)). That is O(methods x native_bytes),
so apps that pull in a lot of native source (many cn1libs: camera, scanners,
maps, ...) pay a large, growing cull cost.

Native symbols are CN1 mangled identifiers, and a query X matches iff X is a
substring of some maximal identifier token in the native text. So we build,
once per run, a suffix automaton over the DISTINCT native identifier tokens
(NativeSymbolIndex) and answer each query in O(|symbol|). Tokenizing into the
distinct set dedups repeated symbols across files, bounding the structure.
Semantics are identical to the old String.contains scan because the query
strings are themselves delimiter-free identifiers.
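The equivalence argument above can be checked with a small Java sketch of the tokenizing step. `NativeTokenizer` is a hypothetical helper, and it assumes identifier characters are ASCII letters, digits and `_`, which covers CN1 mangled names; the PR's actual tokenizer may differ.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Hypothetical sketch: split native source text into the DISTINCT set of maximal
 * identifier tokens. For a non-empty query made only of identifier characters,
 * text.contains(q) equals "some token contains q", because an occurrence of q
 * cannot cross a non-identifier character and so lies inside one maximal token.
 */
final class NativeTokenizer {
    static boolean isIdentChar(char ch) {
        return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')
                || (ch >= '0' && ch <= '9') || ch == '_';
    }

    /** Adds every maximal run of identifier characters in text to out. */
    static void addTokens(String text, Set<String> out) {
        int start = -1;
        for (int i = 0; i <= text.length(); i++) {
            boolean ident = i < text.length() && isIdentChar(text.charAt(i));
            if (ident && start < 0) {
                start = i; // a token begins
            } else if (!ident && start >= 0) {
                out.add(text.substring(start, i)); // token ends; the Set drops repeats
                start = -1;
            }
        }
    }

    /** Distinct tokens across all native sources, in first-seen order. */
    static Set<String> distinctTokens(Iterable<String> nativeSources) {
        Set<String> tokens = new LinkedHashSet<>();
        for (String src : nativeSources) {
            addTokens(src, tokens);
        }
        return tokens;
    }
}
```

Because the set drops repeats, a symbol referenced from hundreds of .m files is stored once, which is what keeps the index small when the corpus is duplicated.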

Measured on a real failing iOS build (5476 classes held fixed, native corpus
scaled by duplicating .m files):

  native files   cull (before)   cull (after)   methods removed
  62             15.2s           14.3s          6968 / 6968
  462            24.2s           15.5s          6968 / 6968
  1362           48.2s           16.3s          6968 / 6968

The cull removes the exact same 6968 methods at every size (correctness gate),
and the native-scan cost (~32s at 1362 files) collapses to a sub-second
one-time index build. The residual flat ~15s is the separate class-graph
lookup cost (findClass/getClassObject linear scans), addressed in a follow-up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
github-actions Bot commented Jun 12, 2026

Contributor

✅ ByteCodeTranslator Quality Report

Test & Coverage

  • Tests: 388 total, 0 failed, 12 skipped

Benchmark Results

  • Execution Time: 13725 ms

  • Hotspots (Top 20 sampled methods):

    • 22.75% java.util.ArrayList.indexOf (276 samples)
    • 11.54% com.codename1.tools.translator.BytecodeMethod.addToConstantPool (140 samples)
    • 4.86% com.codename1.tools.translator.Parser.addToConstantPool (59 samples)
    • 3.71% java.lang.StringBuilder.append (45 samples)
    • 2.56% com.codename1.tools.translator.BytecodeMethod.equals (31 samples)
    • 2.31% com.codename1.tools.translator.Parser.classIndex (28 samples)
    • 2.14% java.lang.Object.hashCode (26 samples)
    • 2.06% com.codename1.tools.translator.Parser.generateClassAndMethodIndexHeader (25 samples)
    • 1.57% com.codename1.tools.translator.BytecodeMethod.optimize (19 samples)
    • 1.57% org.objectweb.asm.ClassReader.readCode (19 samples)
    • 1.48% java.util.TreeMap.getEntry (18 samples)
    • 1.40% com.codename1.tools.translator.BytecodeMethod.appendCMethodPrefix (17 samples)
    • 1.40% java.lang.StringCoding.encode (17 samples)
    • 1.24% java.lang.System.identityHashCode (15 samples)
    • 1.07% java.io.FileOutputStream.open0 (13 samples)
    • 1.07% com.codename1.tools.translator.Parser.isMethodUsed (13 samples)
    • 1.07% org.objectweb.asm.ClassReader.readUTF8 (13 samples)
    • 0.99% java.util.HashMap.putVal (12 samples)
    • 0.99% com.codename1.tools.translator.BytecodeMethod.addInstruction (12 samples)
    • 0.99% sun.nio.ch.FileDispatcherImpl.write0 (12 samples)
  • ⚠️ Coverage report not generated.

Static Analysis

  • ✅ SpotBugs: no findings (report was not generated by the build).
  • ⚠️ PMD report not generated.
  • ⚠️ Checkstyle report not generated.

Generated automatically by the PR CI workflow.

github-actions

Contributor

Cloudflare Preview

shai-almog commented Jun 12, 2026

Collaborator Author

Compared 125 screenshots: 125 matched.
Native Windows port, REAL shipping pipeline: the hellocodenameone screenshot suite rendered by a binary CROSS-COMPILED on Linux (clang-cl + xwin, WebView2 linked) and RUN on a Windows x64 runner. Compared against the in-repo baseline in scripts/windows/screenshots.

Benchmark Results

Detailed Performance Metrics

Metric | Duration
SIMD kernel backend | SSE2 (x64) / NEON (arm64) native kernels
SIMD int-add (64K x300) | java 70ms / native 3ms = 23.3x speedup
SIMD float-mul (64K x300) | java 66ms / native 4ms = 16.5x speedup
SIMD kernel correctness | PASS (native result == scalar reference)
Base64 native bridge | unavailable (CN1 + SIMD + image benchmarks only)
Base64 payload size | 8192 bytes
Base64 benchmark iterations | 6000
Base64 SIMD byte path | gated to scalar (CPU autovectorizes scalar; explicit SIMD not beneficial here)
Base64 CN1 encode | 365.000 ms
Base64 CN1 decode | 217.000 ms
Base64 SIMD encode | 150.000 ms
Base64 encode ratio (SIMD/CN1) | 0.411x (58.9% faster)
Base64 SIMD decode | 128.000 ms
Base64 decode ratio (SIMD/CN1) | 0.590x (41.0% faster)
Image encode benchmark iterations | 100
Image createMask (SIMD off) | 33.000 ms
Image createMask (SIMD on) | 15.000 ms
Image createMask ratio (SIMD on/off) | 0.455x (54.5% faster)
Image applyMask (SIMD off) | 57.000 ms
Image applyMask (SIMD on) | 26.000 ms
Image applyMask ratio (SIMD on/off) | 0.456x (54.4% faster)
Image modifyAlpha (SIMD off) | 58.000 ms
Image modifyAlpha (SIMD on) | 26.000 ms
Image modifyAlpha ratio (SIMD on/off) | 0.448x (55.2% faster)
Image modifyAlpha removeColor (SIMD off) | 70.000 ms
Image modifyAlpha removeColor (SIMD on) | 27.000 ms
Image modifyAlpha removeColor ratio (SIMD on/off) | 0.386x (61.4% faster)

shai-almog commented Jun 12, 2026

Collaborator Author

Compared 125 screenshots: 125 matched.
Native Windows port (x64 / Intel-AMD): full hellocodenameone screenshot suite rendered offscreen with Direct2D/DirectWrite, plus the real benchmarks (base64 native/CN1/SIMD, image createMask/applyMask/modifyAlpha/PNG/JPEG, SSE2 SIMD kernels). Compared against the in-repo baseline in scripts/windows/screenshots.

Benchmark Results

Detailed Performance Metrics

Metric | Duration
SIMD kernel backend | SSE2 (x64) / NEON (arm64) native kernels
SIMD int-add (64K x300) | java 73ms / native 4ms = 18.2x speedup
SIMD float-mul (64K x300) | java 71ms / native 4ms = 17.7x speedup
SIMD kernel correctness | PASS (native result == scalar reference)
Base64 native bridge | unavailable (CN1 + SIMD + image benchmarks only)
Base64 payload size | 8192 bytes
Base64 benchmark iterations | 6000
Base64 SIMD byte path | gated to scalar (CPU autovectorizes scalar; explicit SIMD not beneficial here)
Base64 CN1 encode | 289.000 ms
Base64 CN1 decode | 172.000 ms
Base64 SIMD encode | 126.000 ms
Base64 encode ratio (SIMD/CN1) | 0.436x (56.4% faster)
Base64 SIMD decode | 128.000 ms
Base64 decode ratio (SIMD/CN1) | 0.744x (25.6% faster)
Image encode benchmark iterations | 100
Image createMask (SIMD off) | 35.000 ms
Image createMask (SIMD on) | 14.000 ms
Image createMask ratio (SIMD on/off) | 0.400x (60.0% faster)
Image applyMask (SIMD off) | 56.000 ms
Image applyMask (SIMD on) | 30.000 ms
Image applyMask ratio (SIMD on/off) | 0.536x (46.4% faster)
Image modifyAlpha (SIMD off) | 58.000 ms
Image modifyAlpha (SIMD on) | 22.000 ms
Image modifyAlpha ratio (SIMD on/off) | 0.379x (62.1% faster)
Image modifyAlpha removeColor (SIMD off) | 62.000 ms
Image modifyAlpha removeColor (SIMD on) | 23.000 ms
Image modifyAlpha removeColor ratio (SIMD on/off) | 0.371x (62.9% faster)

shai-almog commented Jun 12, 2026

Collaborator Author

Compared 124 screenshots: 124 matched.
✅ Native iOS screenshot tests passed.

Benchmark Results

  • VM Translation Time: 0 seconds
  • Compilation Time: 357 seconds

Build and Run Timing

Metric | Duration
Simulator Boot | 108000 ms
Simulator Boot (Run) | 1000 ms
App Install | 13000 ms
App Launch | 7000 ms
Test Execution | 308000 ms

Detailed Performance Metrics

Metric | Duration
SIMD kernel backend | SSE2 (x64) / NEON (arm64) native kernels
SIMD int-add (64K x300) | java 145ms / native 6ms = 24.1x speedup
SIMD float-mul (64K x300) | java 90ms / native 3ms = 30.0x speedup
SIMD kernel correctness | PASS (native result == scalar reference)
Base64 payload size | 8192 bytes
Base64 benchmark iterations | 6000
Base64 SIMD byte path | active (NEON-accelerated)
Base64 CN1 encode | 377.000 ms
Base64 CN1 decode | 279.000 ms
Base64 native encode | 1237.000 ms
Base64 encode ratio (CN1/native) | 0.305x (69.5% faster)
Base64 native decode | 452.000 ms
Base64 decode ratio (CN1/native) | 0.617x (38.3% faster)
Base64 SIMD encode | 136.000 ms
Base64 encode ratio (SIMD/CN1) | 0.361x (63.9% faster)
Base64 SIMD decode | 69.000 ms
Base64 decode ratio (SIMD/CN1) | 0.247x (75.3% faster)
Base64 encode ratio (SIMD/native) | 0.110x (89.0% faster)
Base64 decode ratio (SIMD/native) | 0.153x (84.7% faster)
Image encode benchmark iterations | 100
Image createMask (SIMD off) | 34.000 ms
Image createMask (SIMD on) | 2.000 ms
Image createMask ratio (SIMD on/off) | 0.059x (94.1% faster)
Image applyMask (SIMD off) | 138.000 ms
Image applyMask (SIMD on) | 80.000 ms
Image applyMask ratio (SIMD on/off) | 0.580x (42.0% faster)
Image modifyAlpha (SIMD off) | 159.000 ms
Image modifyAlpha (SIMD on) | 82.000 ms
Image modifyAlpha ratio (SIMD on/off) | 0.516x (48.4% faster)
Image modifyAlpha removeColor (SIMD off) | 122.000 ms
Image modifyAlpha removeColor (SIMD on) | 71.000 ms
Image modifyAlpha removeColor ratio (SIMD on/off) | 0.582x (41.8% faster)

shai-almog commented Jun 12, 2026

Collaborator Author

Compared 128 screenshots: 128 matched.
✅ Native Mac screenshot tests passed.

Benchmark Results

  • VM Translation Time: 0 seconds
  • Compilation Time: 176 seconds

Detailed Performance Metrics

Metric | Duration
SIMD kernel backend | SSE2 (x64) / NEON (arm64) native kernels
SIMD int-add (64K x300) | java 71ms / native 10ms = 7.1x speedup
SIMD float-mul (64K x300) | java 55ms / native 3ms = 18.3x speedup
SIMD kernel correctness | PASS (native result == scalar reference)
Base64 payload size | 8192 bytes
Base64 benchmark iterations | 6000
Base64 SIMD byte path | active (NEON-accelerated)
Base64 CN1 encode | 277.000 ms
Base64 CN1 decode | 198.000 ms
Base64 native encode | 662.000 ms
Base64 encode ratio (CN1/native) | 0.418x (58.2% faster)
Base64 native decode | 368.000 ms
Base64 decode ratio (CN1/native) | 0.538x (46.2% faster)
Base64 SIMD encode | 54.000 ms
Base64 encode ratio (SIMD/CN1) | 0.195x (80.5% faster)
Base64 SIMD decode | 47.000 ms
Base64 decode ratio (SIMD/CN1) | 0.237x (76.3% faster)
Base64 encode ratio (SIMD/native) | 0.082x (91.8% faster)
Base64 decode ratio (SIMD/native) | 0.128x (87.2% faster)
Image encode benchmark iterations | 100
Image createMask (SIMD off) | 18.000 ms
Image createMask (SIMD on) | 3.000 ms
Image createMask ratio (SIMD on/off) | 0.167x (83.3% faster)
Image applyMask (SIMD off) | 81.000 ms
Image applyMask (SIMD on) | 57.000 ms
Image applyMask ratio (SIMD on/off) | 0.704x (29.6% faster)
Image modifyAlpha (SIMD off) | 69.000 ms
Image modifyAlpha (SIMD on) | 52.000 ms
Image modifyAlpha ratio (SIMD on/off) | 0.754x (24.6% faster)
Image modifyAlpha removeColor (SIMD off) | 79.000 ms
Image modifyAlpha removeColor (SIMD on) | 58.000 ms
Image modifyAlpha removeColor ratio (SIMD on/off) | 0.734x (26.6% faster)

The cull's other O(N^2): getClassObject, getClassByName and
ByteCodeClass.findClass each did an O(N) linear scan of the whole class list,
and they're called per dependency per class during markDependencies /
updateAllDependencies, which run up to ~5 times per build. That's the dominant
cost once the native-scan (previous commit) is removed.

Replace the three scans with a shared name -> class HashMap, rebuilt lazily when
`classes` changes (tracked by the (reference, size) pair, which is sufficient
because `classes` is only ever reassigned, grown via add(), or cleared -- never
mutated to a same-ref/same-size/different-content state). First-match semantics
are preserved (first-wins on duplicate class names).
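A minimal Java sketch of the name -> class index described above. `ClassNameIndex` and `ClassEntry` are hypothetical stand-ins (for Parser's cache and ByteCodeClass); the static fields mirror the lazily built static caches, and `putIfAbsent` gives the same first-wins result as the old linear scan.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a lazily rebuilt name -> class index (not the PR's code).
 * The cache counts as fresh while the list reference and its size are unchanged.
 */
final class ClassNameIndex {
    /** Stand-in for ByteCodeClass; only the name matters for the lookup. */
    static final class ClassEntry {
        final String name;

        ClassEntry(String name) {
            this.name = name;
        }
    }

    private static Map<String, ClassEntry> index; // null until first lookup
    private static List<ClassEntry> indexedList;  // list the index was built from
    private static int indexedSize = -1;          // its size at build time

    static ClassEntry find(List<ClassEntry> classes, String name) {
        if (index == null || indexedList != classes || indexedSize != classes.size()) {
            Map<String, ClassEntry> rebuilt = new HashMap<>();
            for (ClassEntry c : classes) {
                rebuilt.putIfAbsent(c.name, c); // first-wins on duplicate names
            }
            index = rebuilt;
            indexedList = classes;
            indexedSize = classes.size();
        }
        return index.get(name); // expected O(1)
    }

    /** Forces a rebuild on the next lookup. */
    static void invalidate() {
        index = null;
        indexedList = null;
        indexedSize = -1;
    }
}
```

The guard only notices a new list reference or a changed length, so any in-place edit that leaves the size unchanged needs an explicit `invalidate()` before the next lookup.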

SpotBugs: the two lazily-built static index caches in Parser (this one and the
native-symbol index) are excluded for LI_LAZY_INIT_STATIC /
LI_LAZY_INIT_UPDATE_STATIC -- the translator runs single-threaded across one
translation run, the same rationale as the existing exclusions in that file.

Measured on the same harness (5476 classes fixed, native corpus scaled), with
the native-symbol index from the previous commit in place:

  native files   cull (before both)   cull (after both)   methods removed
  62             15s                  7s                  6968 / 6968
  462            24s                  7s                  6968 / 6968
  1362           48s                  7s                  6968 / 6968

Cull time is now flat in both class count and native-file count, removing the
exact same 6968 methods.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
shai-almog changed the title from "ParparVM: index native sources for O(1) symbol lookup in the dead-code cull" to "ParparVM: index native-symbol and class lookups in the dead-code cull (O(N^2) -> ~O(N))" Jun 12, 2026
shai-almog commented Jun 12, 2026

Collaborator Author

Compared 128 screenshots: 128 matched.
✅ Native iOS Metal screenshot tests passed.

Benchmark Results

  • VM Translation Time: 0 seconds
  • Compilation Time: 331 seconds

Build and Run Timing

Metric | Duration
Simulator Boot | 72000 ms
Simulator Boot (Run) | 1000 ms
App Install | 22000 ms
App Launch | 4000 ms
Test Execution | 277000 ms

Detailed Performance Metrics

Metric | Duration
SIMD kernel backend | SSE2 (x64) / NEON (arm64) native kernels
SIMD int-add (64K x300) | java 142ms / native 5ms = 28.4x speedup
SIMD float-mul (64K x300) | java 236ms / native 4ms = 59.0x speedup
SIMD kernel correctness | PASS (native result == scalar reference)
Base64 payload size | 8192 bytes
Base64 benchmark iterations | 6000
Base64 SIMD byte path | active (NEON-accelerated)
Base64 CN1 encode | 691.000 ms
Base64 CN1 decode | 430.000 ms
Base64 native encode | 1525.000 ms
Base64 encode ratio (CN1/native) | 0.453x (54.7% faster)
Base64 native decode | 550.000 ms
Base64 decode ratio (CN1/native) | 0.782x (21.8% faster)
Base64 SIMD encode | 141.000 ms
Base64 encode ratio (SIMD/CN1) | 0.204x (79.6% faster)
Base64 SIMD decode | 62.000 ms
Base64 decode ratio (SIMD/CN1) | 0.144x (85.6% faster)
Base64 encode ratio (SIMD/native) | 0.092x (90.8% faster)
Base64 decode ratio (SIMD/native) | 0.113x (88.7% faster)
Image encode benchmark iterations | 100
Image createMask (SIMD off) | 30.000 ms
Image createMask (SIMD on) | 3.000 ms
Image createMask ratio (SIMD on/off) | 0.100x (90.0% faster)
Image applyMask (SIMD off) | 93.000 ms
Image applyMask (SIMD on) | 73.000 ms
Image applyMask ratio (SIMD on/off) | 0.785x (21.5% faster)
Image modifyAlpha (SIMD off) | 85.000 ms
Image modifyAlpha (SIMD on) | 52.000 ms
Image modifyAlpha ratio (SIMD on/off) | 0.612x (38.8% faster)
Image modifyAlpha removeColor (SIMD off) | 109.000 ms
Image modifyAlpha removeColor (SIMD on) | 56.000 ms
Image modifyAlpha removeColor ratio (SIMD on/off) | 0.514x (48.6% faster)

github-actions Bot commented Jun 12, 2026

Contributor

✅ Continuous Quality Report

Test & Coverage

Static Analysis

  • SpotBugs [Report archive]
    • ByteCodeTranslator: 0 findings (no issues)
    • android: 0 findings (no issues)
    • codenameone-maven-plugin: 0 findings (no issues)
    • core-unittests: 0 findings (no issues)
    • ios: 0 findings (no issues)
  • PMD: 0 findings (no issues) [Report archive]
  • Checkstyle: 0 findings (no issues) [Report archive]

Generated automatically by the PR CI workflow.

cleanup() clears `classes` in place (same List reference), so the name index's
(reference, size) staleness guard cannot detect a subsequent same-size refill
when multiple translation runs share one JVM -- e.g. ParserTest, which calls
Parser.cleanup() in @BeforeEach and then parses a single class per test, so
every test after the first saw (same ref, size 1) and got the previous test's
cached index. The single-run translator never hits this. Null the index in
cleanup() so each run rebuilds from its own classes.
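A self-contained Java miniature of the failure mode and the fix described above. `MiniParser`, `lookup` and `cleanup(boolean)` are hypothetical names, not Parser's API; the boolean only exists to toggle between the buggy reset (clear only) and the fixed one (clear and null the index).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical miniature of the stale-cache bug (names are illustrative, not Parser's).
 * classes is cleared in place between runs, so (reference, size) alone cannot tell
 * "run 1 with one class" from "run 2 with a different single class".
 */
final class MiniParser {
    static final List<String> classes = new ArrayList<>();
    private static Map<String, String> index;
    private static List<String> indexedList;
    private static int indexedSize = -1;

    static String lookup(String name) {
        if (index == null || indexedList != classes || indexedSize != classes.size()) {
            index = new HashMap<>();
            for (String c : classes) {
                index.putIfAbsent(c, c); // first-wins, as in the real index
            }
            indexedList = classes;
            indexedSize = classes.size();
        }
        return index.get(name);
    }

    /** Per-run reset; nullIndex=false reproduces the bug, true applies the fix. */
    static void cleanup(boolean nullIndex) {
        classes.clear(); // the same List reference survives the reset
        if (nullIndex) {
            index = null; // force a rebuild from this run's classes
        }
    }
}
```

Running two one-class "runs" back to back with `cleanup(false)` makes the second run see the first run's index; with `cleanup(true)` the second run rebuilds from its own class.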

Verified: ParserTest 10/10 green locally.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
shai-almog commented Jun 12, 2026

Collaborator Author

Compared 125 screenshots: 125 matched.
Native Windows port (arm64 / Apple Silicon - Arm): full hellocodenameone screenshot suite rendered offscreen with Direct2D/DirectWrite, plus the real benchmarks (base64 native/CN1/SIMD, image createMask/applyMask/modifyAlpha/PNG/JPEG, NEON SIMD kernels). Compared against the in-repo baseline in scripts/windows/screenshots.

Benchmark Results

Detailed Performance Metrics

Metric | Duration
SIMD kernel backend | SSE2 (x64) / NEON (arm64) native kernels
SIMD int-add (64K x300) | java 61ms / native 3ms = 20.3x speedup
SIMD float-mul (64K x300) | java 60ms / native 3ms = 20.0x speedup
SIMD kernel correctness | PASS (native result == scalar reference)
Base64 native bridge | unavailable (CN1 + SIMD + image benchmarks only)
Base64 payload size | 8192 bytes
Base64 benchmark iterations | 6000
Base64 SIMD byte path | gated to scalar (CPU autovectorizes scalar; explicit SIMD not beneficial here)
Base64 CN1 encode | 599.000 ms
Base64 CN1 decode | 234.000 ms
Base64 SIMD encode | 104.000 ms
Base64 encode ratio (SIMD/CN1) | 0.174x (82.6% faster)
Base64 SIMD decode | 131.000 ms
Base64 decode ratio (SIMD/CN1) | 0.560x (44.0% faster)
Image encode benchmark iterations | 100
Image createMask (SIMD off) | 24.000 ms
Image createMask (SIMD on) | 7.000 ms
Image createMask ratio (SIMD on/off) | 0.292x (70.8% faster)
Image applyMask (SIMD off) | 38.000 ms
Image applyMask (SIMD on) | 12.000 ms
Image applyMask ratio (SIMD on/off) | 0.316x (68.4% faster)
Image modifyAlpha (SIMD off) | 36.000 ms
Image modifyAlpha (SIMD on) | 11.000 ms
Image modifyAlpha ratio (SIMD on/off) | 0.306x (69.4% faster)
Image modifyAlpha removeColor (SIMD off) | 39.000 ms
Image modifyAlpha removeColor (SIMD on) | 11.000 ms
Image modifyAlpha removeColor ratio (SIMD on/off) | 0.282x (71.8% faster)

shai-almog commented Jun 12, 2026

Collaborator Author

Compared 121 screenshots: 121 matched.
✅ JavaScript-port screenshot tests passed.
