Optimize dictionary writers by replacing fastutil Linked maps with OpenHashMap + ArrayList

## Background

The dictionary writers in `DictionaryValuesWriter` (binary, long, double, float, int variants) use fastutil's `*2IntLinkedOpenHashMap` to assign dictionary indices to distinct values. The "Linked" variant is used because the dictionary page must be emitted in **insertion order** (dictionary index `i` corresponds to the `i`-th distinct value seen), and the linked map preserves that order via a doubly-linked list threaded through the slot array.

This is correct, but the linked variant pays an avoidable cost on every insertion:

- Two extra `long` fields per slot (`prev`, `next` indices) → larger slots, more cache lines per probe
- Pointer fix-up on insert: update the previous tail's `next`, the new entry's `prev`, and the `lastEntry` pointer (3–4 scattered writes per insert)
- Re-stitching the linked list on rehash
- Pure pointer chasing — not vectorizable, not branch-friendly

For high-cardinality columns (hundreds of thousands of distinct values per column chunk), this overhead compounds on a hot path.

## Proposal

Replace each `*2IntLinkedOpenHashMap` with the plain `*2IntOpenHashMap` plus a separate primitive-typed list (`IntArrayList` / `LongArrayList` / `FloatArrayList` / `DoubleArrayList` / `ArrayList<Binary>`) to track insertion order:

- The hash map becomes a pure "have I seen this? what's its id?" lookup with the smallest possible slot.
- The list is append-only, contiguous, and cache-friendly to iterate at flush time.

The two responsibilities (lookup vs ordering) were jammed into one structure; splitting them lets each be optimal. The trade-off is one extra primitive entry per distinct value, which is small compared to the per-slot overhead the linked map was paying.

Note: both candidates are fastutil primitive-keyed maps, so this is **not** a boxing change. The win is structural — eliminating an ordering guarantee that was being paid for on every insert when an explicit append-only list provides it more cheaply.

## Expected impact

From local benchmark runs (`BinaryEncodingBenchmark.encodeDictionary`, `IntEncodingBenchmark.encodeDictionary` — being added in #3512):

- `BinaryEncodingBenchmark.encodeDictionary` (high cardinality, short strings): **+23% to +42%** throughput
- `IntEncodingBenchmark.encodeDictionary` (high cardinality): **~+2x** in some configurations
- Low-cardinality cases are flat or trivially different (the linked-list overhead doesn't matter when there are few inserts)

## Files affected

- `parquet-column/src/main/java/org/apache/parquet/column/values/dictionary/DictionaryValuesWriter.java` — the 5 inner classes `PlainBinaryDictionaryValuesWriter`, `PlainLongDictionaryValuesWriter`, `PlainDoubleDictionaryValuesWriter`, `PlainFloatDictionaryValuesWriter`, `PlainIntegerDictionaryValuesWriter`.

No public API change. No file format change. Behavior is identical (dictionary pages emit values in the same order).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize dictionary writers by replacing fastutil Linked maps with OpenHashMap + ArrayList #3513

Background

Proposal

Expected impact

Files affected

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Optimize dictionary writers by replacing fastutil Linked maps with OpenHashMap + ArrayList #3513

Description

Background

Proposal

Expected impact

Files affected

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions