Search before asking
Paimon version
1.4
Compute Engine
Flink (Paimon Sink). Affects any engine using Paimon's IcebergCommitCallback (StarRocks in our case)
Minimal reproduce step
- Create an append-only Paimon table with the Iceberg metadata committer enabled (e.g. `'metadata.iceberg.storage' = 'rest-catalog'` pointed at a REST catalog such as Polaris, or `hadoop-catalog`).
- Stream data in so that Paimon's LSM engine performs its normal level compaction (any long-running streaming ingest will do this within minutes).
- After a compaction happens, read the Iceberg metadata for the resulting snapshot:

  ```shell
  gcloud storage cat gs://<warehouse>/<db>/<table>/metadata/v<N>.metadata.json \
    | jq '.snapshots[-5:] | .[] | {id:."snapshot-id", op:.summary.operation, added:.summary["added-records"], deleted:.summary["deleted-records"]}'
  ```
- Observe that the compaction snapshot is labeled `"operation": "overwrite"` even though no logical rows were added or deleted (`added-records == 0`, `deleted-records == 0`; only files were reorganized).

What doesn't meet your expectations?
What doesn't meet your expectations?
Per the Iceberg spec, the four snapshot operation values have distinct semantics:
| operation | Meaning |
| --- | --- |
| `append` | Only new data files added. |
| `replace` | Files added and removed without changing table data (compaction, format change, relocation). |
| `overwrite` | Files added and removed, and table data may have changed (INSERT OVERWRITE, MERGE, row-level deletes). |
| `delete` | Only files removed. |
Paimon's own LSM compaction is by definition a pure file rewrite with no logical row change — this is exactly what Iceberg's `replace` operation is for. Native Iceberg writers (`RewriteFiles`, `RewriteManifests`) use `DataOperations.REPLACE` for this case, and all of Iceberg's incremental scan APIs (`IncrementalAppendScan`, `IncrementalChangelogScan`, Spark `MicroBatchStream`, Flink `MonitorSource`) treat `replace` as a no-op for incremental reads.

Paimon currently emits `overwrite` for these compaction snapshots, which is indistinguishable — from a downstream reader's point of view — from a genuine row-changing overwrite. This breaks any downstream consumer that relies on the spec's distinction.
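To make the downstream contract concrete, here is a standalone sketch (plain Java, not Iceberg or Paimon source) of the per-snapshot decision an incremental reader makes; the string constants mirror the Iceberg spec's operation values, and the helper name is illustrative:

```java
// Standalone illustration of reader-side snapshot classification per the
// Iceberg spec. The operation strings match the spec's four values.
public class SnapshotOps {
    /** Returns true if an incremental reader must process this snapshot. */
    static boolean mayChangeRows(String operation) {
        switch (operation) {
            case "append":    // only new data files added
            case "overwrite": // rows may have been added and removed
            case "delete":    // rows removed
                return true;
            case "replace":   // pure file rewrite (compaction): safe to skip
                return false;
            default:
                throw new IllegalArgumentException("unknown operation: " + operation);
        }
    }

    public static void main(String[] args) {
        System.out.println(mayChangeRows("replace"));
        System.out.println(mayChangeRows("overwrite"));
    }
}
```

A Paimon compaction snapshot labeled `overwrite` forces this check to return true, which is exactly the misclassification this issue is about.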
Anything else?
Root cause
`IcebergSnapshotSummary` only defines two constants, and there is no code path in Paimon that produces `"replace"`:

```java
// paimon-core/src/main/java/org/apache/paimon/iceberg/metadata/IcebergSnapshotSummary.java
public static final IcebergSnapshotSummary APPEND = new IcebergSnapshotSummary("append");
public static final IcebergSnapshotSummary OVERWRITE = new IcebergSnapshotSummary("overwrite");
```
`IcebergCommitCallback` runs after every Paimon commit (both `CommitKind.APPEND` and `CommitKind.COMPACT`). It does not inspect the Paimon `CommitKind`; it just diffs files and falls back to `OVERWRITE` any time a previously-manifested file was removed:

```java
// paimon-core/src/main/java/org/apache/paimon/iceberg/IcebergCommitCallback.java
// (createWithDeleteManifestFileMetas)
} else {
    // some file is removed, rewrite this file meta
    snapshotSummary = IcebergSnapshotSummary.OVERWRITE;
    ...
}
```
Compaction — which always removes the old L0/L1/... files and adds the merged result — therefore deterministically lands as `overwrite` rather than `replace`.
Downstream impact (StarRocks example)
StarRocks IVM (Incremental Materialized View) refresh on a Paimon-produced Iceberg table fails on every compaction snapshot with:
```
com.starrocks.sql.analyzer.SemanticException: Getting analyzing error.
Detail message: TvrTableDeltaTrait is not append-only for base table: <db>.<table>,
delta:DeltaTrait{delta=Delta@[<snap>,<snap>], changeType=RETRACTABLE,
stats=Stats{addedRows=0, addedFileSize=0}}.
```
StarRocks recently fixed this for native-Iceberg tables in StarRocks#69825, which skips `replace` snapshots in `IcebergMetadata.listTableDeltaTraits()`. That fix does not apply to Paimon-written Iceberg tables because Paimon never emits `replace`. The StarRocks PR author explicitly scoped the fix to Iceberg and noted that Paimon would need a separate change, so the cleanest place for it is upstream in Paimon, where the Iceberg semantics can be made to match the spec.
Related context:
Proposed fix
- Add a `REPLACE` constant to `IcebergSnapshotSummary`:

  ```java
  public static final IcebergSnapshotSummary REPLACE = new IcebergSnapshotSummary("replace");
  ```
- In `IcebergCommitCallback`, thread the Paimon `CommitKind` (or the logical "rows unchanged" signal) through to the summary decision. When the underlying Paimon commit is `CommitKind.COMPACT` — or, equivalently, when the file-level diff adds/removes files but contributes zero net rows — emit `REPLACE` instead of `OVERWRITE`.
- Keep `OVERWRITE` for genuine row-changing operations (INSERT OVERWRITE, merge-on-read deletes that actually drop logical rows, etc.).
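The proposed branch can be sketched in isolation as follows. This is a standalone illustration, not actual Paimon code: the `CommitKind` enum mirrors Paimon's, and the `icebergSummary` helper name is hypothetical — the real change would live inside `IcebergCommitCallback`'s file-diff logic:

```java
// Hypothetical sketch of the proposed summary selection. CommitKind mirrors
// Paimon's enum; the helper and its signature are illustrative only.
public class SummaryDecision {
    enum CommitKind { APPEND, COMPACT, OVERWRITE }

    static String icebergSummary(CommitKind kind, boolean previousFilesRemoved) {
        if (!previousFilesRemoved) {
            return "append";    // only new data files were added
        }
        if (kind == CommitKind.COMPACT) {
            return "replace";   // pure file rewrite: no logical row change
        }
        return "overwrite";     // rows may genuinely have changed
    }

    public static void main(String[] args) {
        System.out.println(icebergSummary(CommitKind.COMPACT, true));
        System.out.println(icebergSummary(CommitKind.APPEND, false));
        System.out.println(icebergSummary(CommitKind.OVERWRITE, true));
    }
}
```

The key point is that the branch keys off the commit's intent (or a zero-net-rows check) rather than off the file diff alone, which is what currently collapses compaction into `overwrite`.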
This aligns Paimon's Iceberg-compat metadata with the Iceberg spec and lets downstream incremental readers (StarRocks IVM, Spark structured streaming incremental scans, Flink Iceberg source, etc.) correctly treat Paimon compaction as a no-op for incremental refresh.
Happy to send a PR if a maintainer can confirm the proposed shape (new enum constant + `CommitKind`-based branch) is acceptable.
Are you willing to submit a PR?