Skip to content

[WIP] Fix Iceberg REST catalog commit safety — corruption under post-commit network failure#1676

Open
il9ue wants to merge 1 commit intoantalya-26.1from
fix/iceberg-truncate-cleanup-on-commit-failure
Open

[WIP] Fix Iceberg REST catalog commit safety — corruption under post-commit network failure#1676
il9ue wants to merge 1 commit intoantalya-26.1from
fix/iceberg-truncate-cleanup-on-commit-failure

Conversation

@il9ue
Copy link
Copy Markdown

@il9ue il9ue commented Apr 22, 2026

Summary

Status: WIP / Draft — awaiting lead approval on scope. Do not merge.

Original ticket #1609 (storage leak cleanup on TRUNCATE commit failure)
investigation revealed a latent table-corruption bug in the REST catalog
client that the #1609 fix would propagate into TRUNCATE if written as
originally scoped.

Under a post-commit network failure, the REST catalog HTTP client retries
the commit POST, receives 409 Conflict (because the server already
advanced the snapshot), and updateMetadata collapses the 409 to
return false. Callers interpret that as a clean pre-commit rejection
and run their cleanup path — deleting files the catalog now references.
Result: table corruption.

This affects Mutations.cpp in merged code today and would affect
TRUNCATE if #1609 were implemented as originally described. Repurposing
this branch to fix the root cause across three layers, with #1609
becoming the final layer on top.

Scope

  1. HTTP client (src/IO/ReadWriteBufferFromHTTP.cpp) — classify 409
    as non-retriable.
  2. REST catalog (src/Databases/DataLake/RestCatalog.cpp) — replace
    bool updateMetadata with a typed result distinguishing Committed /
    RejectedCleanly / Unknown.
  3. Callers (Mutations.cpp, IcebergMetadata::truncate) — gate
    cleanup on RejectedCleanly only; on Unknown, log and preserve
    state.

Evidence

Every claim cited to file:line in design-notes.md on this branch.
Full investigation transcripts attached to escalation.

  • src/IO/ReadWriteBufferFromHTTP.cpp:22-34 — 409 is retriable
  • src/IO/ReadWriteBufferFromHTTP.cpp:519-522impl.reset() forces
    body re-send on every retry
  • src/Databases/DataLake/RestCatalog.cpp:1347-1355 — single catch,
    no status inspection
  • src/Storages/ObjectStorage/DataLakes/Iceberg/Mutations.cpp:522,530
    — unconditional cleanup call sites

Violates Iceberg REST spec §3.5 (409 should be non-retriable; requires
metadata refresh, not retry).

Related

See design-notes.md for full investigation and layered fix proposal.

Changelog category (leave one):

  • Critical Bug Fix (crash, data loss, RBAC) or LOGICAL_ERROR

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fixed a data loss bug in the Iceberg REST catalog client where a network
failure after a successful updateMetadata commit could cause the client
to delete object-storage files that the catalog now references, leaving
the table in a corrupted state. The HTTP client now classifies 409
Conflict as non-retriable, updateMetadata returns a typed result that
distinguishes clean rejection from ambiguous outcomes, and callers
(Iceberg mutations and TRUNCATE) only clean up written files when the
commit is known to have been cleanly rejected. Also adds missing cleanup
of orphaned files in the TRUNCATE path (#1609).

Documentation entry for user-facing changes

No documentation changes required. This fix does not change the public
SQL surface or configuration. Internal HTTP retry behavior for 409
responses changes: REST catalog calls receiving 409 will no longer retry
and will surface the conflict to the caller on the first occurrence,
which is the correct Iceberg REST spec behavior.

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

Original ticket #1609 (storage leak cleanup on TRUNCATE commit failure)
investigation revealed a latent table-corruption bug in the REST catalog
client that the #1609 fix would propagate into TRUNCATE if written as
originally scoped. Repurposing this branch to fix the root cause, with
#1609 becoming the final layer on top.

See design-notes.md for full investigation and layered fix proposal.
Awaiting lead approval on scope expansion before implementation.
@github-actions
Copy link
Copy Markdown

Workflow [PR], commit [3bb0e62]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant