Antalya 26.1: Fix ALTER TABLE operations (add, modify, drop, rename) support for Iceberg #1594
subkanthi wants to merge 15 commits into antalya-26.1
Conversation
…d columns to a separate function, added unit tests
void assertInitializedDL() const
{
    BaseStorageConfiguration::assertInitialized();
This was removed in upstream
Testing:
@codex review

To use Codex here, create a Codex account and connect to GitHub.

@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f344d72b29
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Poco::JSON::Array::Ptr identifier_fields = new Poco::JSON::Array;
schema_for_rest->set("identifier-field-ids", identifier_fields);
}
Preserve identifier-field-ids when sending schema updates
Overwriting identifier-field-ids with an empty array discards any existing row identity configuration for tables created outside ClickHouse. In buildUpdateMetadataRequestBody, schema_for_rest is cloned from the new schema but then unconditionally reset, so an ALTER against a REST-catalog Iceberg table with non-empty identifier fields will silently clear them in the committed schema update. This can break downstream equality-delete/merge workflows that depend on those IDs.
Addressed: added logic to update identifier-field-ids only if it's already set.
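The fix above can be sketched as a pure function. This is a minimal illustration, not the PR's actual Poco-based code: the JSON schema object is modeled here as a plain `std::map`, and `ensureIdentifierFieldIds` is a hypothetical helper name.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of the fix: keep any existing "identifier-field-ids"
// entry rather than unconditionally overwriting it with an empty array.
using Schema = std::map<std::string, std::vector<int>>;

void ensureIdentifierFieldIds(Schema & schema_for_rest)
{
    // Only initialize the key when it is absent; never clobber
    // identifier fields configured by an external writer.
    if (schema_for_rest.find("identifier-field-ids") == schema_for_rest.end())
        schema_for_rest["identifier-field-ids"] = {};
}
```

A table created outside ClickHouse with identifier fields `{1, 2}` keeps them; a fresh schema gets an empty array.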
if (!catalog->updateMetadata(namespace_name, table_name, catalog_filename, metadata))
{
    LOG_WARNING(log, "Iceberg alter: catalog update failed for '{}'", catalog_filename);
    continue;
}
Do not retry by reapplying ALTER after failed catalog commit
After successfully writing the new metadata file, a catalog commit failure currently does continue, which re-enters the loop and applies the same alter operation again on top of already-updated metadata. For ADD COLUMN, this path can produce duplicate field entries (the add generator does not guard against existing names), and for other operations it can cause spurious failures because the first write already changed the schema version. This failure mode is triggered whenever updateMetadata returns false after a successful metadata write (e.g., transient REST error).
Addressed. If the catalog updateMetadata call fails, throw an exception.
…, fixed retry logic in Mutations.cpp for catalog updateMetadata
namespace
{
Poco::JSON::Object::Ptr cloneJsonObject(const Poco::JSON::Object::Ptr & obj)
Wow, is this really the best way to clone it? I haven't checked the rest of the code yet, but I am curious to understand why this is needed.
OK, I understand why you need it. I also see Poco does not offer a deep-copy API, so this is OK.
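Why a dedicated clone step is needed at all can be shown with a standard-library analogy (an assumption for illustration, not the PR's code): copying a reference-counted handle is shallow, so mutating the "copy" mutates the original, whereas a real deep copy leaves it untouched.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Shallow copy: both handles alias one object, so a mutation through
// the copy is visible through the original.
std::string mutateThroughShallowCopy()
{
    auto original = std::make_shared<std::string>("schema-v1");
    auto shallow = original;   // shares the same underlying object
    *shallow = "schema-v2";
    return *original;          // "schema-v2": original was changed too
}

// Deep copy: a new object is constructed from the original's value,
// so mutating it leaves the original intact.
std::string mutateThroughDeepCopy()
{
    auto original = std::make_shared<std::string>("schema-v1");
    auto deep = std::make_shared<std::string>(*original);
    *deep = "schema-v2";
    return *original;          // "schema-v1": original untouched
}
```

For Poco::JSON objects, which expose no deep-copy API, a common workaround is to serialize and reparse; the `cloneJsonObject` helper above serves the same purpose.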
return result;
}

Poco::JSON::Object::Ptr request_body = new Poco::JSON::Object;
I wonder how they (the Poco lib) manage the life cycle of such objects.
From the Poco docs: Poco::JSON::Object::Ptr is a type alias for a Poco::SharedPtr managing a Poco::JSON::Object.
It's a reference-counting smart pointer.
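The reference-counting behavior works like `std::shared_ptr`, shown here as an analogy (Poco::SharedPtr has the same ownership semantics, though a different API):

```cpp
#include <memory>

// The managed object lives as long as at least one handle references it;
// the last handle to be destroyed deletes the object.
long refCountAfterCopyScope()
{
    auto a = std::make_shared<int>(42);
    {
        auto b = a;          // refcount rises to 2 inside this scope
    }                        // b destroyed; refcount drops back to 1
    return a.use_count();    // 1: only `a` still owns the object
}
```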
if (!schema_for_rest->has("identifier-field-ids"))
    schema_for_rest->set("identifier-field-ids", new Poco::JSON::Array);

if (old_schema_id >= 0)
I assume this means: is this not the first schema?
I honestly don't know the specs, so I can't really give an opinion on the fields below.
Yes, correct. The logic is const Int32 old_schema_id = new_schema_id - 1; if a current schema exists, it is added to the assert-current-schema-id requirement in the request:
{
"identifier": {
"namespace": ["sales", "events"],
"name": "orders"
},
"requirements": [
{ "type": "assert-table-uuid", "uuid": "<table-uuid-from-loadTable>" },
{ "type": "assert-current-schema-id", "current-schema-id": 3 }
],
"updates": [
{
"action": "rename-column",
"name": "old_name",
"new-name": "new_name"
}
]
}
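The requirement-building rule can be sketched as a small function. This is a hypothetical illustration of the logic described above, not the PR's code (`schemaRequirement` is an invented name, and the JSON is built by string concatenation only for brevity):

```cpp
#include <cassert>
#include <string>

// If a previous schema exists (old_schema_id >= 0), the commit request
// carries an assert-current-schema-id requirement so the catalog can
// reject concurrent schema changes; for the very first schema there is
// no prior schema id to assert.
std::string schemaRequirement(int new_schema_id)
{
    const int old_schema_id = new_schema_id - 1;
    if (old_schema_id >= 0)
        return "{\"type\":\"assert-current-schema-id\",\"current-schema-id\":"
               + std::to_string(old_schema_id) + "}";
    return {}; // first schema: no requirement emitted
}
```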
// "operation" : "replace",
// "summary" : "replace snapshot 1 with snapshot 2"
// }
if (new_snapshot->has("parent-snapshot-id"))
Again, I don't know the specs, but to my understanding it is impossible that old_schema_id < 0 while parent_snapshot_id != -1. Please educate me.
The else part (https://github.com/Altinity/ClickHouse/pull/1594/changes/BASE..4f37d22fc562d75bdad4f41d414125a966520964#diff-a70dee27f63a76254ba9d7e7f9dc632d5cdee3d19be91663d04686247582e6faL1308) is pre-existing code and is not related to updating schemas; it is used for insert/append snapshots. I just moved the code into a separate function since both paths build the REST API request.
Nvm, I confused the if blocks
request_body->set("updates", updates);
}
const auto built = buildUpdateMetadataRequestBody(namespace_name, table_name, new_snapshot);
if (built.status == UpdateMetadataRequestBodyResult::Status::Skip)
Is there a reason to follow this "error as return value" pattern? Usually we just throw exceptions with clear messages, and those are propagated back to the user or logged somewhere.
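The convention the reviewer suggests can be sketched as follows. This is a hypothetical illustration of the two-outcome design eventually adopted (nullptr for a benign skip, an exception for malformed input); `buildRequestBody` and its parameters are invented for the example and `std::runtime_error` stands in for DB::Exception:

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Return nullptr for the benign "nothing to commit" case; throw with a
// specific message for genuinely malformed input. Callers distinguish
// the two without an extra status enum.
std::unique_ptr<std::string> buildRequestBody(bool has_snapshot, bool metadata_ok)
{
    if (!has_snapshot)
        return nullptr; // skip: nothing to commit, not an error
    if (!metadata_ok)
        throw std::runtime_error("malformed metadata: missing current-schema-id");
    return std::make_unique<std::string>("{\"updates\":[]}");
}
```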
metadata_object->getArray(Iceberg::f_schemas)->add(current_schema);
}

void MetadataGenerator::generateRenameColumnMetadata(const String & column_name, const String & new_column_name)
It doesn't look like it generates or outputs anything; I would personally name it differently, like set or update.
This is a bugfix of the original upstream PR; I kept the same name generateRenameColumnMetadata:
void generateAddColumnMetadata(const String & column_name, DataTypePtr type);
void generateDropColumnMetadata(const String & column_name);
void generateModifyColumnMetadata(const String & column_name, DataTypePtr type);
void generateRenameColumnMetadata(const String & column_name, const String & new_column_name);
This was the original upstream PR https://github.com/ClickHouse/ClickHouse/pull/97455/changes#diff-58d3a6dbeceacc359645fc8bcac193781a4c82ecccfb67a9362db280df353154R339, which implemented these functions; it's probably better to keep the function names the same.
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Iceberg alter supports exactly one command at a time, got {}", params.size());

size_t i = 0;
while (i++ < MAX_TRANSACTION_RETRIES)
I would honestly avoid such unrelated changes, unless we ship it to upstream and then backport
This is a bug fix: because i is post-incremented, inside the body it starts at 1.
if (i == MAX_TRANSACTION_RETRIES)
    throw Exception(ErrorCodes::LIMIT_EXCEEDED, "Too many unsuccessful retries to alter iceberg table");
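The post-increment semantics under discussion can be verified in isolation. A minimal sketch, detached from the PR's code: with `size_t i = 0; while (i++ < max)`, the body observes i starting at 1, runs exactly `max` times, and an in-body `i == max` check fires on the final attempt.

```cpp
#include <cstddef>

// Counts body executions of the post-increment loop pattern:
// i is incremented as part of the condition, so the body sees 1..max.
std::size_t countAttempts(std::size_t max_retries)
{
    std::size_t attempts = 0;
    std::size_t i = 0;
    while (i++ < max_retries)
    {
        ++attempts;
        // On the last pass, i == max_retries here, which is where the
        // real code would throw LIMIT_EXCEEDED after a failed attempt.
    }
    return attempts;
}
```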
return;
}

bool file_exists = object_storage->exists(StoredObject(storage_metadata_name));
I don't understand why we need two branches, one in case we wrote the hint and one in case we did not.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4f37d22fc5
if (catalog->updateMetadata(namespace_name, table_name, catalog_filename, metadata))
{
    LOG_INFO(log, "Iceberg alter: adopted existing metadata file '{}' and updated catalog", storage_metadata_name);
    return;
Do not treat foreign metadata file as successful ALTER
When writeMetadataFileAndVersionHint fails because the target file already exists, this branch returns success as soon as catalog->updateMetadata(...) succeeds. For Glue catalogs, GlueCatalog::updateMetadata (see src/Databases/DataLake/GlueCatalog.cpp) only repoints metadata_location and does not validate that the existing file contains this ALTER’s schema changes, so a concurrent writer’s file can be "adopted" and the user gets a successful ALTER that was never applied.
if (!catalog->updateMetadata(namespace_name, table_name, catalog_filename, metadata))
{
    throw Exception(
        ErrorCodes::DATALAKE_DATABASE_ERROR,
        "Iceberg alter: catalog commit failed for '{}' after metadata file was written successfully",
Retry catalog commit failures inside ALTER retry loop
A failed catalog->updateMetadata(...) immediately throws and exits the transaction loop, so MAX_TRANSACTION_RETRIES is effectively skipped for catalog conflicts/transient failures after metadata write. In practice, concurrent schema updates (e.g. REST commit conflict responses) will fail on first conflict instead of retrying from fresh metadata, unlike the mutation path’s retry behavior.
…ter post-write branches

Addresses review feedback on PR #1594:

R1480: replace err-as-return-value pattern in buildUpdateMetadataRequestBody with standard exception/null semantics.
- Return nullptr for the "no snapshot, nothing to commit" skip case.
- Throw DB::Exception(DATALAKE_DATABASE_ERROR) with a specific message for malformed metadata (missing current-schema-id, no schema object matching current-schema-id).
- RestCatalog::updateMetadata keeps its bool return for ICatalog compatibility but now logs the full HTTPException displayText on commit failure instead of silently swallowing it.
- Drop the UpdateMetadataRequestBodyResult struct; callers consume the Poco::JSON::Object::Ptr directly.
- Update gtest_rest_catalog_update_metadata: use EXPECT_FALSE(body) for skip, EXPECT_THROW(..., DB::Exception) for malformed inputs, ASSERT_TRUE(body) for success, and assert on the requirements/updates payload shape.

R768: collapse the two-branch post-write logic in the Mutations.cpp alter retry loop into a single flow driven by wrote_ok and file_exists:
- wrote_ok=true -> commit to catalog; fatal if the catalog rejects.
- wrote_ok=false && file_exists -> orphan-adopt: attempt catalog commit; on failure continue to the next retry (do not throw).
- wrote_ok=false && !file_exists -> retry with a new version.

Adds a comment explaining why orphan-adopt is preferred over writing a new version.

Made-with: Cursor
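The R768 decision flow above can be sketched as a pure function. A minimal illustration of the described logic, not the actual Mutations.cpp code (`decide` and the enum names are invented for the example):

```cpp
#include <cassert>

// Outcome of the post-write step, driven solely by whether the metadata
// write succeeded and, if not, whether the file exists on storage anyway.
enum class PostWriteAction
{
    CommitFatalOnReject,   // our file landed: catalog rejection is fatal
    AdoptAndRetryOnReject, // orphan-adopt: retry the loop if commit fails
    RetryNewVersion        // nothing landed: retry with a new version
};

PostWriteAction decide(bool wrote_ok, bool file_exists)
{
    if (wrote_ok)
        return PostWriteAction::CommitFatalOnReject;
    if (file_exists)
        return PostWriteAction::AdoptAndRetryOnReject;
    return PostWriteAction::RetryNewVersion;
}
```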
… code. Signed-off-by: Kanthi Subramanian <subkanthi@gmail.com>
arthurpassos left a comment:
Left some comments. Otherwise LGTM.
compression_method,
data_lake_settings[DataLakeStorageSetting::iceberg_use_version_hint]);

// If the write helper reported failure but the metadata file nonetheless landed on
Are you sure this is a good idea? What if it indeed failed to write, but some other writer shipped a file with the same name? If the filenames contain some sort of UUID, then it is OK.
But I actually don't understand why you need to go this far. If you didn't get a response, isn't it easier to just fail the operation and retry?
    catalog_filename);
    continue;
}
if (!wrote_ok)
Should you even consider updating the catalog if you failed to write the metadata file?
void StorageObjectStorage::alter(const AlterCommands & params, ContextPtr context, AlterLockHolder & /*alter_lock_holder*/)
{
    configuration->update(object_storage, context, /* if_not_updated_before */ true);
Shouldn't it be like this?
if (update_configuration_on_read_write)
{
configuration->update(
object_storage,
local_context,
/* if_not_updated_before */ false);
}
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fixes Alter table ADD/DROP/RENAME/MODIFY column for Iceberg tables.
Documentation entry for user-facing changes
Solved ClickHouse#86024.
Fixed logic of updating metadata JSON as per Iceberg REST API Spec.
Added unit/integration tests.
CI/CD Options
Exclude tests:
Regression jobs to run: