branch-4.1: (cloud) Hold table write lock across first-time dynamic partition setup to prevent CREATE MV race #62755 by github-actions[bot] · Pull Request #62863 · apache/doris

github-actions · 2026-04-27T07:58:07Z

Cherry-picked from #62755

…up to prevent CREATE MV race (#62755)

In cloud mode, `InternalCatalog.createTable` releases `db.writeLock` right after writing the `OP_CREATE_TABLE` edit log, and only then invokes `DynamicPartitionScheduler.executeDynamicPartitionFirstTime` to create the first batch of dynamic partitions for the new table. No lock guards the gap between these two steps, which opens a race window:

```
Thread A (CREATE TABLE with dynamic_partition)
  -> db.writeLock
  -> db.createTableWithoutLock()   # writes OP_CREATE_TABLE (idToPartition is empty)
  -> db.writeUnlock                # <- race window opens here
  -> executeDynamicPartitionFirstTime
       -> for each partition: addPartition()
            -> olapTable.readLock
            -> checkNormalStateForAlter()

Thread B (CREATE MATERIALIZED VIEW from another client, concurrent)
  -> olapTable.writeLockOrDdlException   # succeeds, A's db lock does not block this
  -> checkNormalStateForAlter            # passes, state is still NORMAL
  -> for (Partition p : olapTable.getPartitions())   # snapshots leader's half-built state
         mvJob.addMVIndex(partitionId, ...)
  -> olapTable.setState(ROLLUP)
  -> logAlterJob(OP_ALTER_JOB_V2)        # journals a rollup that references partitions
                                         # which have not (and never will) appear as
                                         # OP_ADD_PARTITION entries in the journal

Thread A resumes on the next addPartition
  -> checkNormalStateForAlter throws "state(ROLLUP) not NORMAL"
  -> CREATE TABLE returns ERR, but OP_CREATE_TABLE and OP_ALTER_JOB_V2 are
     already durably on disk, leaving a permanent inconsistency in the journal.
```

To reproduce in cloud mode, have two clients fire the following two statements against the same table within the same second:

```sql
CREATE TABLE IF NOT EXISTS t (
    ...
)
PARTITION BY RANGE(ts) ()
PROPERTIES (
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.start" = "-7",
    "dynamic_partition.end" = "1",
    "dynamic_partition.create_history_partition" = "true"
);

CREATE MATERIALIZED VIEW mv AS SELECT ... FROM t GROUP BY ...;
```

The new regression test `test_create_table_and_create_mv_race.groovy` uses the debug point `FE.createOlapTable.beforeFirstTimeDynamicPartition` (param `sleepMs`) to widen the race window and reproduce it deterministically.

Once the bad journal entry is persisted, any FE replaying it hits:

```
NullPointerException: Cannot invoke DataProperty.getStorageMedium() because the return value of PartitionInfo.getDataProperty(long) is null
    at RollupJobV2.addTabletToInvertedIndex(RollupJobV2.java:762)
    at RollupJobV2.replayCreateJob(RollupJobV2.java:745)
    at EditLog.loadJournal(EditLog.java:939)
```

`EditLog.loadJournal:1448` calls `System.exit(-1)`, so the FE JVM exits immediately. Consequences observed in production:

- All followers that replicate the bad entry crash on replay; the supervisor restarts them and they crash again on the same entry, entering a crash loop.

Extend the lifetime of `olapTable.writeLock` inside `InternalCatalog.createTable`:

1. After `OP_CREATE_TABLE` has been written (i.e. `result.second == false`, meaning the table was newly registered), acquire `olapTable.writeLock()` before releasing `db.writeLock`.
2. Wrap everything that used to run after `db.writeUnlock` (colocate persist, `executeDynamicPartitionFirstTime`, `registerOrRemoveDynamicPartitionTable`, `createOrUpdateRuntimeInfo`) in a new try/finally, and release the table lock in the `finally` block according to the `holdTableLock` flag.

With this, Thread A holds the table write lock across the whole first-time dynamic partition setup. Any concurrent CREATE MV / SCHEMA CHANGE blocks on `olapTable.writeLockOrDdlException` until A releases the lock. At that point `olapTable.getPartitions()` reflects the full partition set, so the rollup job that B constructs references only partitions that have matching `OP_ADD_PARTITION` entries in the journal, and the inconsistency is gone.

The lock is scoped to this one new table, so other tables in the same database are unaffected.
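The lock hand-off described in the two steps above can be sketched with the JDK's `ReentrantReadWriteLock`. This is a minimal, hypothetical model rather than the Doris code: `Table`, `DB_LOCK`, `createTable`, `addPartition`, `createMv` and `runRace` are illustrative stand-ins, the edit-log writes are elided, and a plain `Thread.sleep` plays the role of the debug point. Passing `withFix = false` keeps the old ordering, where CREATE MV can take the table lock during the gap and snapshot an incomplete partition list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CreateTableLockSketch {
    // Hypothetical stand-in for OlapTable: a lock, a partition list and a state flag.
    static final class Table {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        final List<Long> partitions = new ArrayList<>();
        volatile String state = "NORMAL";
    }

    // Hypothetical stand-in for db.writeLock.
    static final ReentrantReadWriteLock DB_LOCK = new ReentrantReadWriteLock();

    // createTable plus first-time partition setup; withFix hands over the table lock.
    static void createTable(Table table, int partitionCount, CountDownLatch registered,
                            long sleepMs, boolean withFix) throws InterruptedException {
        boolean holdTableLock = false;
        DB_LOCK.writeLock().lock();
        try {
            // Registering the table and writing OP_CREATE_TABLE are elided.
            if (withFix) {
                table.lock.writeLock().lock(); // taken before the db lock is released
                holdTableLock = true;
            }
            registered.countDown(); // the table is now visible to other sessions
        } finally {
            DB_LOCK.writeLock().unlock();
        }
        try {
            Thread.sleep(sleepMs); // widens the window, like the debug point's sleepMs
            for (long id = 1; id <= partitionCount; id++) {
                addPartition(table, id);
            }
        } finally {
            if (holdTableLock) {
                table.lock.writeLock().unlock();
            }
        }
    }

    static void addPartition(Table table, long id) {
        // The JDK lock lets the write-lock holder acquire the read lock as well.
        table.lock.readLock().lock();
        try {
            if (!"NORMAL".equals(table.state)) {
                throw new IllegalStateException("state(" + table.state + ") not NORMAL");
            }
        } finally {
            table.lock.readLock().unlock();
        }
        table.lock.writeLock().lock(); // reentrant if the caller already holds it
        try {
            table.partitions.add(id);
        } finally {
            table.lock.writeLock().unlock();
        }
    }

    // Models CREATE MV: snapshot the current partitions, then flip the table to ROLLUP.
    static List<Long> createMv(Table table) {
        table.lock.writeLock().lock(); // blocks while createTable holds the table lock
        try {
            List<Long> snapshot = new ArrayList<>(table.partitions);
            table.state = "ROLLUP";
            return snapshot;
        } finally {
            table.lock.writeLock().unlock();
        }
    }

    // Runs CREATE TABLE on a background thread and CREATE MV on the calling thread.
    static List<Long> runRace(int partitionCount, long sleepMs, boolean withFix)
            throws InterruptedException {
        Table table = new Table();
        CountDownLatch registered = new CountDownLatch(1);
        Thread creator = new Thread(() -> {
            try {
                createTable(table, partitionCount, registered, sleepMs, withFix);
            } catch (InterruptedException | IllegalStateException e) {
                System.out.println("CREATE TABLE failed: " + e);
            }
        });
        creator.start();
        registered.await(); // start CREATE MV only once the table is registered
        List<Long> seen = createMv(table);
        creator.join();
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("with fix:    MV saw " + runRace(8, 200, true));
        System.out.println("without fix: MV saw " + runRace(8, 200, false));
    }
}
```

Because the sketch takes the table lock while still inside the database lock, there is no instant at which the new table is visible but unlocked. The nested lock calls in `addPartition` do not deadlock here because the JDK lock is reentrant and permits a write-lock holder to also take the read lock; whether Doris' own table lock behaves identically is not covered by this sketch.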
The new table has no user traffic yet, so the extra lock hold time is effectively free.

The fix also introduces the debug point `FE.createOlapTable.beforeFirstTimeDynamicPartition` (param `sleepMs`), used only by the regression test to widen the race window. It is disabled by default in production.

- Regression: `regression-test/suites/cloud_p0/partition/test_create_table_and_create_mv_race.groovy` runs CREATE TABLE and CREATE MV concurrently and asserts that MV completes no earlier than CREATE TABLE. Without the fix, MV either returns during the injected sleep (assertion fails) or CREATE TABLE throws a ROLLUP-state error (`future.get()` re-throws before the assertion). With the fix, MV blocks on the table lock and the test passes.

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

### Release note

None

### Check List (For Author)

- Test
    - [x] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason
- Behavior changed:
    - [x] No.
    - [ ] Yes.
- Does this need documentation?
    - [x] No.
    - [ ] Yes.

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label
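The ordering assertion from the regression test bullet above (MV must complete no earlier than CREATE TABLE, with the debug point's `sleepMs` widening the window) can be modelled in plain Java. This is a hypothetical sketch, not the Groovy suite: the `DEBUG_POINTS` map is an assumed stand-in for Doris' debug point mechanism, whose real API is not shown in this PR, and an `AtomicLong` logical clock replaces wall-clock timestamps so the ordering check is deterministic. Only the fixed behavior (table lock held across setup) is modelled.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RaceOrderingCheck {
    // Hypothetical debug-point registry (name -> sleepMs); not Doris' real debug point API.
    static final Map<String, Long> DEBUG_POINTS = new ConcurrentHashMap<>();
    static final String POINT = "FE.createOlapTable.beforeFirstTimeDynamicPartition";

    // Logical clock used instead of wall-clock timestamps so the ordering is deterministic.
    static final AtomicLong CLOCK = new AtomicLong();
    static final ReentrantReadWriteLock TABLE_LOCK = new ReentrantReadWriteLock();

    // CREATE TABLE with the fix: the table lock is held across the (optionally delayed) setup.
    static long createTable(CountDownLatch registered) throws InterruptedException {
        TABLE_LOCK.writeLock().lock();
        try {
            registered.countDown();
            Thread.sleep(DEBUG_POINTS.getOrDefault(POINT, 0L)); // no delay when disabled
            // First-time dynamic partitions would be created here.
            return CLOCK.incrementAndGet(); // completion recorded before the lock is released
        } finally {
            TABLE_LOCK.writeLock().unlock();
        }
    }

    // CREATE MV: cannot proceed until CREATE TABLE releases the table lock.
    static long createMv() {
        TABLE_LOCK.writeLock().lock();
        try {
            return CLOCK.incrementAndGet();
        } finally {
            TABLE_LOCK.writeLock().unlock();
        }
    }

    // Mirrors the test's assertion that MV completes no earlier than CREATE TABLE.
    static boolean mvCompletesNoEarlier(long sleepMs) throws Exception {
        DEBUG_POINTS.put(POINT, sleepMs);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CountDownLatch registered = new CountDownLatch(1);
            Callable<Long> createTask = () -> createTable(registered);
            Callable<Long> mvTask = RaceOrderingCheck::createMv;
            Future<Long> create = pool.submit(createTask);
            registered.await(); // start MV only once the table exists
            Future<Long> mv = pool.submit(mvTask);
            long createDone = create.get(); // re-throws (wrapped) if CREATE TABLE failed
            long mvDone = mv.get();
            return mvDone >= createDone;
        } finally {
            pool.shutdown();
            DEBUG_POINTS.remove(POINT);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("MV completed no earlier than CREATE TABLE: " + mvCompletesNoEarlier(200));
    }
}
```

As in the test description, a CREATE TABLE failure surfaces from `create.get()` (wrapped in `ExecutionException`) before the ordering comparison is reached, so a ROLLUP-state error fails the check rather than being silently ignored.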

Thearas · 2026-04-27T07:58:14Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
3. What features were added. Why was this function added?
4. Which code was refactored and why was this part of the code refactored?
5. Which functions were optimized and what is the difference before and after the optimization?

Thearas · 2026-04-27T07:58:18Z

run buildall

hello-stephen · 2026-04-27T10:47:08Z

FE Regression Coverage Report

Increment line coverage 43.18% (19/44) 🎉
Increment coverage report
Complete coverage report

github-actions[bot] requested a review from yiguolei as a code owner on April 27, 2026 07:58

dataroaring closed this Apr 27, 2026

dataroaring reopened this Apr 27, 2026

github-actions[bot] wants to merge 1 commit into branch-4.1 from auto-pick-62755-branch-4.1


4 participants
