
fix(fe): clean DynamicPartitionScheduler.runtimeInfos on DROP TABLE#62884

Open
horus-leonardo wants to merge 1 commit into apache:master from horus-leonardo:fix/dynamic-partition-scheduler-runtimeinfos-leak

Conversation


horus-leonardo commented Apr 27, 2026

What problem does this PR solve?

Issue Number: close #62883

Related PR: none

Problem Summary:

DynamicPartitionScheduler.runtimeInfos accumulates entries indefinitely. The map is keyed by tableId and gets a new entry every time the scheduler runs against a table with dynamic_partition.enable=true or partitionRetentionCount > 0.

removeRuntimeInfo(long tableId) is called in exactly one place: ShowDynamicPartitionCommand.doRun(), which fires only when a user issues SHOW DYNAMIC PARTITION, and even then only for tables still present in the catalog that have lost their dynamic_partition property. No catalog mutation path calls it: DROP TABLE, DROP DATABASE, and tables that turn off dynamic_partition or zero out partitionRetentionCount all leave permanent entries. In automated ETL workloads where nobody runs SHOW DYNAMIC PARTITION, the map grows without bound.
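To make the mechanism concrete, here is a minimal self-contained model of the leak. This is plain Java, not the Doris source; only the field name runtimeInfos mirrors the real code, everything else is a stand-in:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the leak: runtimeInfos is keyed by tableId and repopulated
// on every scheduler pass, but no catalog mutation path ever removes an entry.
class LeakModel {
    // stands in for DynamicPartitionScheduler.runtimeInfos
    static final Map<Long, Map<String, String>> runtimeInfos = new ConcurrentHashMap<>();

    static void schedulerPass(long tableId) {
        runtimeInfos.computeIfAbsent(tableId, k -> new ConcurrentHashMap<>())
                .put("lastUpdateTime", "now");
    }

    static void dropTable(long tableId) {
        // the catalog unregisters the table, but runtimeInfos is untouched,
        // so the entry for tableId survives for the lifetime of the FE
    }

    public static void main(String[] args) {
        // a CREATE/DROP loop, as in a high-DDL-churn ETL workload
        for (long id = 0; id < 1000; id++) {
            schedulerPass(id);
            dropTable(id);
        }
        System.out.println(runtimeInfos.size()); // prints 1000: every entry leaked
    }
}
```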

This patch wires removeRuntimeInfo() into the three canonical cleanup points:

  1. InternalCatalog.unprotectDropTable() — alongside db.unregisterTable().
  2. executeDynamicPartition() db == null branch — after iterator.remove().
  3. executeDynamicPartition() olapTable invalid/lost-properties branch — after iterator.remove().
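The three call sites above can be sketched against a toy model. This is illustrative only; the real patch touches InternalCatalog and DynamicPartitionScheduler, which are not reproduced here, and the sets below merely stand in for the catalog and the scheduler's scheduling set:

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of where the patch wires in removeRuntimeInfo(); all classes
// here are stand-ins, not the actual Doris FE code.
class SchedulerSketch {
    final Map<Long, Map<String, String>> runtimeInfos = new ConcurrentHashMap<>();
    final Set<Long> dynamicPartitionTables = new HashSet<>(); // scheduling set
    final Set<Long> catalogTables = new HashSet<>();          // live catalog

    void removeRuntimeInfo(long tableId) {
        runtimeInfos.remove(tableId);
    }

    // cleanup point 1: the DROP TABLE path (unprotectDropTable in the real code)
    void dropTable(long tableId) {
        catalogTables.remove(tableId); // alongside db.unregisterTable()
        removeRuntimeInfo(tableId);    // <-- added by this patch
    }

    // cleanup points 2 and 3: the scheduler's own stale-table sweep
    void executeDynamicPartition() {
        Iterator<Long> it = dynamicPartitionTables.iterator();
        while (it.hasNext()) {
            long tableId = it.next();
            if (!catalogTables.contains(tableId)) { // db/table gone or properties lost
                it.remove();
                removeRuntimeInfo(tableId);         // <-- added after iterator.remove()
                continue;
            }
            runtimeInfos.computeIfAbsent(tableId, k -> new ConcurrentHashMap<>())
                    .put("lastSchedulerTime", "now");
        }
    }

    public static void main(String[] args) {
        SchedulerSketch s = new SchedulerSketch();
        s.catalogTables.add(1L);
        s.dynamicPartitionTables.add(1L);
        s.executeDynamicPartition();               // records runtime info for table 1
        s.dropTable(1L);                           // entry cleared at drop time
        System.out.println(s.runtimeInfos.size()); // prints 0
    }
}
```

With this wiring, an entry is removed either eagerly at DROP time or lazily on the next scheduler sweep, so no stale tableId can outlive both paths.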

Found via heap dump analysis after an FE OOM on 4.0.5-rc01 today (2026-04-27) in a high-DDL-churn ETL workload. The map had reached ~1.5M entries / 554 MB retained heap. We are rolling out a patched build to production now and will follow up on the issue thread with steady-state retention numbers after a week of uptime.

Full bug report and heap dump details in #62883.

Release note

Fix FE memory leak in DynamicPartitionScheduler.runtimeInfos for tables that are dropped, lose their dynamic_partition.enable property, or have partitionRetentionCount reset to 0.

Check List (For Author)

  • Test
    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:

Manual test: heap dump analysis on a 4.0.5-rc01 FE that OOMed under an ETL workload doing ~24K DDL/hour against dynamic_partition tables. The dump showed runtimeInfos holding ~1M–1.5M stale entries (2,097,152-bucket ConcurrentHashMap$Node[], 554 MB retained on DynamicPartitionScheduler, 17% of live heap post-GC walk). The patched build is being deployed today; I will report steady-state heap numbers in the issue thread after a week of production uptime.

A unit test reproducing the leak would need to drive the dynamic-partition scheduler against a synthetic catalog and assert runtimeInfos.size() after DROP. Happy to add one if maintainers prefer that over the production validation.
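For reference, such a test could take roughly this shape. The FakeScheduler below is an illustrative stand-in; the real test would drive the actual DynamicPartitionScheduler against a test catalog and use the project's test harness rather than plain assertions:

```java
// Illustrative shape of the proposed unit test, not a Doris test fixture.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class DropTableCleansRuntimeInfosTest {
    static class FakeScheduler {
        final Map<Long, Map<String, String>> runtimeInfos = new ConcurrentHashMap<>();

        void recordRuntimeInfo(long tableId) { // what a scheduler pass does
            runtimeInfos.put(tableId, new ConcurrentHashMap<>());
        }

        void onDropTable(long tableId) { // what unprotectDropTable() now triggers
            runtimeInfos.remove(tableId);
        }
    }

    public static void main(String[] args) {
        FakeScheduler scheduler = new FakeScheduler();
        scheduler.recordRuntimeInfo(42L);
        if (scheduler.runtimeInfos.size() != 1) {
            throw new AssertionError("expected one entry before DROP");
        }
        scheduler.onDropTable(42L);
        if (!scheduler.runtimeInfos.isEmpty()) {
            throw new AssertionError("runtimeInfos leaked after DROP");
        }
        System.out.println("ok");
    }
}
```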

  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

DynamicPartitionScheduler.runtimeInfos accumulates entries indefinitely
when tables are dropped or lose their dynamic_partition properties.
removeRuntimeInfo(tableId) is called from ShowDynamicPartitionCommand
but only opportunistically: it requires a user to issue
SHOW DYNAMIC PARTITION and only catches tables still present in the
catalog that have lost their dynamic_partition property. No catalog
mutation path calls it.

Fix:
- Call removeRuntimeInfo() in InternalCatalog.unprotectDropTable() so
  the entry is cleared when a table is dropped.
- Call removeRuntimeInfo() in executeDynamicPartition() at the two
  cleanup points where the iterator removes a table from the scheduling
  set (db gone, olapTable null/MTMV/no-dynamic-partition).

In a high-DDL-churn workload (CREATE/DROP loops on tables with
dynamic_partition.enable=true or partitionRetentionCount > 0) this map
can grow unbounded and cause FE OOM after extended uptime.

Closes apache#62883

Signed-off-by: Leonardo Constanski <leonardo@horusbi.com.br>

Closes: [Bug] (dynamic-partition) DynamicPartitionScheduler.runtimeInfos leaks entries on DROP TABLE, causing FE OOM (#62883)