HDDS-15065. Reduce Ratis snapshot gap to fix SCM flush delay. #10100

Open
priyeshkaratha wants to merge 2 commits into apache:master from priyeshkaratha:HDDS-15065

@priyeshkaratha
Contributor

What changes were proposed in this pull request?

This change addresses the SCM flush delay caused by the minimum transaction gap enforced in Apache Ratis.
Previously, even with the timer-based flush mechanism, DB updates were effectively delayed by the default snapshot creation gap (1024 transactions).

Changes:

  • Reduce ozone.scm.ha.ratis.server.snapshot.creation.gap from 1024 to 1
  • Remove this configuration from Ozone

Impact:

  • Reduces delay in SCM DB flush
  • Improves delete block processing latency
  • Simplifies configuration

What is the link to the Apache JIRA?

HDDS-15065

How was this patch tested?

Tested using modified test cases.

@ChenSammi
Contributor

ChenSammi commented Apr 22, 2026

@sumitagrawl, for HDDS-8508 (#4683), what was the consideration at the time for triggering a snapshot instead of calling transactionBuffer.flush() in SCMHATransactionBufferMonitorTask.run()?

@sumitagrawl
Contributor

> @sumitagrawl, for HDDS-8508 (#4683), what was the consideration at the time for triggering a snapshot instead of calling transactionBuffer.flush() in SCMHATransactionBufferMonitorTask.run()?

There was a discussion about setting it to '1', but after discussing with @szetszwo, it is not a practical scenario to have fewer than 1024 transactions to flush, since it keeps being updated.

@szetszwo, please share your opinion.

@priyeshkaratha
Contributor Author

@sumitagrawl @ChenSammi @szetszwo

The default of ozone.scm.ha.ratis.snapshot.threshold (1000) is smaller than the default of ozone.scm.ha.ratis.server.snapshot.creation.gap (1024). snapshot.creation.gap is the gate that Ratis applies to any manual snapshot API call, which is exactly the kind of call SCMHATransactionBufferMonitorTask makes. Because the auto-trigger fires at 1000 entries, 24 entries before the gap is reached, the buffer monitor task never successfully triggers a snapshot under the default configuration. Instead, it incurs an unnecessary Ratis RPC round-trip every 60 seconds that accomplishes nothing, and if anything goes wrong on the auto-trigger path, there is no time-based fallback as originally intended.

The fix is to either:

  • Raise ozone.scm.ha.ratis.snapshot.threshold to be strictly greater than ozone.scm.ha.ratis.server.snapshot.creation.gap (e.g., threshold = 2000, gap = 1024) so that the time-based monitor becomes the primary snapshot trigger for moderate transaction volumes, with auto-trigger as the backstop.

  • Lower ozone.scm.ha.ratis.server.snapshot.creation.gap below the threshold (e.g., gap = 1, threshold = 1000) so the monitor can trigger aggressively in low-traffic situations without waiting for the auto-trigger.
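
The interaction between the two settings can be illustrated with a small, self-contained model. This is only a sketch of the behavior described above, not Ratis or Ozone code: the class name, method name, and the "tick" abstraction (one run of the periodic monitor) are hypothetical, and it assumes that the auto-trigger fires once the number of entries applied since the last snapshot reaches the threshold, while a manual request succeeds only if that number has reached the creation gap.

```java
// Simplified model (hypothetical names, not the Ratis/Ozone implementation).
final class SnapshotGateSketch {

    /**
     * Counts how many timer-driven (manual) snapshot requests pass the gap check.
     *
     * @param autoThreshold entries since last snapshot at which the auto-trigger fires
     * @param creationGap   minimum entries since last snapshot for a manual request to succeed
     * @param txPerTick     transactions applied between two monitor runs
     * @param ticks         number of monitor runs to simulate
     */
    static int manualSnapshotsTaken(long autoThreshold, long creationGap,
                                    int txPerTick, int ticks) {
        long applied = 0;       // last applied index
        long lastSnapshot = 0;  // index covered by the latest snapshot
        int manual = 0;

        for (int t = 0; t < ticks; t++) {
            for (int i = 0; i < txPerTick; i++) {
                applied++;
                // Assumed auto-trigger: snapshot as soon as the threshold is reached.
                if (applied - lastSnapshot >= autoThreshold) {
                    lastSnapshot = applied;
                }
            }
            // Assumed manual path: the monitor's request is gated by the creation gap.
            if (applied - lastSnapshot >= creationGap) {
                lastSnapshot = applied;
                manual++;
            }
        }
        return manual;
    }

    public static void main(String[] args) {
        // Defaults (threshold 1000 < gap 1024): the manual path never succeeds.
        System.out.println(manualSnapshotsTaken(1000, 1024, 300, 100)); // prints 0
        // Gap lowered to 1: every monitor run with pending entries succeeds.
        System.out.println(manualSnapshotsTaken(1000, 1, 10, 5));       // prints 5
        // Threshold raised above the gap: the monitor succeeds once enough entries accumulate.
        System.out.println(manualSnapshotsTaken(2000, 1024, 500, 6));   // prints 2
    }
}
```

In this model, the number of pending entries can never exceed threshold - 1 when a monitor run happens, so any gap larger than that bound makes the manual request a no-op regardless of traffic.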

priyeshkaratha marked this pull request as ready for review April 23, 2026 20:19

@ChenSammi
Contributor

@priyeshkaratha, can you try calling transactionBuffer.flush() in SCMHATransactionBufferMonitorTask.run(), per the offline discussion with Sumit?

@szetszwo
Contributor

szetszwo commented Apr 28, 2026

> There was a discussion about setting it to '1' ...

@sumitagrawl, what was the discussion? Could you remind me?
