
[server] Remove CDC log metrics from PK table write path to avoid double counting #3177

Open

swuferhong wants to merge 1 commit into apache:main from swuferhong:records-count-metrics

@swuferhong
Contributor

Purpose

Linked issue: close #3176

For primary key tables, putToLocalKv() was incrementing both the KV and the CDC log metrics (messagesIn, bytesIn), causing the table-level throughput to report roughly 2x the actual write volume.

The CDC log is an internal mechanism for changelog replication, and its metrics should not be counted in the user-facing table-level aggregation, as this is confusing for users monitoring PK table performance.
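
To make the 2x concrete, here is a minimal sketch. It assumes that the table-level throughput is computed by summing every per-path counter, and that a PK write's CDC log append is about as large as the KV batch. `PkWriteMetricsSketch`, `TableMetrics`, `simulatePkWrite` and `aggregatedBytesIn` are hypothetical stand-ins, not Fluss classes. Only the `incKvBytesIn`/`incLogBytesIn` names echo the write path.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch, not Fluss's TableMetrics: a table-level "bytesIn" view that
 * sums every per-path counter, to show how counting both the KV write and its CDC
 * log append inflates the figure for PK tables.
 */
public class PkWriteMetricsSketch {

    /** Hypothetical per-table counters, named after the incKvBytesIn and incLogBytesIn calls. */
    static final class TableMetrics {
        final AtomicLong kvBytesIn = new AtomicLong();
        final AtomicLong logBytesIn = new AtomicLong();

        void incKvBytesIn(long n) { kvBytesIn.addAndGet(n); }

        void incLogBytesIn(long n) { logBytesIn.addAndGet(n); }

        /** Assumed aggregation: table-level bytesIn = KV bytes + log bytes. */
        long aggregatedBytesIn() { return kvBytesIn.get() + logBytesIn.get(); }
    }

    /**
     * Simulates one PK-table write of kvBytes whose CDC log append is cdcBytes.
     * countCdcLog = true bumps both counters (behaviour before the PR);
     * countCdcLog = false bumps only the KV counter (what the PR proposes).
     */
    static void simulatePkWrite(TableMetrics m, long kvBytes, long cdcBytes, boolean countCdcLog) {
        m.incKvBytesIn(kvBytes);
        if (countCdcLog) {
            m.incLogBytesIn(cdcBytes);
        }
    }

    public static void main(String[] args) {
        TableMetrics before = new TableMetrics();
        TableMetrics after = new TableMetrics();
        // Assumption: the CDC log append is about as large as the KV batch.
        simulatePkWrite(before, 1024, 1024, true);
        simulatePkWrite(after, 1024, 1024, false);
        System.out.println("before: " + before.aggregatedBytesIn()); // before: 2048
        System.out.println("after:  " + after.aggregatedBytesIn());  // after:  1024
    }
}
```

The same reasoning applies to messagesIn. The sketch only models the aggregate before and after; the actual change in this PR is inside putToLocalKv() itself.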

Brief change log

Tests

API and Format

Documentation

@binary-signal
Contributor

binary-signal commented Apr 24, 2026

The metrics intentionally track different things; see lines 1301-1306:

  // metric for kv
  tableMetrics.incKvMessageIn(entry.getValue().getRecordCount()); // KV write messages
  tableMetrics.incKvBytesIn(entry.getValue().sizeInBytes());      // KV write bytes
  // metric for cdc log of kv
  tableMetrics.incLogBytesIn(appendInfo.validBytes());            // CDC log bytes
  tableMetrics.incLogMessageIn(appendInfo.numMessages());         // CDC log messages

These are separate metric counters (KvBytesIn vs LogBytesIn, KvMessageIn vs LogMessageIn), so nothing is double-counted into the same metric. The CDC log is real I/O that actually happens: it is written, replicated to followers, and consumed downstream (Flink connectors, lake tiering, etc.).

Hiding that from metrics means:

  1. Replication monitoring breaks: followers fetch from the CDC log, so if you don't track its volume you can't reason about replication lag or bandwidth.
  2. Storage accounting becomes wrong: the CDC log consumes real disk space and network bandwidth, and omitting it from metrics hides actual resource usage.
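
As a rough illustration of points 1 and 2, the sketch below estimates follower-fetch traffic and replicated disk usage from LogBytesIn. It assumes every follower fetches the full CDC log and ignores compression, retention and lake tiering. `CdcLogCostSketch` and its methods are hypothetical, not a Fluss API.

```java
/**
 * Back-of-the-envelope sketch (an assumption, not a Fluss formula): if every
 * follower fetches the full CDC log, the log bytes a PK table produces are
 * multiplied by the replication factor on disk and by (replicationFactor - 1)
 * on the follower-fetch network path. Compression, retention and tiering are ignored.
 */
public class CdcLogCostSketch {

    /** Bytes fetched by followers for the given leader-side log bytes. */
    static long followerFetchBytes(long logBytesIn, int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replicationFactor must be >= 1");
        }
        return logBytesIn * (replicationFactor - 1);
    }

    /** Bytes written to local disk across all replicas. */
    static long replicatedDiskBytes(long logBytesIn, int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replicationFactor must be >= 1");
        }
        return logBytesIn * replicationFactor;
    }

    public static void main(String[] args) {
        long logBytesIn = 10L * 1024 * 1024; // e.g. 10 MiB of CDC log in some window
        int replicationFactor = 3;
        System.out.println("follower fetch bytes: " + followerFetchBytes(logBytesIn, replicationFactor));
        System.out.println("replicated disk bytes: " + replicatedDiskBytes(logBytesIn, replicationFactor));
    }
}
```

If the CDC increments were dropped for PK tables, LogBytesIn would no longer reflect the changelog, and both estimates would read 0 even though the log is still written and replicated.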

Compare with appendToLocalLog() at line 1254: for log-only tables, LogBytesIn tracks the user-facing write. For PK tables, KvBytesIn tracks the user-facing write and LogBytesIn tracks the internal CDC consequence. Both are useful; they just need to be interpreted differently.
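
The per-table-type interpretation of the two counters can be sketched as a small helper that picks the user-facing counter from the table kind. `WriteVolumeInterpretationSketch`, `TableKind`, `Counters`, `userWriteBytes` and `cdcLogBytes` are hypothetical names for illustration, not part of Fluss. The record and switch expression need Java 17.

```java
/**
 * Hypothetical helper (not part of Fluss): picks the counter that reflects the
 * user-facing write for a table, following the log-table vs PK-table reading.
 */
public class WriteVolumeInterpretationSketch {

    enum TableKind { LOG_TABLE, PRIMARY_KEY_TABLE }

    /** Snapshot of the two byte counters discussed in this thread (hypothetical type). */
    record Counters(long kvBytesIn, long logBytesIn) {}

    /** User-facing write volume: LogBytesIn for log tables, KvBytesIn for PK tables. */
    static long userWriteBytes(TableKind kind, Counters c) {
        return switch (kind) {
            case LOG_TABLE -> c.logBytesIn();
            case PRIMARY_KEY_TABLE -> c.kvBytesIn();
        };
    }

    /** Internal CDC volume: only PK tables have one; for log tables LogBytesIn is the user write. */
    static long cdcLogBytes(TableKind kind, Counters c) {
        return kind == TableKind.PRIMARY_KEY_TABLE ? c.logBytesIn() : 0L;
    }

    public static void main(String[] args) {
        Counters pk = new Counters(1024, 1024);
        Counters log = new Counters(0, 4096);
        System.out.println(userWriteBytes(TableKind.PRIMARY_KEY_TABLE, pk)); // 1024
        System.out.println(cdcLogBytes(TableKind.PRIMARY_KEY_TABLE, pk));    // 1024
        System.out.println(userWriteBytes(TableKind.LOG_TABLE, log));        // 4096
    }
}
```

This approach keeps both counters and moves the interpretation to whoever consumes the metrics (a dashboard or alert rule), whereas the PR removes the CDC increments from the write path.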


Development

Successfully merging this pull request may close these issues.

PK table messagesIn/bytesIn metrics double-counted due to CDC log

2 participants