
feat(bigtable): add client side metric instrumentation to basic rpcs #16712

Open
daniel-sanche wants to merge 4 commits into main from bigtable_csm_1_basic_instrumentation

Conversation

@daniel-sanche
Contributor

Migration of googleapis/python-bigtable#1188 to the monorepo

This PR builds off of googleapis/python-bigtable#1187 to add instrumentation to basic data client rpcs (check_and_mutate, read_modify_write, sample_row_keys, mutate_row)

Metrics are not currently exported anywhere; they are only collected and then dropped. A future PR will add a GCP exporter to the system.

@daniel-sanche daniel-sanche requested a review from a team as a code owner April 17, 2026 20:19
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request integrates client-side metrics tracking into the Bigtable data client for both asynchronous and synchronous implementations. It wraps key operations—including sample_row_keys, mutate_row, check_and_mutate_row, and read_modify_write_row—with metrics collection logic and introduces a tracked_retry wrapper to monitor retry attempts. Additionally, the PR refactors system tests by consolidating fixtures into a shared SystemTestRunner class and adds new system tests specifically for metrics.

Feedback focuses on regressions in retry logic where critical arguments like sleep_generator and exception_factory were omitted in the transition to tracked_retry. There are also suggestions to improve resource cleanup in test fixtures and to relax restrictive timing assertions in tests to prevent flakiness.
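The tracked_retry wrapper itself is not shown in this excerpt. A minimal sketch of the idea, using a hypothetical ActiveOperationMetric class with attempt-recording hooks (names here are illustrative, not the real python-bigtable API):

```python
import time


class ActiveOperationMetric:
    """Hypothetical per-operation metric recorder (not the real Bigtable class)."""

    def __init__(self):
        self.attempt_count = 0
        self.duration_ns = 0

    def start_attempt(self):
        self.attempt_count += 1

    def end_with_success(self, start_ns):
        self.duration_ns = time.monotonic_ns() - start_ns


def tracked_retry(target, metric, predicate, max_attempts=3):
    """Call `target`, retrying on exceptions matched by `predicate`,
    while recording each attempt on `metric`."""
    start_ns = time.monotonic_ns()
    for attempt in range(max_attempts):
        metric.start_attempt()
        try:
            result = target()
            metric.end_with_success(start_ns)
            return result
        except Exception as exc:
            # Re-raise non-retryable errors or final-attempt failures.
            if not predicate(exc) or attempt == max_attempts - 1:
                raise
```

The real wrapper would also thread through sleep_generator and exception_factory, which is exactly where the review found regressions.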

),
clusters=cluster_config,
)
operation.result(timeout=240)
Contributor


medium

If operation.result(timeout=240) raises an exception (e.g., a TimeoutError), the fixture will stop execution and the delete_instance call in the teardown phase will never be reached. This can lead to leaked Bigtable instances in the test project. Consider wrapping the creation and yield in a try...finally block or ensuring cleanup happens even on creation failure.
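One way to guarantee the cleanup, sketched here as a context manager with hypothetical admin-client helper names (create_instance/delete_instance are illustrative, not the exact fixture code from this PR):

```python
import contextlib


@contextlib.contextmanager
def bigtable_instance(admin_client, instance_id, cluster_config, timeout=240):
    # Hypothetical sketch: the point is the try/finally shape, which ensures
    # the instance is deleted even if creation times out or a test fails.
    operation = admin_client.create_instance(
        instance_id=instance_id, clusters=cluster_config
    )
    try:
        operation.result(timeout=timeout)  # may raise TimeoutError
        yield instance_id
    finally:
        # Reached even when result() raises, so no test instance is leaked.
        admin_client.delete_instance(instance_id)
```

The same shape works directly inside a pytest yield-fixture: put the yield inside the try and the delete_instance call in the finally.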

operation.zone
== cluster_config[operation.cluster_id].location.split("/")[-1]
)
assert operation.duration_ns > 0 and operation.duration_ns < 1e9
Contributor


medium

The assertion operation.duration_ns < 1e9 (1 second) might be too restrictive for system tests running against a live backend. Network latency or backend load could easily cause an RPC to exceed 1 second, leading to flaky tests. It is recommended to remove this upper bound or increase it significantly.

Suggested change
assert operation.duration_ns > 0 and operation.duration_ns < 1e9
assert operation.duration_ns > 0
