tiproxy: revive graceful scaling-in pods when scaling out TiProxy by djshow832 · Pull Request #6964 · pingcap/tidb-operator

djshow832 · 2026-06-24T12:39:48Z

Background

Gracefully scaling in a TiProxy pod can take as long as 24h. If the operator always creates a new pod when scaling out, a workload that auto-scales frequently can accumulate many extra pods that are still draining, which wastes resources.
The goal is to revive pods that are gracefully scaling in when a scale-out happens, on a best-effort basis.

Changes

Pods that are restarting or upgrading can't be revived, so their steps must differ from those of pods that are scaling in.

Graceful scale-in steps:

Set spec.offline=true. No deletionTimestamp is set, because the instance may be reused.
Call the TiProxy API to mark it unhealthy.
Mark graceful-shutdown-begin-time on the pod.
Delete the CR if the instance is not revived.

Scale-out steps:

Choose a pod to revive.
Set spec.offline=false and clear graceful-shutdown-begin-time.
Call the TiProxy API to clear the unhealthy status.

Follow-ups in HPA:

Skip the pods with graceful-shutdown-begin-time when calculating the average CPU/memory/network.
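
To illustrate why this matters for the HPA, here is a small sketch of an average that ignores draining pods. The `PodMetric` type and `AverageCPU` function are hypothetical and only cover CPU; the same filter would apply to memory and network.

```go
package main

import "fmt"

// PodMetric is a hypothetical per-pod sample.
type PodMetric struct {
	Name     string
	CPUMilli int64
	Draining bool // true when graceful-shutdown-begin-time is set on the pod
}

// AverageCPU averages CPU over pods that are not draining. It reports
// ok=false when no pod qualifies, so the caller never divides by zero.
func AverageCPU(pods []PodMetric) (avg float64, ok bool) {
	var sum int64
	var n int
	for _, p := range pods {
		if p.Draining {
			continue
		}
		sum += p.CPUMilli
		n++
	}
	if n == 0 {
		return 0, false
	}
	return float64(sum) / float64(n), true
}

func main() {
	pods := []PodMetric{
		{Name: "tiproxy-0", CPUMilli: 800},
		{Name: "tiproxy-1", CPUMilli: 600},
		{Name: "tiproxy-2", CPUMilli: 50, Draining: true},
	}
	if avg, ok := AverageCPU(pods); ok {
		fmt.Printf("%.0f\n", avg) // 700
	}
}
```

Including the draining pod in this example would pull the average down to about 483, which could make the autoscaler see less load than the serving pods actually carry.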

Notes:

Scale-out may race with pod deletion, so a pod that has already been deleted may be chosen for revival. Reconciliation resolves this by creating a new pod or reviving another one.
When a scale-in and an upgrade happen at the same time, a TiProxy pod with an old revision may be marked offline (in ScaleInUpdate) and later revived by another scale-out. Reconciliation resolves this by upgrading the pod.
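
The two notes above amount to a follow-up check on each revived instance during reconciliation. The sketch below is only a hypothetical model of that decision: the `Revived` type, the `Action` values, and `NextAction` are illustrative names and do not come from the operator.

```go
package main

import "fmt"

// Action is a hypothetical label for what reconciliation should do next.
type Action string

const (
	ActionNone    Action = "none"
	ActionReplace Action = "replace" // create a new pod or revive another one
	ActionUpgrade Action = "upgrade" // bring the pod to the desired revision
)

// Revived is a hypothetical snapshot of an instance picked for revival.
type Revived struct {
	Name      string
	PodExists bool   // false when deletion won the race with scale-out
	Revision  string // revision the pod is currently running
}

// NextAction decides how reconciliation repairs a revived instance.
func NextAction(r Revived, desiredRevision string) Action {
	switch {
	case !r.PodExists:
		return ActionReplace
	case r.Revision != desiredRevision:
		return ActionUpgrade
	default:
		return ActionNone
	}
}

func main() {
	fmt.Println(NextAction(Revived{Name: "tiproxy-0", PodExists: false, Revision: "v2"}, "v2")) // replace
	fmt.Println(NextAction(Revived{Name: "tiproxy-1", PodExists: true, Revision: "v1"}, "v2"))  // upgrade
	fmt.Println(NextAction(Revived{Name: "tiproxy-2", PodExists: true, Revision: "v2"}, "v2"))  // none
}
```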

Other refactors:

Rename IsStore() to SupportsStore().
Move some graceful restart logic into public files.

E2E tests

revive draining TiProxy pods on scale-out instead of creating new ones
create new TiProxy pods on scale-out when drained instances are no longer revivable
scale in and out TiProxy when health override API is unsupported

ti-chi-bot · 2026-06-24T12:39:53Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign liubog2008 for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

codecov-commenter · 2026-06-24T12:46:50Z

Codecov Report

❌ Patch coverage is 56.81818% with 114 lines in your changes missing coverage. Please review.
✅ Project coverage is 39.56%. Comparing base (28394a5) to head (051d879).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #6964      +/-   ##
==========================================
+ Coverage   39.53%   39.56%   +0.02%     
==========================================
  Files         430      433       +3     
  Lines       24378    24492     +114     
==========================================
+ Hits         9638     9690      +52     
- Misses      14740    14802      +62

Flag        Coverage Δ
unittest    39.56% <56.81%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown.


djshow832 added 2 commits June 24, 2026 19:52

revive

208c283

add files

3f7b751

ti-chi-bot Bot requested a review from howardlau1999 June 24, 2026 12:39

github-actions Bot added the v2 for operator v2 label Jun 24, 2026

ti-chi-bot Bot added the size/XXL label Jun 24, 2026

djshow832 marked this pull request as draft June 25, 2026 03:51

ti-chi-bot Bot added the do-not-merge/work-in-progress label Jun 25, 2026

djshow832 added 5 commits June 25, 2026 11:51

add test

0f22a0d

2 more tests

fe2a4f9

update legacy tiproxy test

5b8d407

update test

1d955e0

fix test

4281157

djshow832 marked this pull request as ready for review June 30, 2026 11:54

ti-chi-bot Bot removed the do-not-merge/work-in-progress label Jun 30, 2026

ti-chi-bot Bot requested a review from shonge June 30, 2026 11:54

djshow832 added 6 commits July 1, 2026 20:28

fix graceful restart

e88afe7

refactor updater

32804df

apply patch

d062dac

isStore

fa83a92

update name

b50df6a

fix test

200428a

djshow832 requested a review from liubog2008 July 3, 2026 03:00

djshow832 marked this pull request as draft July 3, 2026 08:52

ti-chi-bot Bot added the do-not-merge/work-in-progress label Jul 3, 2026

djshow832 added 4 commits July 3, 2026 20:48

revert updater

ad58127

simplify

5db36f7

remove CR tiproxy-graceful-shutdown-begin-time

3b1e860

inline functions

59b792d

djshow832 added 2 commits July 4, 2026 08:45

split functions

051d879

simplify guard

fb1d0c6

djshow832 wants to merge 19 commits into pingcap:main from djshow832:reuse_pod

2 participants
