[scheduler] Resource accounting on Redis#2323
Draft
DiegoTavares wants to merge 19 commits into
Draft
Conversation
Scc silently drops inserts where the key already exists.
Also ensure all_sleeping_rounds is reset at the end of each full iteration
…queries Phase 1 scheduler quick wins: empty-cluster sleep, LIMIT, refresh guard - Empty-cluster sleep now configurable (cluster_empty_sleep, default 30s). - QUERY_PENDING_BY_SHOW_FACILITY_TAG capped via max_jobs_per_cluster_pass (default 20). Strict ORDER BY priority DESC; low-priority jobs deferred. - HostCacheService skips overlapping refresh ticks via an AtomicBool guard. Add V40 indexes for scheduler pending-job query GIN on layer.str_tags (array overlap), composite partial on job(pk_show, pk_facility, str_state, b_paused) WHERE PENDING/not paused, partial on layer_stat(pk_layer) WHERE int_waiting_count > 0. Plain CREATE INDEX (Flyway 5.2.0 wraps in a transaction, which Postgres rejects for CONCURRENTLY); apply with CONCURRENTLY via psql before Flyway when running against populated production tables. Drop LOWER(pk_facility) hack and rewrite QUERY_PENDING with EXISTS Scheduler-side facility id is now String (was Uuid). The dao::helpers parse_uuid path was lower-casing every facility round-trip, which forced LOWER() compares in 6 SQL sites. Cuebot writes canonical casing on insert, so a String swap removes the hack at the source. QUERY_PENDING_BY_SHOW_FACILITY_TAG rewritten to a single bookable_shows CTE plus EXISTS subquery, removing the layer ⨝ layer_stat ⨝ DISTINCT cardinality blowup. Folder cap split into outer early-out and per-layer fit inside the EXISTS.
Now shows can be moved to the scheduler using cueadmin: ``` cueadmin -show foo -setSchedulerManaged true ``` The following properties have been removed: ``` dispatcher.scheduler_manages_resources=false dispatcher.exclusion_list=show1,show2:facility.allocation,show3:facility.allocation ```
Contributor
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
When a proc belonging to a show managed by the Scheduler is destroyed, its core and gpu counts are sent to Redis to update the cached version of the resource accounting tables. See design/SCHED_REDIS_DECISIONS.md for more details.
96b309f to
d14e52e
Compare
Migrate the in-memory cache for resource accounting to redis
Redis and Pg were using different units for cores, causing the accounting logic to fail.
Signed-off-by: Diego Tavares <dtavares@imageworks.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Moves the Rust scheduler's resource accounting off in-process
HashMaps and into a shared Redis store, with Cuebot's release path teed into the same store. Architecture, schema, theacct:seqguard, failure modes, and trade-offs are documented in docs/_docs/developer-guide/redis-accounting.md — this description covers what shipped on the branch, not how it works.Branch also carries an earlier batch of perf/stability fixes ("Phase 1") that landed ahead of the accounting work.
Phase 1 — perf and stability quick wins (pre-feature)
2072e63aDispatch + cluster query perf:LIMIT NonQUERY_PENDING_BY_SHOW_FACILITY_TAG, EXISTS rewrite, new indexes (V40 migration), facility case-sensitivity fix, empty-cluster sleep 3s → 30s, host-cache refresh overlap guard. Net: ~10× drop inQUERY_PENDINGrate, ~3–5× per-call cost reduction.b98b86faDrop tag chunk size defaults to more reasonable values.7e2a5941New metric for cluster round-trip duration.06a18ecdWrap cluster loop in panic guard.05dd9b31Wrap panic surface on resource-accounting logic.e1767b12Fix permit-update being ignored.bc5cd975,941c59feMinor refactors + review fixes.7f2c5124Mergemaster(carries RQD windows/system tweaks).PR A — schema + show-management flag + Cuebot read-side
388bbc4cNewshow.b_scheduler_managedcolumn (V42 migration),ShowDaoflag lookup + cache,ShowInterface.setSchedulerManagedgRPC + pycue wrapper +cueadmin -setSchedulerManaged. Dropsdispatcher.exclusion_list/dispatcher.scheduler_manages_resourcesfromopencue.properties;DispatcherDaoJdbcandWhiteboardDaoJdbcswitched to the new column. Updates the existing deploying-scheduler docs and a release post to reflect the new toggle.1087ae66Version bump + rename V40 migration after the V42 ordering.PR B — Cuebot Redis publisher + show-aware
unbookProcd14e52e4AccountingRedisPublisherinterface +LettuceAccountingRedisPublisher(single Lua doing 5 ×HINCRBY+INCR acct:seq).ProcDaoJdbc.unbookProcbranches on the cached flag: scheduler-managed shows getDELETE proc+afterCommitpublish, others keep today's behavior. Spring config wires the no-op vs real publisher onaccounting.redis.enabled; startup guardrail logs the misconfigured-Cuebot WARN. Tests acrossProcDaoTests,ShowDaoTests, andLettuceAccountingRedisPublisherTests.88495a1cAdd Lettuce dependency.PR C — Rust scheduler accounting module
9d480e08Newcrates/scheduler/src/accounting/module replacingResourceAccountingService: Redis client + Lua scripts (atomic check-and-modify withforcerollback), 2-min recompute, 5-min limit reseed, blocking bootstrap, managed-shows cache,BookingDeltaover 5 tables (DispatchLayerextended withfolder_id/dept_id). Compensation rollback at the dispatcher actor switched to Lua force mode.acct:seqCAS guard on every reseed. Newredis_integrationtest suite.d945f0d1Fix centicore-vs-core unit handling at PG/Cuebot↔Redis boundaries (limit reseed, recompute, release publisher,CoreSize); test centicores in the Lettuce publisher tests; dispatcher actor cleanups.336a6696Add GPU limit enforcement to the booking Lua + integration coverage.PR D — docs + cleanup (this session)
88f3732cNewdocs/_docs/developer-guide/redis-accounting.mdrewriting the surviving design content as a developer reference; re-points the four in-tree citations (accounting/mod.rs,AccountingRedisPublisher.java,LettuceAccountingRedisPublisher.java,LettuceAccountingRedisPublisherTests.java) to the new guide; deletesdesign/SCHED_REDIS_DECISIONS.md(history preserved in git).Test plan
./gradlew build(embedded Postgres covers V42 migration,ShowDao,ProcDaobranching,LettuceAccountingRedisPublisherLua wiring,ManageShow.setSchedulerManaged).cargo test -p schedulerandcargo test -p scheduler --features integration-tests(covers the booking Lua, recompute, limit reseed, bootstrap, GPU limits, centicore conversion at boundaries).cd pycue && pytest tests/wrappers/test_show.py(newsetSchedulerManagedwrapper).cd cueadmin && pytest tests/test_common.py(new subcommand)../docs/build.sh(new developer-guide entry renders, internal anchors resolve).cueadmin -setSchedulerManaged true, verify CuebotunbookProcpublishes to Redis (redis-cli MONITOR), let recompute run, flip back, confirm no negativeint_corespersists past one recompute cycle.cuebot_redis_publish_misconfiguredafter enablingaccounting.redis.enabled=truecluster-wide.Breaking changes
dispatcher.exclusion_listanddispatcher.scheduler_manages_resourcesremoved fromopencue.properties. Any show previously listed there must be migrated tob_scheduler_managed=trueviacueadmin -setSchedulerManaged trueduring the upgrade.accounting.redis.enabled=trueon every Cuebot before flipping any show to scheduler-managed (see deployment invariant in the dev guide).