[scheduler] Resource accounting on Redis #2323

Draft
DiegoTavares wants to merge 19 commits into AcademySoftwareFoundation:master from DiegoTavares:sched_accounting_redis

@DiegoTavares DiegoTavares commented May 14, 2026

Summary

Moves the Rust scheduler's resource accounting off in-process HashMaps and into a shared Redis store, with Cuebot's release path teed into the same store. Architecture, schema, the acct:seq guard, failure modes, and trade-offs are documented in docs/_docs/developer-guide/redis-accounting.md — this description covers what shipped on the branch, not how it works.

Branch also carries an earlier batch of perf/stability fixes ("Phase 1") that landed ahead of the accounting work.

Phase 1 — perf and stability quick wins (pre-feature)

  • 2072e63a Dispatch + cluster query perf: LIMIT N on QUERY_PENDING_BY_SHOW_FACILITY_TAG, EXISTS rewrite, new indexes (V40 migration), facility case-sensitivity fix, empty-cluster sleep 3s → 30s, host-cache refresh overlap guard. Net: ~10× drop in QUERY_PENDING rate, ~3–5× per-call cost reduction.
  • b98b86fa Lower the tag chunk size defaults to more reasonable values.
  • 7e2a5941 New metric for cluster round-trip duration.
  • 06a18ecd Wrap cluster loop in panic guard.
  • 05dd9b31 Wrap panic surface on resource-accounting logic.
  • e1767b12 Fix permit-update being ignored.
  • bc5cd975, 941c59fe Minor refactors + review fixes.
  • 7f2c5124 Merge master (carries RQD windows/system tweaks).

PR A — schema + show-management flag + Cuebot read-side

  • 388bbc4c New show.b_scheduler_managed column (V42 migration), ShowDao flag lookup + cache, ShowInterface.setSchedulerManaged gRPC + pycue wrapper + cueadmin -setSchedulerManaged. Drops dispatcher.exclusion_list / dispatcher.scheduler_manages_resources from opencue.properties; DispatcherDaoJdbc and WhiteboardDaoJdbc switched to the new column. Updates the existing deploying-scheduler docs and a release post to reflect the new toggle.
  • 1087ae66 Version bump + rename V40 migration after the V42 ordering.

PR B — Cuebot Redis publisher + show-aware unbookProc

  • d14e52e4 AccountingRedisPublisher interface + LettuceAccountingRedisPublisher (a single Lua script doing 5 × HINCRBY + INCR acct:seq). ProcDaoJdbc.unbookProc branches on the cached flag: scheduler-managed shows get DELETE proc + afterCommit publish, while other shows keep today's behavior. Spring config selects the no-op or the real publisher based on accounting.redis.enabled; a startup guardrail logs a WARN when Cuebot is misconfigured. Tests across ProcDaoTests, ShowDaoTests, and LettuceAccountingRedisPublisherTests.
  • 88495a1c Add Lettuce dependency.
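
The release publish in PR B (five HINCRBY calls plus one INCR on acct:seq, all inside a single Lua script) can be pictured with a small in-memory model. This is a sketch under assumptions, not the Lettuce implementation: `FakeRedis` and `publish_release` are hypothetical names, the scope keys are placeholders, and it assumes the five increments correspond to five accounting scopes, which this description does not spell out. Only `acct:seq` and the `int_cores` field name come from the PR text.

```python
import threading
from collections import defaultdict


class FakeRedis:
    """Tiny in-memory stand-in exposing only what the sketch needs."""

    def __init__(self):
        self.hashes = defaultdict(lambda: defaultdict(int))
        self.counters = defaultdict(int)
        # Models the fact that a Lua script runs atomically inside Redis.
        self.lock = threading.Lock()

    def hincrby(self, key, field, amount):
        self.hashes[key][field] += amount
        return self.hashes[key][field]

    def incr(self, key):
        self.counters[key] += 1
        return self.counters[key]


def publish_release(redis, scope_keys, centicores):
    """Release cores from every accounting scope in one atomic step.

    Returns the new acct:seq value so callers can observe the bump.
    """
    if len(scope_keys) != 5:
        raise ValueError("expected one key per accounting scope (5)")
    with redis.lock:
        for key in scope_keys:
            redis.hincrby(key, "int_cores", -centicores)
        return redis.incr("acct:seq")
```

Bumping the sequence in the same atomic step as the counters is what lets a later reseed detect that a release happened in between its read and its write.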

PR C — Rust scheduler accounting module

  • 9d480e08 New crates/scheduler/src/accounting/ module replacing ResourceAccountingService: Redis client + Lua scripts (atomic check-and-modify with force rollback), 2-min recompute, 5-min limit reseed, blocking bootstrap, managed-shows cache, BookingDelta over 5 tables (DispatchLayer extended with folder_id / dept_id). Compensation rollback at the dispatcher actor switched to Lua force mode. acct:seq CAS guard on every reseed. New redis_integration test suite.
  • d945f0d1 Fix centicore-vs-core unit handling at PG/Cuebot↔Redis boundaries (limit reseed, recompute, release publisher, CoreSize); test centicores in the Lettuce publisher tests; dispatcher actor cleanups.
  • 336a6696 Add GPU limit enforcement to the booking Lua + integration coverage.
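
The three mechanisms named in PR C (atomic check-and-modify, force-mode rollback, and the acct:seq guard on reseed) can be modelled in a few lines. This is a minimal sketch under assumptions: `AccountingStore` and its methods are hypothetical, it tracks a single counter rather than the real schema, and it assumes every mutation bumps the sequence. The real logic lives in Lua scripts inside Redis, and the developer guide remains the reference.

```python
import threading


class AccountingStore:
    """In-memory stand-in for the Redis accounting state (hypothetical)."""

    def __init__(self, limit_centicores):
        self._lock = threading.Lock()  # plays the role of Lua atomicity
        self.limit = limit_centicores
        self.used = 0
        self.seq = 0  # stand-in for acct:seq

    def apply_delta(self, delta, force=False):
        """Check-and-modify: reject a booking that would exceed the limit.

        With force=True the limit check is skipped, which is what a
        compensation rollback needs: undoing a booking must always succeed.
        """
        with self._lock:
            if not force and self.used + delta > self.limit:
                return False
            self.used += delta
            self.seq += 1
            return True

    def snapshot_seq(self):
        """Read the sequence before starting a recompute."""
        with self._lock:
            return self.seq

    def reseed(self, recomputed_used, expected_seq):
        """Overwrite the counter only if nothing changed since the snapshot."""
        with self._lock:
            if self.seq != expected_seq:
                return False  # a booking or release raced with the recompute
            self.used = recomputed_used
            return True
```

A usage example: take `seq = store.snapshot_seq()`, recompute usage from Postgres, then call `store.reseed(value, seq)`. If a booking or release landed in between, the reseed returns False and the stale value is discarded instead of overwriting the newer counter.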

PR D — docs + cleanup (this session)

  • 88f3732c New docs/_docs/developer-guide/redis-accounting.md rewriting the surviving design content as a developer reference; re-points the four in-tree citations (accounting/mod.rs, AccountingRedisPublisher.java, LettuceAccountingRedisPublisher.java, LettuceAccountingRedisPublisherTests.java) to the new guide; deletes design/SCHED_REDIS_DECISIONS.md (history preserved in git).

Test plan

  • Cuebot: ./gradlew build (embedded Postgres covers V42 migration, ShowDao, ProcDao branching, LettuceAccountingRedisPublisher Lua wiring, ManageShow.setSchedulerManaged).
  • Rust: cargo test -p scheduler and cargo test -p scheduler --features integration-tests (covers the booking Lua, recompute, limit reseed, bootstrap, GPU limits, centicore conversion at boundaries).
  • pycue: cd pycue && pytest tests/wrappers/test_show.py (new setSchedulerManaged wrapper).
  • cueadmin: cd cueadmin && pytest tests/test_common.py (new subcommand).
  • ./docs/build.sh (new developer-guide entry renders, internal anchors resolve).
  • Manual sandbox smoke: flip a show via cueadmin -setSchedulerManaged true, verify Cuebot unbookProc publishes to Redis (redis-cli MONITOR), let recompute run, flip back, confirm no negative int_cores persists past one recompute cycle.
  • Deploy validation: confirm that no Cuebot instance logs cuebot_redis_publish_misconfigured after enabling accounting.redis.enabled=true cluster-wide.

Breaking changes

  • dispatcher.exclusion_list and dispatcher.scheduler_manages_resources removed from opencue.properties. Any show previously listed there must be migrated to b_scheduler_managed=true via cueadmin -setSchedulerManaged true during the upgrade.
  • Deployments wanting to keep the scheduler running must set accounting.redis.enabled=true on every Cuebot before flipping any show to scheduler-managed (see deployment invariant in the dev guide).

Scc silently drops inserts where the key already exists.
Also ensure all_sleeping_rounds is reset at the end of each full iteration
…queries

Phase 1 scheduler quick wins: empty-cluster sleep, LIMIT, refresh guard

- Empty-cluster sleep now configurable (cluster_empty_sleep, default 30s).
- QUERY_PENDING_BY_SHOW_FACILITY_TAG capped via max_jobs_per_cluster_pass
  (default 20). Strict ORDER BY priority DESC; low-priority jobs deferred.
- HostCacheService skips overlapping refresh ticks via an AtomicBool guard.
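
The overlap guard in the last bullet can be sketched as a try-lock: a tick that finds a refresh already in flight is skipped rather than queued. The scheduler itself is Rust and uses an AtomicBool; the Python below is only an illustration of the same idea, and `RefreshGuard` with its `tick` and `skipped` members are hypothetical names.

```python
import threading


class RefreshGuard:
    """Skips a refresh tick if the previous one is still running."""

    def __init__(self, refresh_fn):
        self._refresh_fn = refresh_fn
        # Used as a try-lock, playing the role of an AtomicBool flag.
        self._busy = threading.Lock()
        self.skipped = 0

    def tick(self):
        # Non-blocking acquire: False means a refresh is already in flight.
        if not self._busy.acquire(blocking=False):
            self.skipped += 1
            return False
        try:
            self._refresh_fn()
            return True
        finally:
            # Always clear the flag, even if the refresh raised.
            self._busy.release()
```

Skipping instead of queueing keeps slow refreshes from piling up behind each other when one pass takes longer than the tick interval.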

Add V40 indexes for scheduler pending-job query

GIN on layer.str_tags (array overlap), composite partial on
job(pk_show, pk_facility, str_state, b_paused) WHERE PENDING/not paused,
partial on layer_stat(pk_layer) WHERE int_waiting_count > 0.

Plain CREATE INDEX (Flyway 5.2.0 wraps in a transaction, which Postgres
rejects for CONCURRENTLY); apply with CONCURRENTLY via psql before Flyway
when running against populated production tables.

Drop LOWER(pk_facility) hack and rewrite QUERY_PENDING with EXISTS

Scheduler-side facility id is now String (was Uuid). The dao::helpers
parse_uuid path was lower-casing every facility round-trip, which forced
LOWER() compares in 6 SQL sites. Cuebot writes canonical casing on insert,
so a String swap removes the hack at the source.

QUERY_PENDING_BY_SHOW_FACILITY_TAG rewritten to a single bookable_shows
CTE plus EXISTS subquery, removing the layer ⨝ layer_stat ⨝ DISTINCT
cardinality blowup. Folder cap split into outer early-out and per-layer
fit inside the EXISTS.
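
To illustrate why the EXISTS form avoids the join blowup, here is a toy comparison on an in-memory SQLite database. The schema is a heavily simplified assumption (three tables, a hypothetical `int_priority` column on `job`, no shows, facilities, tags, or folder caps) and the queries are not the real QUERY_PENDING_BY_SHOW_FACILITY_TAG. The point is only that the join produces one row per waiting layer and then needs DISTINCT, while EXISTS yields at most one row per job.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job (pk_job TEXT PRIMARY KEY, int_priority INTEGER);
    CREATE TABLE layer (pk_layer TEXT PRIMARY KEY, pk_job TEXT);
    CREATE TABLE layer_stat (pk_layer TEXT PRIMARY KEY, int_waiting_count INTEGER);
    INSERT INTO job VALUES ('j1', 10), ('j2', 50), ('j3', 30);
    INSERT INTO layer VALUES ('l1', 'j1'), ('l2', 'j1'), ('l3', 'j2'), ('l4', 'j3');
    INSERT INTO layer_stat VALUES ('l1', 4), ('l2', 7), ('l3', 1), ('l4', 0);
""")

# Old shape: join fans out to one row per waiting layer, DISTINCT collapses it.
JOIN_DISTINCT = """
    SELECT DISTINCT job.pk_job, job.int_priority
    FROM job
    JOIN layer ON layer.pk_job = job.pk_job
    JOIN layer_stat ON layer_stat.pk_layer = layer.pk_layer
    WHERE layer_stat.int_waiting_count > 0
    ORDER BY job.int_priority DESC
    LIMIT ?
"""

# New shape: each job is tested once; the subquery can stop at the first hit.
EXISTS = """
    SELECT job.pk_job, job.int_priority
    FROM job
    WHERE EXISTS (
        SELECT 1
        FROM layer
        JOIN layer_stat ON layer_stat.pk_layer = layer.pk_layer
        WHERE layer.pk_job = job.pk_job
          AND layer_stat.int_waiting_count > 0
    )
    ORDER BY job.int_priority DESC
    LIMIT ?
"""


def pending_jobs(sql, limit):
    """Run one of the two query shapes with a per-pass job cap."""
    return conn.execute(sql, (limit,)).fetchall()
```

Both shapes return the same jobs here, and lowering the limit shows the effect of the strict priority ordering: with a cap of 1, only the highest-priority job is returned and the rest are deferred to a later pass.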

Shows can now be moved to the scheduler using cueadmin:

```
cueadmin -show foo -setSchedulerManaged true
```

The following properties have been removed:

```
dispatcher.scheduler_manages_resources=false
dispatcher.exclusion_list=show1,show2:facility.allocation,show3:facility.allocation
```

coderabbitai Bot commented May 14, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

When a proc belonging to a show managed by the Scheduler is destroyed, its core and gpu counts are
sent to Redis to update the cached version of the resource accounting tables. See
design/SCHED_REDIS_DECISIONS.md for more details.
@DiegoTavares DiegoTavares force-pushed the sched_accounting_redis branch from 96b309f to d14e52e Compare May 20, 2026 22:58
Migrate the in-memory cache for resource accounting to redis
Redis and Pg were using different units for cores, causing the accounting logic to fail.
Signed-off-by: Diego Tavares <dtavares@imageworks.com>
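
The unit mismatch fixed by that commit is easy to reproduce in miniature. The helpers below are a hypothetical sketch, not code from the branch: they only show the conversion at a boundary (one core is 100 centicores) and how comparing values in mixed units silently gives the wrong answer. Which side of the boundary uses which unit is documented in the developer guide, not here.

```python
CENTICORES_PER_CORE = 100  # "centi" means one hundredth of a core


def cores_to_centicores(cores):
    """Convert whole or fractional cores to integer centicores."""
    return round(cores * CENTICORES_PER_CORE)


def centicores_to_cores(centicores):
    """Convert integer centicores back to (possibly fractional) cores."""
    return centicores / CENTICORES_PER_CORE


def fits_limit(requested_centicores, used_centicores, limit_centicores):
    """Limit check with every argument in the same unit."""
    return used_centicores + requested_centicores <= limit_centicores
```

For example, a request for 8 cores against a 400-centicore limit passes if the raw `8` is compared directly, but is correctly rejected once converted with `cores_to_centicores(8)`.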