Bug: duroxide-pg migration race condition on concurrent startup
Summary
When multiple duroxide workers start simultaneously against a freshly created (empty) PostgreSQL database, they all attempt to run schema migrations concurrently. The first worker succeeds, but every other worker crashes with a fatal duplicate key value violates unique constraint "_duroxide_migrations_pkey" error.
This is a bug in the duroxide-pg provider's migration system — it does not use an advisory lock, INSERT ... ON CONFLICT DO NOTHING, or any other form of concurrency control for migration execution.
Versions
- duroxide runtime (Rust): 0.1.26
- duroxide-pg provider (Rust): 0.1.27
- duroxide npm package (Node.js SDK): 0.1.17
- PostgreSQL: Azure Database for PostgreSQL (Flexible Server)
- Node.js: v24.14.1
Environment
- 6 worker pods in AKS, all connecting to the same PostgreSQL database
- Database was freshly reset (schemas dropped via DROP SCHEMA IF EXISTS duroxide CASCADE)
- All 6 pods started simultaneously via Kubernetes deployment scale-up
Steps to Reproduce
- Drop the duroxide schema from PostgreSQL (clean state)
- Start N>1 duroxide workers simultaneously, all pointed at the same database
- Observe that only 1 worker boots successfully; the remaining N-1 crash
Expected Behavior
All workers should start successfully. The migration system should handle concurrent startup gracefully — e.g., by using a PostgreSQL advisory lock (pg_advisory_lock) around migration execution, or by using INSERT ... ON CONFLICT DO NOTHING for migration record inserts.
Actual Behavior
The first worker to reach the migration step wins and inserts a row into _duroxide_migrations. All other workers attempt the same INSERT and hit:
[Error: Failed to connect to PostgreSQL: error returned from database: duplicate key value violates unique constraint "_duroxide_migrations_pkey"] {
code: 'GenericFailure'
}
This is surfaced as a fatal uncaught error in the Node.js SDK, crashing the worker process.
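The failure has the shape of a check-then-insert race. The sketch below is a self-contained Rust model of that race, not duroxide-pg's actual code: MigrationsTable, racy_startup, and the single migration version 1 are hypothetical names, a HashSet stands in for the _duroxide_migrations primary key, and a Barrier forces the worst case in which every worker finishes its "already applied?" check before any worker inserts. It assumes the migration runner checks first and inserts second, which the observed error suggests but which has not been confirmed against the source.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

/// Stand-in for the `_duroxide_migrations` table: a set with a primary-key check.
#[derive(Default)]
struct MigrationsTable {
    applied: Mutex<HashSet<u32>>,
}

impl MigrationsTable {
    /// Models `SELECT ... WHERE version = $1`.
    fn is_applied(&self, version: u32) -> bool {
        self.applied.lock().unwrap().contains(&version)
    }

    /// Models a plain INSERT: fails if the key already exists.
    fn insert(&self, version: u32) -> Result<(), String> {
        if self.applied.lock().unwrap().insert(version) {
            Ok(())
        } else {
            Err(format!("duplicate key: migration {version}"))
        }
    }
}

/// Starts `workers` threads that all check first and insert afterwards.
/// Returns (workers that booted, workers that hit the duplicate key).
fn racy_startup(workers: usize) -> (usize, usize) {
    let table = Arc::new(MigrationsTable::default());
    let barrier = Arc::new(Barrier::new(workers));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let (table, barrier) = (Arc::clone(&table), Arc::clone(&barrier));
            thread::spawn(move || {
                let pending = !table.is_applied(1);
                // Worst case: every worker completes its check before any insert.
                barrier.wait();
                if pending { table.insert(1) } else { Ok(()) }
            })
        })
        .collect();
    let results: Vec<_> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    let booted = results.iter().filter(|r| r.is_ok()).count();
    (booted, results.len() - booted)
}

// racy_startup(6) returns (1, 5): one boot, five duplicate-key failures.
```

With 6 workers the model reports 1 successful boot and 5 duplicate-key failures, matching the 1-of-N outcome described above.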
Full crash log from an affected pod
[worker] Pod: copilot-runtime-worker-6d99bf7c7b-7pd7s
[worker] Store: postgresql://***@<host>:5432/postgres?sslmode=require
[PilotSwarmWorker] Loaded: framework base prompt; 2 skill dir(s); 4 system agent(s)
node:internal/modules/run_main:107
triggerUncaughtException(
^
[Error: Failed to connect to PostgreSQL: error returned from database: duplicate key value violates unique constraint "_duroxide_migrations_pkey"] {
code: 'GenericFailure'
}
Node.js v24.14.1
Log from the winning pod (healthy boot)
2026-04-04T19:33:53.834718Z INFO duroxide::runtime: duroxide runtime (0.1.26) starting with provider duroxide-pg (0.1.27)
2026-04-04T19:33:53.946453Z INFO duroxide::runtime: Orchestration dispatcher capability filter configured supported_range=>=0.0.0, <=0.1.26
[worker] Started ✓ Polling for orchestrations...
Impact
In a Kubernetes deployment with N replicas, a clean database reset followed by scale-up reliably crashes N-1 workers on first boot. They self-heal via Kubernetes restarts (since migrations are done by then), but the initial crash is noisy, delays full cluster readiness, and could cause issues in environments with strict restart budgets or slow restart policies.
Suggested Fix
In duroxide-pg's migration runner:
- Preferred: Acquire a PostgreSQL advisory lock before checking/running migrations, and release it on the same connection. The lock key must not depend on any table existing, because a 'name'::regclass cast raises an error on an empty database; hashtext (an undocumented built-in) or any fixed bigint constant works:
SELECT pg_advisory_lock(hashtext('duroxide_migrations'));
-- check for and run pending migrations
SELECT pg_advisory_unlock(hashtext('duroxide_migrations'));
- Alternative: Use INSERT INTO _duroxide_migrations ... ON CONFLICT DO NOTHING so that concurrent inserts of the same migration record are silently ignored rather than raising an error.
- Minimum: Catch the unique constraint violation on _duroxide_migrations_pkey and treat it as a no-op (migration already applied by another worker) rather than a fatal error.
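To compare the first two options, here is a small Rust model, again a sketch rather than duroxide-pg's implementation. Db, Strategy, and startup are hypothetical names; a Mutex stands in for pg_advisory_lock, a set insert that ignores existing keys stands in for ON CONFLICT DO NOTHING, and body_runs counts how often the migration statements themselves execute.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

/// Hypothetical strategies mirroring the first two suggestions.
#[derive(Clone, Copy)]
enum Strategy {
    /// Hold one lock across check + apply (models pg_advisory_lock).
    AdvisoryLock,
    /// Check without a lock, then insert ignoring conflicts
    /// (models INSERT ... ON CONFLICT DO NOTHING).
    InsertOrIgnore,
}

#[derive(Default)]
struct Db {
    advisory_lock: Mutex<()>,
    applied: Mutex<HashSet<u32>>,
    body_runs: AtomicUsize,
}

impl Db {
    fn is_applied(&self, version: u32) -> bool {
        self.applied.lock().unwrap().contains(&version)
    }

    /// Runs the migration body, then records it; an existing record is ignored.
    fn apply(&self, version: u32) {
        self.body_runs.fetch_add(1, Ordering::SeqCst);
        self.applied.lock().unwrap().insert(version);
    }
}

fn migrate(db: &Db, start: &Barrier, strategy: Strategy) {
    match strategy {
        Strategy::AdvisoryLock => {
            start.wait(); // all workers arrive at the same moment
            let _guard = db.advisory_lock.lock().unwrap();
            if !db.is_applied(1) {
                db.apply(1);
            }
        }
        Strategy::InsertOrIgnore => {
            let pending = !db.is_applied(1);
            start.wait(); // worst case: all checks finish before any insert
            if pending {
                db.apply(1);
            }
        }
    }
}

/// Returns (workers that booted, times the migration body ran).
fn startup(workers: usize, strategy: Strategy) -> (usize, usize) {
    let db = Arc::new(Db::default());
    let barrier = Arc::new(Barrier::new(workers));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let (db, barrier) = (Arc::clone(&db), Arc::clone(&barrier));
            thread::spawn(move || migrate(&db, &barrier, strategy))
        })
        .collect();
    let booted = handles.into_iter().filter_map(|h| h.join().ok()).count();
    (booted, db.body_runs.load(Ordering::SeqCst))
}
```

In this model both strategies let all 6 workers boot, but the advisory lock runs the migration body once while insert-or-ignore runs it 6 times. The second option, and likewise the catch-the-violation minimum, is therefore only safe if the migration statements themselves are idempotent, which is an assumption about duroxide-pg that this report has not verified.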