Runtime should explicitly handle orphan orchestrator queue messages #5

## Problem

When queue messages (e.g., `QueueMessage` from `enqueue_event`) arrive in the orchestrator queue **before** `StartOrchestration` for a new instance, the runtime's orchestration dispatcher receives a batch with no instance, no history, and no `StartOrchestration`/`ContinueAsNew` message. The current behavior is:

1. `fetch_orchestration_item` returns the batch with `orchestration_name="Unknown"`
2. The runtime logs `"completion messages for unstarted instance"` and `"empty effective batch"`
3. The runtime acks the batch, which **permanently deletes** the queue rows
4. The events are lost forever

This was discovered via the `sample_config_hot_reload_persistent_events_fs` e2e test, which enqueues events before starting an orchestration.

## Current Provider-Side Workaround

Both `duroxide-pg` and `duroxide-pg-opt` have implemented a provider-side fix in their `fetch_orchestration_item` stored procedure:

1. **Scan ALL messages** for `StartOrchestration`/`ContinueAsNew` (not just `messages[0]`), matching the SQLite provider's `work_items.iter().find()` behavior
2. **If no start item found**: release locks and return nothing, leaving messages in the queue until `StartOrchestration` arrives

This works, but it pushes responsibility to the provider, and that approach:
- Is fragile (each provider must implement it correctly)
- Cannot add a `visible_at` delay to prevent tight re-fetching (any delay risks events being lost if the orchestration completes before the delay expires)
- Relies on `LISTEN/NOTIFY` for backpressure to prevent tight-looping
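
The real workaround lives in SQL stored procedures, so the following Rust is only a minimal sketch of the same decision logic. The `WorkItem`, `FetchOutcome`, `find_start_item`, and `decide_fetch` names are hypothetical stand-ins for illustration, not the actual `duroxide` types or provider API.

```rust
/// Hypothetical, simplified stand-in for the provider's work item type.
#[derive(Debug, Clone, PartialEq)]
enum WorkItem {
    StartOrchestration { instance: String },
    ContinueAsNew { instance: String },
    QueueMessage { instance: String, payload: String },
}

/// Returns the first start-like item anywhere in the batch,
/// rather than inspecting only `batch[0]`.
fn find_start_item(batch: &[WorkItem]) -> Option<&WorkItem> {
    batch.iter().find(|item| {
        matches!(
            item,
            WorkItem::StartOrchestration { .. } | WorkItem::ContinueAsNew { .. }
        )
    })
}

/// What the provider does with a locked batch (hypothetical enum).
#[derive(Debug, PartialEq)]
enum FetchOutcome {
    /// Hand the batch to the runtime.
    Return,
    /// Release the locks and return nothing; the messages stay queued.
    ReleaseAndSkip,
}

/// A batch is returned if the instance already exists or the batch
/// contains a start item; otherwise it is an orphan batch and is skipped.
fn decide_fetch(batch: &[WorkItem], instance_exists: bool) -> FetchOutcome {
    if instance_exists || find_start_item(batch).is_some() {
        FetchOutcome::Return
    } else {
        FetchOutcome::ReleaseAndSkip
    }
}
```

In this sketch, a batch containing only `QueueMessage` items for an unknown instance yields `ReleaseAndSkip`, while the same batch with a `StartOrchestration` at any position yields `Return`.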

## Proposed Runtime-Level Fix

The runtime's orchestration dispatcher should handle this case explicitly:

1. When `fetch_orchestration_item` returns a batch with no instance and no `StartOrchestration`/`ContinueAsNew` in the messages, the runtime should **abandon** the batch (not ack it)
2. The abandon should carry a short delay (e.g., 500ms) so that the items become visible again later instead of being re-fetched in a tight loop
3. This keeps the contract simple: providers return whatever is in the queue, and the runtime decides what to do

This would also allow removing the provider-side workarounds.
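
The dispatcher-side decision described above could look roughly like the sketch below. It assumes simplified, hypothetical types (`WorkItem`, `BatchAction`, `classify_batch`) and a hypothetical `ORPHAN_RETRY_DELAY` constant set to the 500ms example value; the real dispatcher would act on the result by calling `abandon_orchestration_item`, whose exact signature is not shown here.

```rust
use std::time::Duration;

/// Hypothetical, simplified message type; not the real duroxide definition.
#[derive(Debug)]
enum WorkItem {
    StartOrchestration,
    ContinueAsNew,
    QueueMessage(String),
}

/// What the dispatcher does with a fetched batch (hypothetical enum).
#[derive(Debug, PartialEq)]
enum BatchAction {
    /// Run the orchestration turn and ack the batch afterwards.
    Process,
    /// Do not ack; make the items visible again after `delay`.
    Abandon { delay: Duration },
}

/// Illustrative value taken from the "e.g., 500ms" suggestion.
const ORPHAN_RETRY_DELAY: Duration = Duration::from_millis(500);

/// Classifies a batch: an orphan batch has no history and no start item.
fn classify_batch(has_history: bool, messages: &[WorkItem]) -> BatchAction {
    let has_start = messages.iter().any(|m| {
        matches!(m, WorkItem::StartOrchestration | WorkItem::ContinueAsNew)
    });
    if has_history || has_start {
        BatchAction::Process
    } else {
        BatchAction::Abandon { delay: ORPHAN_RETRY_DELAY }
    }
}
```

Keeping the classification in one pure function would make the orphan path easy to unit test independently of any provider.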

## Provider Validation Test

This issue was only caught by an e2e sample test (`sample_config_hot_reload_persistent_events_fs`), not by any provider validation test. A dedicated test should be added to `duroxide::provider_validation` that:

1. Enqueues one or more `QueueMessage` events for an instance **before** calling `start_orchestration`
2. Then starts the orchestration
3. Verifies that all pre-enqueued events are delivered to the orchestration (not silently dropped)

This would ensure all provider implementations are validated against this scenario without requiring full e2e tests.
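
The shape of such a test can be illustrated against a toy in-memory queue. This is only a sketch of the scenario: `ToyQueue`, `dispatch_once`, and `run_orphan_scenario` are hypothetical names, and a real validation test would drive an actual provider through the `duroxide::provider_validation` harness instead.

```rust
use std::collections::VecDeque;

/// Hypothetical, simplified message type for a single instance.
#[derive(Debug, Clone, PartialEq)]
enum WorkItem {
    StartOrchestration,
    QueueMessage(String),
}

/// Toy in-memory orchestrator queue for one instance (illustration only).
#[derive(Default)]
struct ToyQueue {
    rows: VecDeque<WorkItem>,
}

impl ToyQueue {
    fn enqueue(&mut self, item: WorkItem) {
        self.rows.push_back(item);
    }

    /// One fetch-and-dispatch step. Returns the event payloads delivered
    /// to the orchestration in this step.
    fn dispatch_once(&mut self) -> Vec<String> {
        let has_start = self
            .rows
            .iter()
            .any(|item| matches!(item, WorkItem::StartOrchestration));
        if !has_start {
            // Orphan batch: abandon it, leaving every row in the queue.
            return Vec::new();
        }
        // Start item present: ack the batch (rows are removed) and
        // deliver the queued events in arrival order.
        self.rows
            .drain(..)
            .filter_map(|item| match item {
                WorkItem::QueueMessage(payload) => Some(payload),
                WorkItem::StartOrchestration => None,
            })
            .collect()
    }
}

/// Enqueue events first, start later, and return what was delivered.
fn run_orphan_scenario() -> Vec<String> {
    let mut queue = ToyQueue::default();
    queue.enqueue(WorkItem::QueueMessage("e1".to_string()));
    queue.enqueue(WorkItem::QueueMessage("e2".to_string()));

    // Before the start item arrives nothing is delivered, and nothing is lost.
    assert!(queue.dispatch_once().is_empty());

    queue.enqueue(WorkItem::StartOrchestration);
    queue.dispatch_once()
}
```

The property under test is that `run_orphan_scenario` returns both pre-enqueued events in order; a provider or runtime that acks the orphan batch would return an empty list instead.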

## Affected Code

- Runtime: `dispatchers/orchestration.rs` - the `"completion messages for unstarted instance"` code path
- Provider trait: `abandon_orchestration_item` is already available for this purpose
- Test suite: `duroxide::provider_validation` - add orphan message handling test

## References

- `duroxide-pg-opt` migration `0006_fix_orphan_queue_messages.sql`
- `duroxide-pg` migration `0016_fix_orphan_queue_messages.sql`
- Test: `sample_config_hot_reload_persistent_events_fs`

