Cross-node sub-orchestration completion routing after continue-as-new #30

@tjgreen42

Description

A sub-orchestration reports its parent's execution id back on completion (SubOrchCompleted / SubOrchFailed) via Runtime::get_execution_id_for_instance. That resolver consults a per-Runtime in-memory map (current_execution_ids) and, on a miss, falls back to INITIAL_EXECUTION_ID (1). The map is populated only when the parent runs a turn on that same node.

This is correct on a single node. In a multi-node deployment (multiple Runtimes sharing one provider, e.g. several AKS pods), however, the child may complete on a node where the parent never ran. The map misses and the fallback resolves the parent execution to 1. For a parent past its first execution (i.e. after continue_as_new), the completion is then filtered out as belonging to a stale execution, and the parent hangs awaiting a completion that never arrives.
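To make the failure mode concrete, here is a minimal, self-contained model of it. It is only a sketch, not the actual runtime code: `NodeRuntime`, `record_turn`, and `completion_is_delivered` are hypothetical stand-ins, and the `String` / `u64` types are assumptions. Only the names `current_execution_ids`, `get_execution_id_for_instance`, and `INITIAL_EXECUTION_ID` come from the description above.

```rust
use std::collections::HashMap;

/// Execution id assumed for an instance this node has never seen.
const INITIAL_EXECUTION_ID: u64 = 1;

/// Simplified stand-in for one node's Runtime; only the in-memory map is modelled.
#[derive(Default)]
struct NodeRuntime {
    current_execution_ids: HashMap<String, u64>,
}

impl NodeRuntime {
    /// Hypothetical helper: record the execution a parent is running on this node.
    fn record_turn(&mut self, instance: &str, execution_id: u64) {
        self.current_execution_ids
            .insert(instance.to_string(), execution_id);
    }

    /// Map lookup that falls back to INITIAL_EXECUTION_ID on a miss.
    fn get_execution_id_for_instance(&self, instance: &str) -> u64 {
        self.current_execution_ids
            .get(instance)
            .copied()
            .unwrap_or(INITIAL_EXECUTION_ID)
    }
}

/// Hypothetical stand-in for the stale-execution filter: a completion is
/// delivered only if it targets the parent's current execution.
fn completion_is_delivered(target_execution_id: u64, parent_current_execution_id: u64) -> bool {
    target_execution_id == parent_current_execution_id
}
```

With a parent at execution 2 recorded only on node A, node A resolves 2 and the completion is delivered. A fresh node B resolves 1, so the same completion is dropped by the filter.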

The single-node case is covered by tests/scenarios/suborch_id_collision.rs::parent_with_suborch_survives_continue_as_new, which seeds the map at turn start. The cross-node case is not exercised: the existing multi-node tests (sessions.rs, rolling_deployment.rs, timer_tests.rs) run multiple in-process Runtimes but none drive a sub-orchestration whose parent has continued-as-new across nodes, and CI runs a single ubuntu-latest job, so there is no distributed coverage.

Likely fix: on a map miss, resolve the parent execution from the provider. Provider::latest_execution_id(instance) already exists for this; get_execution_id_for_instance previously queried it, but that lookup was removed in favour of the in-memory map. A test that schedules a sub-orchestration in a continue-as-new loop while pinning parent and child to different nodes would close the coverage gap.
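A rough sketch of the proposed fallback, under stated assumptions: the `Provider` trait here is a hypothetical, synchronous stand-in (the real `Provider::latest_execution_id` signature may well differ, e.g. be async or return a `Result`), and `SharedProvider`, `NodeRuntime`, `set_latest`, and `record_turn` exist only for illustration.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

const INITIAL_EXECUTION_ID: u64 = 1;

/// Hypothetical, synchronous stand-in for the shared provider.
trait Provider {
    fn latest_execution_id(&self, instance: &str) -> Option<u64>;
}

/// In-memory provider shared by every node in this sketch.
#[derive(Default)]
struct SharedProvider {
    latest: Mutex<HashMap<String, u64>>,
}

impl SharedProvider {
    /// Hypothetical helper: persist the latest execution id for an instance.
    fn set_latest(&self, instance: &str, execution_id: u64) {
        self.latest
            .lock()
            .unwrap()
            .insert(instance.to_string(), execution_id);
    }
}

impl Provider for SharedProvider {
    fn latest_execution_id(&self, instance: &str) -> Option<u64> {
        self.latest.lock().unwrap().get(instance).copied()
    }
}

/// Simplified stand-in for one node's Runtime.
struct NodeRuntime {
    current_execution_ids: HashMap<String, u64>,
    provider: Arc<dyn Provider>,
}

impl NodeRuntime {
    fn new(provider: Arc<dyn Provider>) -> Self {
        Self {
            current_execution_ids: HashMap::new(),
            provider,
        }
    }

    /// Hypothetical helper: record the execution a parent is running on this node.
    fn record_turn(&mut self, instance: &str, execution_id: u64) {
        self.current_execution_ids
            .insert(instance.to_string(), execution_id);
    }

    /// Local map first; on a miss, ask the provider; only then fall back to 1.
    fn get_execution_id_for_instance(&self, instance: &str) -> u64 {
        if let Some(id) = self.current_execution_ids.get(instance) {
            return *id;
        }
        self.provider
            .latest_execution_id(instance)
            .unwrap_or(INITIAL_EXECUTION_ID)
    }
}
```

In this model, a node that never ran the parent still resolves the parent's current execution as long as the shared provider knows it. INITIAL_EXECUTION_ID remains the last resort, used only for instances the provider has no record of.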
