Historian does not reduce context to fit new model on large→small mid-session model switch (Input exceeds context window) #188

Description

@snoproblem

Bug

When switching from a large-context model to a smaller one mid-session, the historian does not reduce the context to fit the new model before sending the prompt, so the request fails with an "Input exceeds context window" error.

Reproduced with: GLM-5.2 (512k context) → GPT-5.5 (272k context), with ~300k input tokens persisted on the old model.

Root cause

The model-change branch in transform.ts:502-547 detects the switch and clears lastContextPercentage / lastInputTokens to 0. This suppresses every reduction path on the same pass:

  • Historian trigger (checkCompartmentTrigger): 0% < proactive floor → shouldFire=false
  • 95% emergency block (transform-compartment-phase.ts:328): 0% < 95% → no block
  • Overflow recovery bump (transform.ts:616-628): needsEmergencyRecovery was just cleared → no bump

The oversized prompt — sized for the old model's window — is sent to the new smaller model and rejected. Recovery only arms on the second pass, after the real overflow error fires the event handler's recordOverflowDetected. The user sees the error on the first request.
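To make the suppression concrete, here is a minimal sketch of the three checks as a toy model. The field names come from this issue; `PROACTIVE_FLOOR_PCT`, `clearOnModelChange`, and `reductionPaths` are hypothetical names, and the 60% floor is an assumed value. It is not the code in transform.ts.

```typescript
// Simplified model of the three checks listed above. Field names come from the
// issue; the floor constant and the function names are hypothetical.
interface SessionMeta {
  lastContextPercentage: number; // percent of the context window, 0-100
  lastInputTokens: number;
  needsEmergencyRecovery: boolean;
}

const PROACTIVE_FLOOR_PCT = 60; // assumed value, not taken from the project
const EMERGENCY_BLOCK_PCT = 95; // the 95% emergency block named in the issue

// What the model-change branch does today, per the issue: zero the stale
// measurements and drop any pending recovery flag.
function clearOnModelChange(meta: SessionMeta): SessionMeta {
  return {
    ...meta,
    lastContextPercentage: 0,
    lastInputTokens: 0,
    needsEmergencyRecovery: false,
  };
}

// Which reduction paths would fire for a given session state.
function reductionPaths(meta: SessionMeta) {
  return {
    historianFires: meta.lastContextPercentage >= PROACTIVE_FLOOR_PCT,
    emergencyBlock: meta.lastContextPercentage >= EMERGENCY_BLOCK_PCT,
    overflowBump: meta.needsEmergencyRecovery,
  };
}

// State persisted on the old model, then cleared by the switch.
const beforeSwitch: SessionMeta = {
  lastContextPercentage: 58,
  lastInputTokens: 300_000,
  needsEmergencyRecovery: false,
};
const afterSwitch = clearOnModelChange(beforeSwitch);
const paths = reductionPaths(afterSwitch);
console.log(paths);
```

With every input zeroed, no path can fire on the first pass, regardless of how large the actual prompt is.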

The clearing is correct for the small→large direction (where clearing is harmless — there is headroom to spare), and hook-handlers.test.ts:160 only covers that case. The large→small direction is unhandled.

Fix direction

In the model-change branch (transform.ts:502), before clearing, compare sessionMeta.lastInputTokens (the old model's last measured input tokens) against the new model's context limit (resolveTrustedContextLimit with lastAssistantModel). When oldInputTokens > newContextLimit, call recordOverflowDetected(db, sessionId, newContextLimit, newModelKey) to arm recovery. That makes the existing bump-to-95% path (transform.ts:616-628) fire on the same pass, so the historian and emergency drops run before the prompt is sent.
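A rough sketch of that direction, under stated assumptions: `armRecovery` is a hypothetical in-memory stand-in for the real recordOverflowDetected(db, sessionId, newContextLimit, newModelKey), `handleModelChange` and `effectivePercentage` are illustrative names, and arming after the clear (so the clear cannot undo it) is my assumption about ordering, not something verified against transform.ts.

```typescript
interface SessionMeta {
  lastContextPercentage: number; // percent of the context window, 0-100
  lastInputTokens: number;
  needsEmergencyRecovery: boolean;
}

// Hypothetical stand-in for recordOverflowDetected(...): only flips the flag.
function armRecovery(meta: SessionMeta): SessionMeta {
  return { ...meta, needsEmergencyRecovery: true };
}

// Proposed model-change handling: read the old measurement before clearing,
// clear as today, then arm recovery if the old input cannot fit the new window.
function handleModelChange(meta: SessionMeta, newContextLimit: number): SessionMeta {
  const oldInputTokens = meta.lastInputTokens; // must be read before the clear
  const cleared: SessionMeta = {
    ...meta,
    lastContextPercentage: 0,
    lastInputTokens: 0,
    needsEmergencyRecovery: false,
  };
  return oldInputTokens > newContextLimit ? armRecovery(cleared) : cleared;
}

// Toy version of the bump path: armed recovery forces usage to at least 95%,
// which is what lets the historian and emergency drops run on the same pass.
function effectivePercentage(meta: SessionMeta): number {
  return meta.needsEmergencyRecovery
    ? Math.max(meta.lastContextPercentage, 95)
    : meta.lastContextPercentage;
}

// Large→small: 300k tokens measured, new window is 272k.
const largeToSmall = handleModelChange(
  { lastContextPercentage: 58, lastInputTokens: 300_000, needsEmergencyRecovery: false },
  272_000,
);

// Small→large: 100k tokens measured, new window is 512k. Nothing to arm.
const smallToLarge = handleModelChange(
  { lastContextPercentage: 37, lastInputTokens: 100_000, needsEmergencyRecovery: false },
  512_000,
);

console.log(effectivePercentage(largeToSmall), effectivePercentage(smallToLarge));
```

The small→large case keeps today's behavior (plain clear), so the existing coverage in hook-handlers.test.ts:160 should be unaffected by a change along these lines.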

Reproduction

Failing test in PR #187.

The test reproduces the scenario:

  • Session at 300k input tokens on GLM-5.2 (58% of 512k — no trigger fired)
  • Switch to GPT-5.5 (272k) — 300k is ~110% of the new window
  • Asserts needsEmergencyRecovery === true and the oversized tool output is dropped

The test currently fails: needsEmergencyRecovery is false and the tool output stays active.
