fix: clone endpoint to preserve tokenizer getter (fixes #321945)#321947
Open
vs-code-engineering[bot] wants to merge 1 commit into
Open
fix: clone endpoint to preserve tokenizer getter (fixes #321945)#321947vs-code-engineering[bot] wants to merge 1 commit into
vs-code-engineering[bot] wants to merge 1 commit into
Conversation
Replace the object spread { ..._endpoint, modelMaxPromptTokens } with
_endpoint.cloneWithTokenOverride(tokenLimit) in
CopilotLanguageModelWrapper._provideLanguageModelResponse.
The spread copies only own enumerable properties, dropping prototype
getters such as tokenizer on getter-based endpoints (for example
ExtensionContributedChatEndpoint). That produced an IChatEndpoint whose
tokenizer was undefined and crashed acquireTokenizer with
"Unknown tokenizer: undefined". cloneWithTokenOverride returns a real
endpoint instance with the getter intact while still overriding the
max prompt tokens.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
CopilotLanguageModelWrapper._provideLanguageModelResponsecrashes withError: Unknown tokenizer: undefinedwhen it processes a request for an endpoint whosetokenizeris implemented as a prototype getter rather than an own property.The offending line built the prompt-rendering endpoint with a JavaScript object spread:
Object spread copies only own enumerable properties. Prototype accessors (getters) are not own properties, so they are dropped. For
ChatEndpointinstancestokenizeris an own property (assigned in the constructor as... ?? TokenizerType.O200K), so the spread happened to preserve it. But for getter-based endpoints — most commonlyExtensionContributedChatEndpoint, whoseget tokenizer()returnsTokenizerType.O200K— the spread strips the getter and yields an object withtokenizer === undefined. That object is typed by TypeScript as a completeIChatEndpoint(the type-system bypass), but at runtime it violates the non-nullabletokenizer: TokenizerTypecontract, andacquireTokenizerthrows.The fix replaces the spread with
_endpoint.cloneWithTokenOverride(tokenLimit)— the purpose-builtIChatEndpointmethod that returns a real endpoint instance (prototype intact) with the max-prompt-tokens override applied.Fixes #321945
Recommended reviewer:
@lramos15Culprit Commit
Not precisely identifiable from this checkout. The repository is a shallow clone (
fetch-depth=1), sogit blameandgit log -Scannot pinpoint the commit that introduced the object spread; the spread predates the visible history window and is present in the shipped commit (v1.125.0).Framing as a regression: the identical error message
Unknown tokenizer: undefinedwas previously fixed forChatEndpointin #319620 (commitb7893470, ~June 2) by defaulting the tokenizer toO200Kin theChatEndpointconstructor. That fix storestokenizeras an own property, so it survives the spread — but it does not protect endpoints that exposetokenizeras a prototype getter. The error resurfaced once getter-based endpoints (notablyExtensionContributedChatEndpoint, which is stored in_chatEndpointsfromgetAllChatEndpoints()and returned by_getEndpointForModel) began flowing through_provideLanguageModelResponse, exposing the long-standing spread bypass.Code Flow
sequenceDiagram participant Ext as Extension-contributed LM participant LMA as CopilotLanguageModelWrapper participant PR as PromptRenderer participant TOK as acquireTokenizer Ext->>LMA: provideLanguageModelResponse(endpoint) Note over LMA: endpoint = ExtensionContributedChatEndpoint<br/>tokenizer is a prototype getter (O200K) LMA->>PR: create(_endpoint) [line 651, getter intact] PR-->>LMA: countTokens() OK (no crash) LMA->>LMA: tokenLimit = _endpoint.modelMaxPromptTokens - baseCount - ... LMA->>PR: create({ ..._endpoint, modelMaxPromptTokens: tokenLimit }) [line 659] Note over LMA,PR: spread copies own props only<br/>prototype getter dropped -> tokenizer === undefined PR->>TOK: acquireTokenizer(endpoint) TOK-->>Ext: throw Error("Unknown tokenizer: undefined")Affected Files
extensions/copilot/src/extension/conversation/vscode-node/languageModelAccess.ts— modified. The bypass/producer at the secondPromptRenderer.createinCopilotLanguageModelWrapper._provideLanguageModelResponse. The firstcreate(a few lines above) passes_endpointdirectly and never crashed, which matches the telemetry crashing only at the second call.extensions/copilot/src/platform/tokenizer/node/tokenizer.ts— not modified (crash site).acquireTokenizerthrowsUnknown tokenizer: ${endpoint.tokenizer}when the value is undefined. Intentionally left untouched: the crash site reports where it fails, not why.extensions/copilot/src/platform/endpoint/vscode-node/extChatEndpoint.ts— not modified (example producer).ExtensionContributedChatEndpointexposestokenizer,modelMaxPromptTokens,model, etc. as prototype getters, and itscloneWithTokenOverridecorrectly preserves them.extensions/copilot/src/platform/networking/common/networking.ts— not modified (contract). Declarestokenizer: TokenizerType(non-nullable) andcloneWithTokenOverride(modelMaxPromptTokens: number): IChatEndpoint.Repro Steps
ExtensionContributedChatEndpoint(itstokenizeris a prototype getter).CopilotLanguageModelWrapper._provideLanguageModelResponse.PromptRenderer.create(...).countTokens()succeeds (endpoint passed directly, getter intact).PromptRenderer.create({ ..._endpoint, modelMaxPromptTokens: tokenLimit }).render()receives a spread copy whosetokenizergetter was stripped (undefined).acquireTokenizer(endpoint), which throwsUnknown tokenizer: undefined(telemetry bucketf969dd59-a13a-f94e-2982-6394d7b51e19).How the Fix Works
Chosen approach (
languageModelAccess.ts): replace the object-spread endpoint argumentwith
This fixes the data producer (where the malformed endpoint object is created) rather than guarding the crash site — the data-producer principle.
cloneWithTokenOverrideis declared on theIChatEndpointinterface and implemented by every endpoint type, including the getter-based ones:ChatEndpointclones its model metadata withmax_prompt_tokensoverridden, andExtensionContributedChatEndpointconstructs a fresh instance withmaxInputTokensoverridden (itsmodelMaxPromptTokensgetter then returns the new value). Because it returns a genuine endpoint instance, thetokenizergetter (and all other prototype members) remain present, soendpoint.tokenizerresolves toO200Kinstead ofundefined. The override semantics are identical to what the spread intended (setmodelMaxPromptTokenstotokenLimit), and this exact method is already used elsewhere in the same file for context-size overrides, so it is the idiomatic call. The crash site intokenizer.tsand itslogService/throw path are left untouched, so genuinely-malformed endpoints would still be surfaced to telemetry.After this change, the second
PromptRenderer.createcall cannot produce an endpoint withtokenizer === undefined, becausecloneWithTokenOverridereturns a realIChatEndpointinstance whosetokenizergetter/property is preserved, instead of an object-spread copy that drops prototype getters.Alternatives considered:
tokenizer ?? O200Kfallback at the crash site inacquireTokenizer— rejected: it guards the symptom at the bottom of the stack and hides every future producer of a malformed endpoint from telemetry.ExtensionContributedChatEndpoint's getters to own properties so the spread copies them — rejected: it patches one consumer-visible symptom while leaving the spread bypass in place for the other getter-based endpoints (StreamingPassThroughEndpoint,ClaudeStreamingPassThroughEndpoint), and re-introduces the same class of bug for any future getter-based endpoint.Recommended Owner
@lramos15(Logan Ramos) — top contributor and most-recent author oflanguageModelAccess.ts(6 of the last 15 commits, including the most recent on the issue's ship date), working directly in the language-model-access / context-size area that ownscloneWithTokenOverrideusage in this file, with demonstrated write access (direct merges tomain).