fix(build): link bundled-embeddings on Windows MSVC by dropping tokenizers' esaxx_fast #224

Merged
tcconnally merged 1 commit into main from fix/bundled-embeddings-windows-msvc-222
Jun 26, 2026

@tcconnally

Collaborator

Closes #222.

Problem

bundled-embeddings compiled but failed to link on Windows MSVC:

mismatch detected for 'RuntimeLibrary': value 'MD_DynamicRelease' doesn't match
value 'MT_StaticRelease' ...  (LNK2038 / LNK1169: LIBCMT vs MSVCP140)

esaxx-rs's build.rs compiles its C++ with cc::Build::new().cpp(true).static_crt(true) (/MT), while ort's downloaded prebuilt binaries use the dynamic CRT (/MD). The two CRTs can't be linked into one executable.
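
As a rough illustration of why the link step rejects this mix: MSVC objects record which runtime library they were compiled against, and the linker reports LNK2038 when the recorded values disagree. The sketch below is a toy model of that consistency check, not the linker's actual logic; `LinkInput`, `check_runtime_library`, and the input names are hypothetical and exist only for this example.

```rust
/// The two `RuntimeLibrary` values that appear in the error above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RuntimeLibrary {
    MtStaticRelease,  // produced by /MT
    MdDynamicRelease, // produced by /MD
}

/// Hypothetical stand-in for one object file or library on the link line.
struct LinkInput {
    name: &'static str,
    runtime: RuntimeLibrary,
}

/// Toy consistency check: the first input sets the expected value and
/// the first later input that disagrees is reported as a mismatch.
fn check_runtime_library(inputs: &[LinkInput]) -> Result<(), String> {
    if let Some((first, rest)) = inputs.split_first() {
        for input in rest {
            if input.runtime != first.runtime {
                return Err(format!(
                    "mismatch detected for 'RuntimeLibrary': {} is {:?}, but {} is {:?}",
                    input.name, input.runtime, first.name, first.runtime
                ));
            }
        }
    }
    Ok(())
}

fn main() {
    // Assumed inputs mirroring the situation described in this PR.
    let inputs = [
        LinkInput { name: "ort prebuilt", runtime: RuntimeLibrary::MdDynamicRelease },
        LinkInput { name: "esaxx.cpp object", runtime: RuntimeLibrary::MtStaticRelease },
    ];
    match check_runtime_library(&inputs) {
        Ok(()) => println!("runtime libraries agree"),
        Err(message) => println!("{message}"),
    }
}
```

In this model the only ways out are to make every input use the same value or to remove the odd input from the link line; the fix in this PR takes the second route.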

Root cause (more precise than the issue)

esaxx-rs is a non-optional dependency of tokenizers, but its C++ build is gated:

// esaxx-rs/build.rs
#[cfg(feature = "cpp")]
fn build() { cc::Build::new().cpp(true).static_crt(true).file("src/esaxx.cpp")... }

# tokenizers/Cargo.toml
default = ["progressbar", "onig", "esaxx_fast"]
esaxx_fast = ["esaxx-rs/cpp"]   # <-- the ONLY thing enabling the /MT C++ build

esaxx_fast only accelerates Unigram training. mimir uses tokenizers purely for inference (Tokenizer::from_file + encode), so the C++ path is dead weight for us.
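
For readers less familiar with feature gating, here is a minimal sketch of the general `#[cfg(feature = ...)]` pattern that makes this possible. It does not reproduce esaxx-rs's sources; `suffix_array_backend` and the returned labels are made up for illustration. Exactly one of the two functions is compiled, depending on whether the `cpp` feature is enabled.

```rust
/// Compiled only when the crate is built with the `cpp` feature enabled.
/// In a real crate this is where the C++-backed path would be wired in.
#[cfg(feature = "cpp")]
fn suffix_array_backend() -> &'static str {
    "cpp"
}

/// Fallback compiled when the `cpp` feature is off, so no C++ code
/// (and no C++ object file) is involved at all.
#[cfg(not(feature = "cpp"))]
fn suffix_array_backend() -> &'static str {
    "pure-rust"
}

fn main() {
    println!("suffix array backend: {}", suffix_array_backend());
}
```

Because the gated item is removed at compile time rather than skipped at run time, turning the feature off removes the C++ objects from the link line entirely.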

Fix

Disable tokenizers' default features and re-list the other two defaults (onig, progressbar), dropping only esaxx_fast:

tokenizers = { version = "0.23", optional = true, default-features = false, features = ["onig", "progressbar"] }

esaxx-rs stays in the tree (it's non-optional) but now builds as pure Rust — no /MT C++ objects, so nothing clashes with ort's /MD.

  • Cross-platform: no [patch]/fork, ort untouched, Linux unaffected.
  • No functional change: esaxx_fast is training-only; mimir never trains tokenizers.
  • Cargo.lock: esaxx-rs simply loses its cc build-dep edge.
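
The effect of `default-features = false` on the enabled feature set can be sketched with a toy model. This is a simplification for illustration, not Cargo's resolver: `enabled_features` is a hypothetical helper, and it only looks at a single dependency declaration.

```rust
use std::collections::BTreeSet;

/// Toy model of top-level feature selection for one dependency:
/// the defaults are included only when `default_features` is true,
/// then the explicitly requested features are added on top.
fn enabled_features<'a>(
    defaults: &[&'a str],
    default_features: bool,
    requested: &[&'a str],
) -> BTreeSet<&'a str> {
    let mut enabled = BTreeSet::new();
    if default_features {
        enabled.extend(defaults.iter().copied());
    }
    enabled.extend(requested.iter().copied());
    enabled
}

fn main() {
    // tokenizers' default feature list, as quoted in this PR.
    let defaults = ["progressbar", "onig", "esaxx_fast"];

    // Before: defaults left on, nothing requested explicitly.
    let before = enabled_features(&defaults, true, &[]);
    // After: defaults off, the two wanted features re-listed.
    let after = enabled_features(&defaults, false, &["onig", "progressbar"]);

    println!("before: {:?}", before);
    println!("after:  {:?}", after);
}
```

One caveat the model leaves out: Cargo unifies features across the dependency graph, so if another crate in the build depended on tokenizers with its defaults enabled, `esaxx_fast` would be switched back on. Running `cargo tree -e features` is one way to check which features are actually active.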

Verification (x86_64-pc-windows-msvc)

  • Before: cargo build --features bundled-embeddings fails with LNK1169: multiply defined symbols (LIBCMT vs MSVCP140).
  • After: same command links cleanly and produces a working mimir.exe (Finished in ~12s).

🤖 Generated with Claude Code

…izers' esaxx_fast

bundled-embeddings compiled but failed to LINK on Windows MSVC (LNK2038 /
LNK1169): esaxx-rs's build.rs hardcodes `cc::Build::static_crt(true)` (/MT),
which clashes with ort's prebuilt /MD binaries — the two CRTs can't be linked
into one executable.

esaxx-rs is a *non-optional* dependency of tokenizers, but its C++ build is
`#[cfg(feature = "cpp")]`-gated, and the only thing enabling `cpp` is
tokenizers' default `esaxx_fast` feature (`esaxx_fast = ["esaxx-rs/cpp"]`).
esaxx_fast only accelerates Unigram *training*; mimir uses tokenizers purely for
inference (Tokenizer::from_file + encode), so the C++ path is dead weight for us.

Setting `default-features = false` and re-listing the other two defaults
(onig, progressbar) drops esaxx_fast → esaxx-rs builds as pure Rust (no /MT C++
objects) → bundled-embeddings links. Cross-platform, no upstream fork, ort
untouched; Linux is unaffected.

Verified on x86_64-pc-windows-msvc: `cargo build --features bundled-embeddings`
fails to link before this change and produces a working binary after.

Closes #222

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@tcconnally tcconnally merged commit cc57181 into main Jun 26, 2026
4 checks passed
@tcconnally tcconnally deleted the fix/bundled-embeddings-windows-msvc-222 branch June 26, 2026 15:17

Development

Successfully merging this pull request may close these issues.

build: bundled-embeddings fails to LINK on Windows MSVC (esaxx-rs /MT vs ort /MD CRT clash)
