fix(build): link bundled-embeddings on Windows MSVC by dropping tokenizers' esaxx_fast #224
Merged
bundled-embeddings compiled but failed to LINK on Windows MSVC (LNK2038 / LNK1169): esaxx-rs's build.rs hardcodes `cc::Build::static_crt(true)` (/MT), which clashes with ort's prebuilt /MD binaries; the two CRTs can't be linked into one executable.

esaxx-rs is a *non-optional* dependency of tokenizers, but its C++ build is `#[cfg(feature = "cpp")]`-gated, and the only thing enabling `cpp` is tokenizers' default `esaxx_fast` feature (`esaxx_fast = ["esaxx-rs/cpp"]`). esaxx_fast only accelerates Unigram *training*; mimir uses tokenizers purely for inference (`Tokenizer::from_file` + `encode`), so the C++ path is dead weight for us.

Setting `default-features = false` and re-listing the other two defaults (onig, progressbar) drops esaxx_fast → esaxx-rs builds as pure Rust (no /MT C++ objects) → bundled-embeddings links. Cross-platform, no upstream fork, ort untouched; Linux is unaffected.

Verified on x86_64-pc-windows-msvc: `cargo build --features bundled-embeddings` fails to link before this change and produces a working binary after.

Closes #222

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
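The commit message relies on esaxx-rs's C++ path being `#[cfg(feature = "cpp")]`-gated. As a minimal, hypothetical sketch of that pattern (this is not esaxx-rs's actual source; `backend` is an illustrative name), the point is that a `cfg`'d-out item is never compiled at all, so with the feature off only the pure-Rust function exists and no C++ object files reach the linker. Per the description above, the same kind of gate wraps the `cc` call in esaxx-rs's `build.rs`; that part is not shown here.

```rust
// Hypothetical illustration of a Cargo-feature-gated native backend.
// NOT esaxx-rs's real code; only the feature name `cpp` comes from the PR text.

/// Compiled only when the `cpp` feature is enabled. In a crate like esaxx-rs,
/// this is where code calling into the C++ objects built by `cc` would live.
#[cfg(feature = "cpp")]
fn backend() -> &'static str {
    "cpp"
}

/// Compiled only when the `cpp` feature is disabled: the pure-Rust fallback.
/// Nothing here needs a C++ compiler or the static (/MT) CRT.
#[cfg(not(feature = "cpp"))]
fn backend() -> &'static str {
    "pure-rust"
}

fn main() {
    // Built without the `cpp` feature (e.g. plain `rustc` with no
    // `--cfg 'feature="cpp"'`), only the fallback exists and this prints
    // "backend: pure-rust".
    println!("backend: {}", backend());
}
```

Because the two `backend` definitions have mutually exclusive `cfg` attributes, exactly one of them is compiled, which mirrors how dropping `esaxx_fast` (and therefore `esaxx-rs/cpp`) removes the C++ side instead of merely leaving it unused.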
Closes #222.
Problem
`bundled-embeddings` compiled but failed to link on Windows MSVC: `esaxx-rs`'s `build.rs` compiles its C++ with `cc::Build::new().cpp(true).static_crt(true)` (/MT), while `ort`'s downloaded prebuilt binaries use the dynamic CRT (/MD). The two CRTs can't be linked into one executable.

Root cause (more precise than the issue)
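`esaxx-rs` is a non-optional dependency of `tokenizers`, but its C++ build is gated behind `#[cfg(feature = "cpp")]`, and the only thing that enables `cpp` is `tokenizers`' default `esaxx_fast` feature (`esaxx_fast = ["esaxx-rs/cpp"]`). `esaxx_fast` only accelerates Unigram training. mimir uses `tokenizers` purely for inference (`Tokenizer::from_file` + `encode`), so the C++ path is dead weight for us.

Fix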
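The change comes down to the `tokenizers` entry in mimir's `Cargo.toml`. A sketch of the resulting entry follows, assuming the table form; the version string is a placeholder because the pinned version isn't shown in this PR text, and the exact layout of mimir's existing entry is an assumption.

```toml
[dependencies.tokenizers]
# Placeholder: keep whatever version mimir already pins.
version = "<unchanged>"
# Turn off all of tokenizers' default features (onig, progressbar, esaxx_fast)...
default-features = false
# ...then re-enable the two defaults mimir keeps. Omitting `esaxx_fast` means
# nothing enables `esaxx-rs/cpp`, so esaxx-rs builds as pure Rust.
features = ["onig", "progressbar"]
```

One way to double-check the effect locally is `cargo tree -e features -i esaxx-rs`, which lists the features enabled on `esaxx-rs` and what enables them; after this change it should no longer show `cpp`.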
Disable `tokenizers`' default features and re-list the other two defaults (`onig`, `progressbar`), dropping only `esaxx_fast`.

- `esaxx-rs` stays in the tree (it's non-optional) but now builds as pure Rust, with no /MT C++ objects, so nothing clashes with `ort`'s /MD.
- No `[patch]`/fork, `ort` untouched, Linux unaffected.
- `esaxx_fast` is training-only; mimir never trains tokenizers.
- `Cargo.lock`: `esaxx-rs` simply loses its `cc` build-dep edge.

Verification (x86_64-pc-windows-msvc)
- Before: `cargo build --features bundled-embeddings` → `LNK1169: multiply defined symbols` (LIBCMT vs MSVCP140).
- After: produces a working `mimir.exe` (`Finished` in ~12s).

🤖 Generated with Claude Code