Allow XENOVA_CACHE_HOME to redirect local embedding model path#862
Allow XENOVA_CACHE_HOME to redirect local embedding model path#862canxer314 wants to merge 1 commit into
Conversation
@xenova/transformers defaults localModelPath to its own install dir inside npm global node_modules. When users pre-download models to ~/.cache/Xenova/ (offline or restricted-network environments), the library cannot find them and tries to fetch from huggingface.co. If that network is unreachable, every observation save logs: vector-index add: embed failed - skipping (provider: local) Read XENOVA_CACHE_HOME from the environment and, when set, override both env.localModelPath and env.cacheDir before the pipeline() call. Unset = zero behavior change.
|
@canxer314 is attempting to deploy a commit to the rohitg00's projects Team on Vercel. A member of the Team first needs to authorize it. |
📝 WalkthroughWalkthrough
ChangesOffline model cache configuration
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/providers/embedding/local.ts`:
- Around line 49-60: Remove the explanatory "WHAT" comment block above the
XENOVA cache logic and replace it with either a concise WHY comment like "//
Support offline setups via XENOVA_CACHE_HOME" or no comment at all, leaving the
code self-descriptive; ensure the code around the cacheHome variable and
assignments to transformers.env.localModelPath and transformers.env.cacheDir
(and the XENOVA_CACHE_HOME env var name) remains unchanged so the intent is
preserved without the long HOW explanation.
- Around line 36-39: Extend the Xenova types in src/xenova.d.ts to include env
(with localModelPath and cacheDir) alongside pipeline and remove the //
`@ts-ignore` usage in src/providers/embedding/local.ts; update the file to
import/use the declared types instead of an inline env type, shorten the long
comment block to a concise one-line note about configuring
localModelPath/cacheDir, and (optionally) normalize XENOVA_CACHE_HOME by
expanding a leading "~" using os.homedir() before assigning to
transformers.env.cacheDir so paths are correct at runtime.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e9295cfe-5f6c-459d-abd0-82c2fbbf3122
📒 Files selected for processing (1)
src/providers/embedding/local.ts
| let transformers: { | ||
| pipeline: Pipeline; | ||
| env: { localModelPath: string; cacheDir: string }; | ||
| }; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Verify `@xenova/transformers` package version and check for env property documentation
# Check installed version
npm list `@xenova/transformers` 2>/dev/null || echo "Package not installed in current workspace"
# Search for any existing references to transformers.env in the codebase
rg -n 'transformers\.env' --type=ts -C2Repository: rohitg00/agentmemory
Length of output: 567
🌐 Web query:
@xenova/transformers version 2.17.2 env.localModelPath env.cacheDir configuration API documentation
💡 Result:
In @xenova/transformers version 2.17.2, the env object is the central configuration API used to control the library's behavior, including model loading and caching [1][2]. env.localModelPath This property specifies the base path from which local models are loaded [1][2]. It defaults to '/models/' [1][2]. You can configure it as follows: import { env } from '@xenova/transformers'; env.localModelPath = '/path/to/local/models/'; env.cacheDir This property defines the directory used for caching files when the file system (FSCache) is utilized [1][2]. By default, it is set to './.cache' [1][2]. You can configure it as follows: import { env } from '@xenova/transformers'; env.cacheDir = '/path/to/cache/directory/'; Additional configuration notes: - To enable local model loading, ensure env.allowLocalModels is set to true (which is typically the default) [1][2]. - For caching, the library uses both browser and file system caches depending on the environment [1][2]. You can control these using boolean flags such as env.useBrowserCache and env.useFSCache [1][2]. - The env object also exposes backend-specific configurations, such as the location of WASM files via env.backends.onnx.wasm.wasmPaths [3][4].
Citations:
- 1: https://huggingface.co/docs/transformers.js/v2.17.2/en/api/env
- 2: https://huggingface.co/docs/transformers.js/v2.17.2/api/env
- 3: https://huggingface.co/docs/transformers.js/v2.17.2/en/custom_usage
- 4: https://npmx.dev/package/@xenova/transformers/v/%5E2.17.2
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show the relevant sections of local.ts around the env type + assignments
sed -n '1,120p' src/providers/embedding/local.ts | nl -ba | sed -n '25,95p'
# Show xenova.d.ts declarations for pipeline (and whether env is declared)
ls -la src/xenova.d.ts
sed -n '1,200p' src/xenova.d.ts | nl -baRepository: rohitg00/agentmemory
Length of output: 107
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Print local.ts with line numbers (no `nl` in this sandbox)
awk '{printf "%4d: %s\n", NR, $0}' src/providers/embedding/local.ts | sed -n '1,120p'
awk '{printf "%4d: %s\n", NR, $0}' src/providers/embedding/local.ts | sed -n '120,220p'
echo "---- src/xenova.d.ts ----"
awk '{printf "%4d: %s\n", NR, $0}' src/xenova.d.ts | sed -n '1,200p'Repository: rohitg00/agentmemory
Length of output: 2760
Simplify the cache/config comments and align TypeScript typings for @xenova/transformers env
transformers.env.localModelPathandtransformers.env.cacheDirare part of the documented@xenova/transformersv2.17.2envAPI, so the runtime property usage is reasonable.src/xenova.d.tscurrently only declarespipeline, so extend it to typeenv(and remove the//@ts-ignore``), instead of relying on an inlineenvtype in `src/providers/embedding/local.ts`.- The long comment block in
src/providers/embedding/local.ts(lines 49-55) explains behavior (“what”), which conflicts with the “don’t explain what” guideline; shorten it or restructure to rely on clearer naming. - Optional: if
XENOVA_CACHE_HOMEmay be provided as~/.cache/...literally, expand~viaos.homedir()before assigning totransformers.env.*.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/providers/embedding/local.ts` around lines 36 - 39, Extend the Xenova
types in src/xenova.d.ts to include env (with localModelPath and cacheDir)
alongside pipeline and remove the // `@ts-ignore` usage in
src/providers/embedding/local.ts; update the file to import/use the declared
types instead of an inline env type, shorten the long comment block to a concise
one-line note about configuring localModelPath/cacheDir, and (optionally)
normalize XENOVA_CACHE_HOME by expanding a leading "~" using os.homedir() before
assigning to transformers.env.cacheDir so paths are correct at runtime.
| // Pre-downloaded models (offline / restricted-network setups) live in | ||
| // ~/.cache/Xenova/ by convention. @xenova/transformers defaults | ||
| // localModelPath to its own install dir — which is deep inside npm's | ||
| // global node_modules and rarely holds pre-downloaded files. When | ||
| // XENOVA_CACHE_HOME is set, redirect both the local-model lookup and | ||
| // the download cache so the library finds existing files without a | ||
| // network fetch. | ||
| const cacheHome = process.env["XENOVA_CACHE_HOME"]; | ||
| if (cacheHome) { | ||
| transformers.env.localModelPath = cacheHome; | ||
| transformers.env.cacheDir = cacheHome; | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Remove WHAT-comments per coding guidelines.
Lines 49-55 contain comments explaining what the code does, which violates the project guideline: "Do not use code comments explaining WHAT — use clear naming instead." The logic is simple enough that a well-named helper or clear variable names suffice.
♻️ Refactor to remove WHAT-comments
- // Pre-downloaded models (offline / restricted-network setups) live in
- // ~/.cache/Xenova/ by convention. `@xenova/transformers` defaults
- // localModelPath to its own install dir — which is deep inside npm's
- // global node_modules and rarely holds pre-downloaded files. When
- // XENOVA_CACHE_HOME is set, redirect both the local-model lookup and
- // the download cache so the library finds existing files without a
- // network fetch.
const cacheHome = process.env["XENOVA_CACHE_HOME"];
if (cacheHome) {
transformers.env.localModelPath = cacheHome;
transformers.env.cacheDir = cacheHome;
}If context is needed for future maintainers, consider a brief WHY comment explaining the user requirement (e.g., // Support offline setups via XENOVA_CACHE_HOME) rather than explaining the mechanics.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Pre-downloaded models (offline / restricted-network setups) live in | |
| // ~/.cache/Xenova/ by convention. @xenova/transformers defaults | |
| // localModelPath to its own install dir — which is deep inside npm's | |
| // global node_modules and rarely holds pre-downloaded files. When | |
| // XENOVA_CACHE_HOME is set, redirect both the local-model lookup and | |
| // the download cache so the library finds existing files without a | |
| // network fetch. | |
| const cacheHome = process.env["XENOVA_CACHE_HOME"]; | |
| if (cacheHome) { | |
| transformers.env.localModelPath = cacheHome; | |
| transformers.env.cacheDir = cacheHome; | |
| } | |
| const cacheHome = process.env["XENOVA_CACHE_HOME"]; | |
| if (cacheHome) { | |
| transformers.env.localModelPath = cacheHome; | |
| transformers.env.cacheDir = cacheHome; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/providers/embedding/local.ts` around lines 49 - 60, Remove the
explanatory "WHAT" comment block above the XENOVA cache logic and replace it
with either a concise WHY comment like "// Support offline setups via
XENOVA_CACHE_HOME" or no comment at all, leaving the code self-descriptive;
ensure the code around the cacheHome variable and assignments to
transformers.env.localModelPath and transformers.env.cacheDir (and the
XENOVA_CACHE_HOME env var name) remains unchanged so the intent is preserved
without the long HOW explanation.
Source: Coding guidelines
Problem
@xenova/transformersdefaultsenv.localModelPathto its own install directory — deep inside npm's globalnode_modules(e.g..../node_modules/@xenova/transformers/models/). When users pre-download embedding models to~/.cache/Xenova/— a common pattern in offline or restricted-network environments — the library never looks there.If HuggingFace CDN is unreachable (firewall, air-gapped network, etc.), every observation save logs:
[agentmemory] warn vector-index add: embed failed — skipping {"provider":"local","error":"fetch failed"}
This means every new observation lacks a vector embedding, and semantic search silently degrades to BM25-only.
Fix
In
LocalEmbeddingProvider.getExtractor(), readXENOVA_CACHE_HOMEfrom the environment after importing@xenova/transformers. When set, override bothenv.localModelPath(the primary file lookup path) andenv.cacheDir(the download cache destination) before callingpipeline().XENOVA_CACHE_HOMEnode_modules~/.cache/XenovaWhy
localModelPath+cacheDirbothlocalModelPathis the primary lookup (hub.js:392) — the library checks it firstcacheDiris where downloads are persisted — redirecting it keeps the npm directory cleanThe convention matches:
~/.cache/Xenova/already uses the{org}/{model}/layout that@xenova/transformersexpects, soXENOVA_CACHE_HOME=~/.cache/Xenovaworks directly with pre-downloaded models.Verification
XENOVA_CACHE_HOME=~/.cache/Xenova→ embed succeeds, 384-dim vector returnedenv.localModelPathunchanged, existing behavior preservedtest/embedding-provider.test.tsunaffected (no env var set → default path)Summary by CodeRabbit