Allow XENOVA_CACHE_HOME to redirect local embedding model path#862

Open
canxer314 wants to merge 1 commit into
rohitg00:mainfrom
canxer314:feature/xenova-cache-home

@canxer314 canxer314 commented Jun 8, 2026

Problem

@xenova/transformers defaults env.localModelPath to its own install directory — deep inside npm's global node_modules (e.g. .../node_modules/@xenova/transformers/models/). When users pre-download embedding models to ~/.cache/Xenova/ — a common pattern in offline or restricted-network environments — the library never looks there.

If HuggingFace CDN is unreachable (firewall, air-gapped network, etc.), every observation save logs:

[agentmemory] warn vector-index add: embed failed — skipping {"provider":"local","error":"fetch failed"}

This means every new observation lacks a vector embedding, and semantic search silently degrades to BM25-only.

Fix

In LocalEmbeddingProvider.getExtractor(), read XENOVA_CACHE_HOME from the environment after importing @xenova/transformers. When set, override both env.localModelPath (the primary file lookup path) and env.cacheDir (the download cache destination) before calling pipeline().

| XENOVA_CACHE_HOME | Behavior |
| --- | --- |
| unset | Unchanged: uses default paths inside npm node_modules |
| ~/.cache/Xenova | Finds pre-downloaded models directly, no network fetch needed |

Why localModelPath + cacheDir both

  • localModelPath is the primary lookup (hub.js:392) — the library checks it first
  • cacheDir is where downloads are persisted — redirecting it keeps the npm directory clean

The convention matches: ~/.cache/Xenova/ already uses the {org}/{model}/ layout that @xenova/transformers expects, so XENOVA_CACHE_HOME=~/.cache/Xenova works directly with pre-downloaded models.
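
As a rough illustration of that layout, the sketch below computes where a model directory would be expected under the redirected root and checks whether it exists. The helper names and the model ID are hypothetical, and the sketch assumes the lookup root is simply joined with the model ID; it does not call into @xenova/transformers.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: where a model is expected to live under a cache root,
// assuming the {org}/{model}/ layout described in the PR.
export function expectedModelDir(cacheHome: string, modelId: string): string {
  // modelId looks like "example-org/example-model"; split on "/" so the
  // result uses the platform's path separator.
  return path.join(cacheHome, ...modelId.split("/"));
}

// True when the directory exists. This does not validate the files inside it.
export function isModelPreDownloaded(cacheHome: string, modelId: string): boolean {
  const dir = expectedModelDir(cacheHome, modelId);
  return fs.existsSync(dir) && fs.statSync(dir).isDirectory();
}
```

A check like this could run before the first embed call to warn early when XENOVA_CACHE_HOME points at a directory that does not contain the configured model.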

Verification

  • XENOVA_CACHE_HOME=~/.cache/Xenova → embed succeeds, 384-dim vector returned
  • env var unset → env.localModelPath unchanged, existing behavior preserved
  • test/embedding-provider.test.ts unaffected (no env var set → default path)
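
The set and unset cases above can also be exercised without loading a model. The following is a minimal sketch, assuming the override is factored into a small helper; `applyCacheHome` and `XenovaEnvLike` are hypothetical names that do not appear in the PR, and plain stub objects stand in for `transformers.env`.

```typescript
// Minimal structural type for the two env fields the PR touches.
interface XenovaEnvLike {
  localModelPath: string;
  cacheDir: string;
}

// Hypothetical helper mirroring the override described in the PR:
// an unset or empty value leaves the env object untouched.
export function applyCacheHome(
  env: XenovaEnvLike,
  cacheHome: string | undefined,
): boolean {
  if (!cacheHome) return false;
  env.localModelPath = cacheHome;
  env.cacheDir = cacheHome;
  return true;
}

// Stub env objects stand in for transformers.env, so nothing is downloaded.
const untouched: XenovaEnvLike = {
  localModelPath: "/default/models/",
  cacheDir: "/default/.cache/",
};
applyCacheHome(untouched, undefined); // fields keep their defaults

const redirected: XenovaEnvLike = {
  localModelPath: "/default/models/",
  cacheDir: "/default/.cache/",
};
applyCacheHome(redirected, "/home/user/.cache/Xenova"); // both fields now point at the cache root
```

Keeping the override in a pure function like this would let a unit test cover both branches without touching the network or the real library state.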

Summary by CodeRabbit

  • New Features
    • Added support for offline and restricted-network setups, enabling use of pre-downloaded models without network access via environment variable configuration.

@xenova/transformers defaults localModelPath to its own install dir
inside npm global node_modules. When users pre-download models to
~/.cache/Xenova/ (offline or restricted-network environments),
the library cannot find them and tries to fetch from huggingface.co.
If that network is unreachable, every observation save logs:

  vector-index add: embed failed - skipping (provider: local)

Read XENOVA_CACHE_HOME from the environment and, when set, override
both env.localModelPath and env.cacheDir before the pipeline() call.
Unset = zero behavior change.

vercel Bot commented Jun 8, 2026

@canxer314 is attempting to deploy a commit to the rohitg00's projects Team on Vercel.

A member of the Team first needs to authorize it.

coderabbitai Bot commented Jun 8, 2026

Contributor

📝 Walkthrough

LocalEmbeddingProvider now supports offline model loading via the XENOVA_CACHE_HOME environment variable. When set, this variable redirects both the Xenova local model lookup path and transformer cache directory, enabling pre-downloaded model usage without network access in restricted environments.

Changes

Offline model cache configuration

| Layer / File(s) | Summary |
| --- | --- |
| Xenova cache home environment variable support (src/providers/embedding/local.ts) | The transformers module type is expanded to expose env object with localModelPath and cacheDir fields. getExtractor() now reads XENOVA_CACHE_HOME and applies it to both paths when the environment variable is set. |
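
For orientation, an ambient declaration that exposes `env` next to `pipeline` might look roughly like the sketch below. The existing contents of src/xenova.d.ts are not shown in this PR, so the `pipeline` signature here is a placeholder assumption, and only the two `env` fields named in the summary are declared.

```typescript
// Sketch of an ambient module declaration (for example in src/xenova.d.ts).
// The pipeline signature is a placeholder assumption, not the project's
// actual declaration.
declare module "@xenova/transformers" {
  export function pipeline(
    task: string,
    model?: string,
    options?: Record<string, unknown>,
  ): Promise<unknown>;

  // Only the two fields this PR assigns are declared here.
  export const env: {
    localModelPath: string;
    cacheDir: string;
  };
}
```

Declaring `env` once in the shared declaration file would let local.ts drop its inline type for the imported module.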

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 A local model hops into place,
No network needed for this cache—
Xenova whispers through the env,
Offline magic, oh so brave!
thump thump

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Allow XENOVA_CACHE_HOME to redirect local embedding model path' directly and accurately describes the main change: enabling the XENOVA_CACHE_HOME environment variable to redirect the embedding model lookup path. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/providers/embedding/local.ts`:
- Around line 49-60: Remove the explanatory "WHAT" comment block above the
XENOVA cache logic and replace it with either a concise WHY comment like "//
Support offline setups via XENOVA_CACHE_HOME" or no comment at all, leaving the
code self-descriptive; ensure the code around the cacheHome variable and
assignments to transformers.env.localModelPath and transformers.env.cacheDir
(and the XENOVA_CACHE_HOME env var name) remains unchanged so the intent is
preserved without the long HOW explanation.
- Around line 36-39: Extend the Xenova types in src/xenova.d.ts to include env
(with localModelPath and cacheDir) alongside pipeline and remove the //
`@ts-ignore` usage in src/providers/embedding/local.ts; update the file to
import/use the declared types instead of an inline env type, shorten the long
comment block to a concise one-line note about configuring
localModelPath/cacheDir, and (optionally) normalize XENOVA_CACHE_HOME by
expanding a leading "~" using os.homedir() before assigning to
transformers.env.cacheDir so paths are correct at runtime.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e9295cfe-5f6c-459d-abd0-82c2fbbf3122

📥 Commits

Reviewing files that changed from the base of the PR and between 749c280 and 0b93ba7.

📒 Files selected for processing (1)
  • src/providers/embedding/local.ts

Comment on lines +36 to +39
    let transformers: {
      pipeline: Pipeline;
      env: { localModelPath: string; cacheDir: string };
    };


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Verify @xenova/transformers package version and check for env property documentation

# Check installed version
npm list @xenova/transformers 2>/dev/null || echo "Package not installed in current workspace"

# Search for any existing references to transformers.env in the codebase
rg -n 'transformers\.env' --type=ts -C2

Repository: rohitg00/agentmemory

Length of output: 567


🌐 Web query:

@xenova/transformers version 2.17.2 env.localModelPath env.cacheDir configuration API documentation

💡 Result:

In @xenova/transformers version 2.17.2, the env object is the central configuration API used to control the library's behavior, including model loading and caching [1][2].

env.localModelPath
This property specifies the base path from which local models are loaded [1][2]. It defaults to '/models/' [1][2]. You can configure it as follows:

    import { env } from '@xenova/transformers';
    env.localModelPath = '/path/to/local/models/';

env.cacheDir
This property defines the directory used for caching files when the file system (FSCache) is utilized [1][2]. By default, it is set to './.cache' [1][2]. You can configure it as follows:

    import { env } from '@xenova/transformers';
    env.cacheDir = '/path/to/cache/directory/';

Additional configuration notes:

  • To enable local model loading, ensure env.allowLocalModels is set to true (which is typically the default) [1][2].
  • For caching, the library uses both browser and file system caches depending on the environment [1][2]. You can control these using boolean flags such as env.useBrowserCache and env.useFSCache [1][2].
  • The env object also exposes backend-specific configurations, such as the location of WASM files via env.backends.onnx.wasm.wasmPaths [3][4].

Citations:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show the relevant sections of local.ts around the env type + assignments
sed -n '1,120p' src/providers/embedding/local.ts | nl -ba | sed -n '25,95p'

# Show xenova.d.ts declarations for pipeline (and whether env is declared)
ls -la src/xenova.d.ts
sed -n '1,200p' src/xenova.d.ts | nl -ba

Repository: rohitg00/agentmemory

Length of output: 107


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Print local.ts with line numbers (no `nl` in this sandbox)
awk '{printf "%4d: %s\n", NR, $0}' src/providers/embedding/local.ts | sed -n '1,120p'
awk '{printf "%4d: %s\n", NR, $0}' src/providers/embedding/local.ts | sed -n '120,220p'

echo "---- src/xenova.d.ts ----"
awk '{printf "%4d: %s\n", NR, $0}' src/xenova.d.ts | sed -n '1,200p'

Repository: rohitg00/agentmemory

Length of output: 2760


Simplify the cache/config comments and align TypeScript typings for @xenova/transformers env

  • transformers.env.localModelPath and transformers.env.cacheDir are part of the documented @xenova/transformers v2.17.2 env API, so the runtime property usage is reasonable.
  • src/xenova.d.ts currently only declares pipeline, so extend it to type env (and remove the // @ts-ignore), instead of relying on an inline env type in src/providers/embedding/local.ts.
  • The long comment block in src/providers/embedding/local.ts (lines 49-55) explains behavior (“what”), which conflicts with the “don’t explain what” guideline; shorten it or restructure to rely on clearer naming.
  • Optional: if XENOVA_CACHE_HOME may be provided as ~/.cache/... literally, expand ~ via os.homedir() before assigning to transformers.env.*.
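
The optional tilde expansion mentioned in the last point could be sketched as follows. `expandTilde` is a hypothetical helper name; the only library calls are `os.homedir()` and `path.join()` from Node's standard library, and the `home` parameter exists only so the function can be tested deterministically.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: expand a leading "~" for the current user.
// "~otheruser/..." forms are deliberately left untouched.
export function expandTilde(p: string, home: string = os.homedir()): string {
  if (p === "~") return home;
  if (p.startsWith("~/") || p.startsWith("~\\")) {
    return path.join(home, p.slice(2));
  }
  return p;
}
```

Expansion matters mainly when the variable comes from a source that does not perform shell tilde expansion, such as a quoted value or an env file; in that case the literal `~` would otherwise be treated as a directory name.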

Comment on lines +49 to +60
// Pre-downloaded models (offline / restricted-network setups) live in
// ~/.cache/Xenova/ by convention. @xenova/transformers defaults
// localModelPath to its own install dir — which is deep inside npm's
// global node_modules and rarely holds pre-downloaded files. When
// XENOVA_CACHE_HOME is set, redirect both the local-model lookup and
// the download cache so the library finds existing files without a
// network fetch.
const cacheHome = process.env["XENOVA_CACHE_HOME"];
if (cacheHome) {
  transformers.env.localModelPath = cacheHome;
  transformers.env.cacheDir = cacheHome;
}


🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Remove WHAT-comments per coding guidelines.

Lines 49-55 contain comments explaining what the code does, which violates the project guideline: "Do not use code comments explaining WHAT — use clear naming instead." The logic is simple enough that a well-named helper or clear variable names suffice.

♻️ Refactor to remove WHAT-comments
-    // Pre-downloaded models (offline / restricted-network setups) live in
-    // ~/.cache/Xenova/ by convention. `@xenova/transformers` defaults
-    // localModelPath to its own install dir — which is deep inside npm's
-    // global node_modules and rarely holds pre-downloaded files. When
-    // XENOVA_CACHE_HOME is set, redirect both the local-model lookup and
-    // the download cache so the library finds existing files without a
-    // network fetch.
     const cacheHome = process.env["XENOVA_CACHE_HOME"];
     if (cacheHome) {
       transformers.env.localModelPath = cacheHome;
       transformers.env.cacheDir = cacheHome;
     }

If context is needed for future maintainers, consider a brief WHY comment explaining the user requirement (e.g., // Support offline setups via XENOVA_CACHE_HOME) rather than explaining the mechanics.

📝 Committable suggestion

Suggested change
-    // Pre-downloaded models (offline / restricted-network setups) live in
-    // ~/.cache/Xenova/ by convention. @xenova/transformers defaults
-    // localModelPath to its own install dir — which is deep inside npm's
-    // global node_modules and rarely holds pre-downloaded files. When
-    // XENOVA_CACHE_HOME is set, redirect both the local-model lookup and
-    // the download cache so the library finds existing files without a
-    // network fetch.
-    const cacheHome = process.env["XENOVA_CACHE_HOME"];
-    if (cacheHome) {
-      transformers.env.localModelPath = cacheHome;
-      transformers.env.cacheDir = cacheHome;
-    }
+    const cacheHome = process.env["XENOVA_CACHE_HOME"];
+    if (cacheHome) {
+      transformers.env.localModelPath = cacheHome;
+      transformers.env.cacheDir = cacheHome;
+    }

Source: Coding guidelines
