feat(caching, vector database): implement local vector cache with incremental indexing & title blending #12

Merged
HahaBill merged 1 commit into master from feat/caching-indexing Jun 5, 2026

@Harsh16gupta Harsh16gupta commented Jun 2, 2026

Collaborator

This PR implements the local vector caching layer and incremental note indexing pipeline for the Joplin Note Categorization Plugin. It optimizes the embedding process by avoiding redundant model inference calls for notes that have not changed.


Changes

  • Vector Aggregation & Title Weighting: Combines chunk vectors and dynamically blends note titles based on similarity.
  • Local Caching (vectra): Uses a local vector database stored inside Joplin's data directory.
  • SHA-256 Content Hashing: Computes a SHA-256 hash of title + '\n\n' + body to detect note modifications.
  • Cache Sync & Deletion: Cleans up obsolete/deleted notes from the database on startup.
  • Webpack Externals: Configured externals in webpack to resolve native binary packaging errors from onnxruntime-node.
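
The content-hash check from the list above could look roughly like the sketch below. Only the SHA-256 algorithm and the `title + '\n\n' + body` input come from the description; the function names `computeContentHash` and `hasChanged` and the hex encoding are assumptions made for illustration, not the plugin's actual code.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: hashes title + '\n\n' + body with SHA-256.
// Returning a hex string is an assumption; the PR page does not say how the hash is stored.
function computeContentHash(title: string, body: string): string {
  return createHash("sha256")
    .update(`${title}\n\n${body}`, "utf8")
    .digest("hex");
}

// Hypothetical helper: a note needs re-embedding when no hash is stored yet
// or when the stored hash differs from the freshly computed one.
function hasChanged(storedHash: string | undefined, title: string, body: string): boolean {
  return storedHash !== computeContentHash(title, body);
}
```

Because the title is part of the hashed string, renaming a note counts as a modification in this sketch, just like editing its body.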

Verification & Test cases

Below are the logs and results for the four key verification scenarios:

  1. Scenario A: First Run (Cache Creation) - Sequentially embeds all notes from scratch and populates the cache.
     [image]
  2. Scenario B: Second Run (Cache Hits / Bypass) - Finishes sub-second by completely bypassing the embedding model.
     [image]
  3. Scenario C: Modified Note (Incremental Indexing) - Only embeds the single modified note; rest load from cache.
     [image]
  4. Scenario D: Deleted Note (Cleanup) - Detects and cleans up deleted notes from the index.
     [image]
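
The four scenarios above all follow from one comparison between the current notes and the hashes already in the cache. The sketch below shows that comparison as a pure function. The names `NoteSnapshot`, `SyncPlan`, and `planSync` are hypothetical, and a plain `Map` stands in for the vectra-backed cache, so this illustrates the idea rather than the code in this PR.

```typescript
// Hypothetical shapes; the plugin's real types are not shown on this page.
interface NoteSnapshot {
  id: string;
  contentHash: string; // e.g. SHA-256 of title + '\n\n' + body
}

interface SyncPlan {
  toEmbed: string[];   // new or modified notes: run the embedding model
  cacheHits: string[]; // unchanged notes: reuse the cached vector
  toDelete: string[];  // cached entries whose note no longer exists
}

// Compares live notes against cached hashes (note id -> stored hash).
function planSync(notes: NoteSnapshot[], cachedHashes: Map<string, string>): SyncPlan {
  const plan: SyncPlan = { toEmbed: [], cacheHits: [], toDelete: [] };
  const liveIds = new Set<string>();

  for (const note of notes) {
    liveIds.add(note.id);
    if (cachedHashes.get(note.id) === note.contentHash) {
      plan.cacheHits.push(note.id);
    } else {
      plan.toEmbed.push(note.id);
    }
  }

  // Anything cached but no longer present among the live notes is obsolete.
  cachedHashes.forEach((_hash, cachedId) => {
    if (!liveIds.has(cachedId)) plan.toDelete.push(cachedId);
  });

  return plan;
}
```

With an empty cache every note lands in `toEmbed` (Scenario A); with an up-to-date cache every note is a cache hit (Scenario B); a single edited note is the only entry in `toEmbed` (Scenario C); and a removed note shows up in `toDelete` (Scenario D).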

@Harsh16gupta Harsh16gupta changed the title from "Add note vector aggregation with title weighting and local caching" to "feat(caching, vector database): implement local vector cache with incremental indexing & title blending" Jun 3, 2026
@Harsh16gupta Harsh16gupta marked this pull request as ready for review June 3, 2026 08:17
@Harsh16gupta Harsh16gupta self-assigned this Jun 3, 2026
@Harsh16gupta Harsh16gupta force-pushed the feat/caching-indexing branch from 0c8af05 to a68c506 Compare June 3, 2026 08:26
@Harsh16gupta Harsh16gupta requested a review from HahaBill June 3, 2026 18:32

HahaBill commented Jun 4, 2026

Member

It seems like this branch is based on feat/vector-aggregation. Will wait for #8 to be resolved first; it should be a quick change.

HahaBill commented Jun 4, 2026

Member

Resolved merge and declaration conflicts 👍 Ready to review

Copilot AI left a comment

Contributor

Pull request overview

This PR adds a local vector caching layer (backed by Vectra) and updates the embedding/indexing pipeline to skip redundant inference work by hashing note content and reusing cached vectors. It also adjusts Webpack configuration to work around native-binary packaging issues stemming from transitive imports.

Changes:

  • Added a VectorCache wrapper around vectra’s LocalIndex, including SHA-256 content hashing and basic CRUD operations.
  • Updated the embedding test command to support incremental indexing (cache hits), deletions cleanup, and persisted cache upserts.
  • Updated Webpack bundling and project dependencies to integrate vectra and avoid bundling native onnxruntime-node artifacts.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| webpack.config.js | Adds Webpack externals to avoid traversing onnxruntime-node/@huggingface/transformers from vectra exports; updates archive build config. |
| src/pipeline/vectorCache.ts | Introduces a local vector cache abstraction using vectra + SHA-256 hashing. |
| src/commands/testEmbed.ts | Adds cache hit bypass, incremental indexing, and deletion cleanup to the embedding pipeline. |
| package.json | Adds vectra dependency. |
| package-lock.json | Locks vectra and its transitive dependencies (including engine/peer constraints). |
| .npmrc | Enables legacy-peer-deps to bypass peer dependency resolution failures. |
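
For readers unfamiliar with Webpack externals, the fragment below sketches the general shape of such a setting, written as a typed object. The actual entries in this PR's webpack.config.js are not shown on this page, so the single entry here is an assumption based on the package named in the summary; other packages would be listed the same way.

```typescript
// Illustrative fragment only; not copied from the PR's webpack.config.js.
const externalsFragment: { externals: Record<string, string> } = {
  externals: {
    // "commonjs <name>" tells Webpack to leave the import as a runtime
    // require("<name>") instead of following it into the bundle, so the
    // package's native binaries are never pulled into the build.
    "onnxruntime-node": "commonjs onnxruntime-node",
  },
};
```

Marking a package as external only keeps it out of the bundle. Whether it can be resolved at runtime depends on how the plugin is packaged, which this sketch does not address.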

Comment thread package.json
Comment thread .npmrc Outdated
@Harsh16gupta Harsh16gupta force-pushed the feat/caching-indexing branch from 5a37111 to 949448f Compare June 4, 2026 17:48

@HahaBill HahaBill left a comment

Member

Everything seems to look good for now. Thank you for this PR :)

@HahaBill HahaBill merged commit 068a70f into master Jun 5, 2026
2 checks passed

3 participants