Add Range Index data type based on Bf-Tree (single instance) #1613

Open: badrishc wants to merge 1 commit into dev from badrishc/bf-tree-integration-plan

badrishc (Collaborator) commented Mar 6, 2026

Description of Change

Adds Range Index as a new Garnet data type, backed by Bf-Tree — a high-performance B-tree for ordered key-value storage with range scan support. Gated behind --enable-range-index-preview.

Commands implemented (9):

  • RI.CREATE — Create index with DISK or MEMORY backend
  • RI.SET / RI.GET / RI.DEL — Point operations on entries
  • RI.SCAN / RI.RANGE — Range queries (DISK backend only)
  • RI.EXISTS / RI.CONFIG / RI.METRICS — Utility commands
  • TYPE returns "rangeindex" for RI keys
  • DEL / UNLINK frees the underlying BfTree

Lifecycle infrastructure (Tsavorite IRecordTriggers):

  • OnFlush — Snapshot BfTree to flush.bftree, set FlagFlushed for lazy promote
  • OnEvict — Free BfTree under exclusive lock
  • OnDiskRead — Zero stale TreeHandle
  • OnCheckpoint — VersionShift (barrier), FlushBegin (snapshot), CheckpointCompleted (cleanup)
  • OnRecovery / OnRecoverySnapshotRead — Set recovered token / FlagRecovered

Checkpoint consistency:

  • Trees with SnapshotPending=1 at barrier time are snapshotted; v+1 trees are skipped
  • Concurrent restores serialized via exclusive lock
  • Old checkpoint snapshots purged at CheckpointCompleted (gated on removeOutdated)
  • RIPROMOTE clears source handle in PostCopyUpdater (after CAS success, not before)
  • Restored trees use data.bftree as working file (not snapshot artifacts)
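
The SnapshotPending filtering above can be pictured with a small model. This is only a sketch in Python, not the Garnet C# implementation: the `Tree` and `Manager` classes are hypothetical, and the assumption that the VersionShift barrier marks every tree that is live at that moment is mine, not something the PR text states.

```python
from dataclasses import dataclass, field


@dataclass
class Tree:
    """Hypothetical stand-in for a live BfTree entry (not the real stub layout)."""
    key: str
    snapshot_pending: bool = False


@dataclass
class Manager:
    """Toy model of the checkpoint phases named in the PR description."""
    live: dict = field(default_factory=dict)
    snapshots: list = field(default_factory=list)

    def create(self, key):
        # Trees created after the barrier start with the flag cleared,
        # so they count as version v+1.
        self.live[key] = Tree(key)

    def on_version_shift(self):
        # Barrier: assumed here to mark every tree that is live right now.
        for tree in self.live.values():
            tree.snapshot_pending = True

    def on_flush_begin(self):
        # Snapshot only the trees marked at the barrier; skip v+1 trees.
        for tree in self.live.values():
            if tree.snapshot_pending:
                self.snapshots.append(tree.key)
                tree.snapshot_pending = False


mgr = Manager()
mgr.create("a")
mgr.create("b")
mgr.on_version_shift()
mgr.create("c")  # created after the barrier, so it is a v+1 tree
mgr.on_flush_begin()
print(mgr.snapshots)
```

In this model the tree created after the barrier is left out of the snapshot set, which is the property the checkpoint needs to stay consistent with version v.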

AOF logging:

  • RI.SET/RI.DEL logged via synthetic no-op RMW (VectorManager pattern)
  • RI.CREATE logged with stub bytes; replay creates fresh BfTree
  • AOF-only recovery (no checkpoint) supported

WRONGTYPE safety:

  • Read, RMW, and Upsert paths block cross-type access
  • GET/SET on RI keys → WRONGTYPE error
  • RI commands on non-RI keys → WRONGTYPE error
  • UpsertAction.WrongType added to Tsavorite
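
The cross-type rule above amounts to a symmetric check. The following Python sketch is illustrative only: `Kind`, `RI_COMMANDS`, `TYPE_AGNOSTIC`, and `check_type` are hypothetical names, and the real checks live in Garnet's C# read, RMW, and upsert paths.

```python
from enum import Enum


class Kind(Enum):
    STRING = "string"
    RANGE_INDEX = "rangeindex"


# Illustrative subsets only; the real command tables are in Garnet's C# code.
RI_COMMANDS = {"RI.SET", "RI.GET", "RI.DEL", "RI.SCAN", "RI.RANGE"}
TYPE_AGNOSTIC = {"DEL", "UNLINK", "TYPE"}  # allowed on any key type


def check_type(command, key_kind):
    """Return "WRONGTYPE" on a cross-type access, otherwise "OK".

    key_kind is None when the key does not exist, which is never a type error.
    """
    if key_kind is None or command in TYPE_AGNOSTIC:
        return "OK"
    is_ri_command = command in RI_COMMANDS
    is_ri_key = key_kind is Kind.RANGE_INDEX
    # The access is legal only when command family and key type agree.
    return "OK" if is_ri_command == is_ri_key else "WRONGTYPE"


print(check_type("GET", Kind.RANGE_INDEX))
print(check_type("RI.GET", Kind.STRING))
```

Both calls report WRONGTYPE, one for each direction of the mismatch, while DEL, UNLINK, and TYPE pass through because the PR describes them as valid on RI keys.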

55 tests covering CRUD, scan, range, lifecycle (flush/evict/promote/restore), checkpoint recovery, AOF replay, AOF-only recovery, WRONGTYPE, concurrent stress (4 threads + blocking SAVE with strict prefix verification), and utility commands.

Documentation:

  • Dedicated Range Index commands page (website/docs/commands/range-index.md)
  • API compatibility listing
  • Design doc updated with implementation status

Not in this PR (deferred)

  • Batch commands (RI.MSET/MGET/MDEL)
  • Cluster replication/migration
  • Transaction (MULTI/EXEC) support
  • Memory-only tree snapshot (blocked on upstream bf-tree)

kevin-montrose (Contributor) left a comment

Generally looks good to me - couple questions to poke at, but no blockers that I see.

Comment thread website/docs/dev/range-index-resp-api.md Outdated
Comment thread website/docs/dev/range-index-resp-api.md Outdated
Comment thread website/docs/dev/range-index-resp-api.md Outdated
badrishc changed the title from "[Doc] Range index integration plan (first draft)" to "Range index integration: design doc + prototype" on Mar 18, 2026
badrishc force-pushed the badrishc/bf-tree-integration-plan branch from 320af88 to 90fbe55 on March 30, 2026 21:47
badrishc force-pushed the badrishc/bf-tree-integration-plan branch 9 times, most recently from 8f93b4c to c5afb94, on April 15, 2026 02:11
badrishc changed the title from "Range index integration: design doc + prototype" to "Add Range Index data type based on Bf-Tree (single instance)" on Apr 16, 2026
badrishc marked this pull request as ready for review on April 16, 2026 01:23
Copilot AI review requested due to automatic review settings on April 16, 2026 01:23
Copilot AI (Contributor) left a comment

Pull request overview

Adds a new RangeIndex data type to Garnet, backed by the native bf-tree library, including command surface (RI.*), storage/session plumbing, lifecycle (flush/evict/checkpoint/recovery) hooks, and supporting metadata/docs/tests/benchmarks. This integrates RangeIndex into the main store with type-safety enforcement and adds a Rust cdylib + C# interop layer for BfTree.

Changes:

  • Introduces RangeIndex runtime support: RI.CREATE/SET/GET/DEL/SCAN/RANGE, RangeIndexManager, and session ops that execute against a BfTree native pointer stored in a Tsavorite value stub.
  • Extends Tsavorite trigger surface for checkpoint/recovery callbacks and wires Garnet record triggers to snapshot/evict/restore BfTrees.
  • Adds command metadata/docs updates plus new interop tests and benchmarks; updates CI/pipelines to install Rust and build/sign/package the native library.

Reviewed changes

Copilot reviewed 61 out of 66 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| test/Garnet.test/TestUtils.cs | Adds server option flag for enabling RangeIndex preview in tests. |
| test/Garnet.test/RespCommandTests.cs | Marks internal RI RMW commands as “no metadata required”. |
| test/Garnet.test/Resp/ACL/RespCommandTests.cs | Extends ACL coverage logic and adds RI.* ACL tests. |
| test/BfTreeInterop.test/BfTreeInteropTests.cs | New integration tests for C# ↔ native BfTree interop. |
| test/BfTreeInterop.test/BfTreeInterop.test.csproj | New test project for BfTree interop layer. |
| playground/CommandInfoUpdater/SupportedCommand.cs | Adds RI.* commands to command info updater list. |
| playground/CommandInfoUpdater/GarnetCommandsInfo.json | Adds command-info entries for RI.*. |
| playground/CommandInfoUpdater/GarnetCommandsDocs.json | Adds docs entries for RI.* and updates quoting/encoding. |
| libs/storage/Tsavorite/cs/src/core/Index/StoreFunctions/StoreFunctions.cs | Adds trigger passthrough for recovery/checkpoint callbacks. |
| libs/storage/Tsavorite/cs/src/core/Index/StoreFunctions/IStoreFunctions.cs | Extends store functions interface for new callbacks. |
| libs/storage/Tsavorite/cs/src/core/Index/StoreFunctions/IRecordTriggers.cs | Adds recovery and checkpoint trigger APIs to record triggers. |
| libs/storage/Tsavorite/cs/src/core/Index/StoreFunctions/CheckpointTrigger.cs | New enum for checkpoint trigger points. |
| libs/storage/Tsavorite/cs/src/core/Index/Recovery/Recovery.cs | Calls recovery triggers and distinguishes snapshot recovery path. |
| libs/storage/Tsavorite/cs/src/core/Index/Checkpointing/HybridLogCheckpointSMTask.cs | Invokes checkpoint triggers at version-shift/flush-begin. |
| libs/storage/Tsavorite/cs/src/core/Allocator/ObjectAllocatorImpl.cs | Clarifies flush trigger comment for external-resource snapshotting. |
| libs/server/StoreWrapper.cs | Plumbs shared RangeIndexManager through StoreWrapper lifecycle. |
| libs/server/Storage/Session/MainStore/RangeIndexOps.cs | Implements StorageSession RangeIndex operations and RESP writes. |
| libs/server/Storage/Functions/UnifiedStore/DeleteMethods.cs | Minor formatting change. |
| libs/server/Storage/Functions/MainStore/VarLenInputMethods.cs | Adds RI* sizing support for RMW value allocation. |
| libs/server/Storage/Functions/MainStore/ReadMethods.cs | Enforces type-safety for RI keys on read paths. |
| libs/server/Storage/Functions/MainStore/RMWMethods.cs | Adds RI* RMW handling for create/promote/restore stubs. |
| libs/server/Storage/Functions/MainStore/PrivateMethods.cs | Ensures RI.* commands copy output without RESP header. |
| libs/server/Storage/Functions/GarnetRecordTriggers.cs | Adds BfTree lifecycle handling via record triggers. |
| libs/server/Storage/Functions/FunctionsState.cs | Exposes RangeIndexManager via FunctionsState. |
| libs/server/Servers/GarnetServerOptions.cs | Adds EnableRangeIndexPreview server option. |
| libs/server/Resp/RespServerSession.cs | Routes RI.* commands to dedicated handlers. |
| libs/server/Resp/RangeIndex/RespServerSessionRangeIndex.cs | New RESP handlers for RI.* parsing/execution. |
| libs/server/Resp/RangeIndex/RangeIndexResult.cs | New result enum for RangeIndex operations. |
| libs/server/Resp/RangeIndex/RangeIndexManager.cs | New manager for live BfTrees + flush/checkpoint snapshot orchestration. |
| libs/server/Resp/RangeIndex/RangeIndexManager.Locking.cs | Adds RI shared/exclusive locking + lazy restore/promote paths. |
| libs/server/Resp/RangeIndex/RangeIndexManager.Index.cs | Defines stub layout and stub mutation helpers. |
| libs/server/Resp/Parser/RespCommand.cs | Adds RI commands, parsing fast-paths, and RangeIndex legality checks. |
| libs/server/Resp/CmdStrings.cs | Adds RI.* command string constants. |
| libs/server/GarnetDatabase.cs | Stores RangeIndexManager per DB; plumbs through DB construction. |
| libs/server/Garnet.server.csproj | Adds project reference to BfTreeInterop. |
| libs/server/API/IGarnetApi.cs | Adds RangeIndex API surface methods. |
| libs/server/API/GarnetApi.cs | Implements RangeIndex API calls via StorageSession. |
| libs/server/ACL/ACLParser.cs | Supports dot-commands (e.g., RI.CREATE) parsing for ACLs. |
| libs/resources/RespCommandsInfo.json | Adds RI.* command info entries to shipped resources. |
| libs/native/bftree-garnet/src/lib.rs | New Rust FFI wrapper over bf-tree with point ops, scans, snapshot/recovery. |
| libs/native/bftree-garnet/rust-toolchain.toml | Pins Rust toolchain channel to stable. |
| libs/native/bftree-garnet/examples/bench.rs | Adds standalone Rust benchmark example. |
| libs/native/bftree-garnet/NativeBfTreeMethods.cs | Adds P/Invoke declarations for native bftree_garnet. |
| libs/native/bftree-garnet/Cargo.toml | New Rust crate manifest for cdylib build. |
| libs/native/bftree-garnet/Cargo.lock | Locks Rust dependencies (bf-tree, transitive). |
| libs/native/bftree-garnet/BfTreeService.cs | Managed wrapper for BfTree native API and scan helpers. |
| libs/native/bftree-garnet/BfTreeInterop.csproj | Builds/copies native library and packages runtime assets. |
| libs/native/bftree-garnet/.gitignore | Ignores Rust build artifacts. |
| libs/host/defaults.conf | Adds default config for RangeIndex preview flag. |
| libs/host/GarnetServer.cs | Constructs RangeIndexManager and wires triggers into store creation. |
| libs/host/Configuration/Options.cs | Adds CLI option --enable-range-index-preview. |
| benchmark/BDN.benchmark/Operations/RangeIndexOperations.cs | Adds BDN benchmarks for RI.* operations. |
| benchmark/BDN.benchmark/BfTree/BfTreeOperations.cs | Adds BDN benchmarks for BfTree point + scan operations. |
| benchmark/BDN.benchmark/BDN.benchmark.csproj | References BfTreeInterop from benchmarks. |
| Garnet.slnx | Adds new project/folders to solution layout. |
| .github/workflows/ci.yml | Installs Rust toolchain for GitHub Actions CI. |
| .azure/pipelines/createbinaries.ps1 | Copies bftree native binary into publish output. |
| .azure/pipelines/azure-pipelines.yml | Installs Rust toolchain on Windows/Linux builds. |
| .azure/pipelines/azure-pipelines-internal-release.yml | Builds/downloads bftree native and signs additional artifacts. |
| .azure/pipelines/azure-pipelines-external-release.yml | Builds/downloads bftree native and includes in release signing. |

Comment thread libs/server/Storage/Session/MainStore/RangeIndexOps.cs
Comment thread libs/server/Storage/Session/MainStore/RangeIndexOps.cs Outdated
Comment thread libs/server/Storage/Session/MainStore/RangeIndexOps.cs Outdated
Comment thread libs/native/bftree-garnet/BfTreeService.cs Outdated
Comment thread benchmark/BDN.benchmark/Operations/RangeIndexOperations.cs Outdated
Comment thread libs/server/Resp/RangeIndex/RespServerSessionRangeIndex.cs
Comment thread playground/CommandInfoUpdater/GarnetCommandsDocs.json Outdated
Comment thread libs/native/bftree-garnet/BfTreeInterop.csproj Outdated
badrishc force-pushed the badrishc/bf-tree-integration-plan branch 3 times, most recently from c207468 to 9ac1c64, on April 16, 2026 02:39
badrishc requested a review from Copilot on April 16, 2026 02:42
badrishc force-pushed the badrishc/bf-tree-integration-plan branch from 9ac1c64 to 3e67835 on April 16, 2026 03:02
Copilot AI (Contributor) left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

badrishc force-pushed the badrishc/bf-tree-integration-plan branch 4 times, most recently from 89856a2 to d602ae0, on April 16, 2026 18:24
badrishc requested a review from Copilot on April 16, 2026 18:41
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 72 out of 77 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (8)

  • libs/server/Resp/RangeIndex/RespServerSessionRangeIndex.cs:1
    RI.GET returns early on WRONGTYPE without writing a WRONGTYPE error payload (it just flushes the buffer). This will produce an empty/incorrect response on the wire. Write CmdStrings.RESP_ERR_WRONG_TYPE (as other RI handlers do) before returning.
  • libs/server/Resp/RangeIndex/RespServerSessionRangeIndex.cs:1
    RI.DEL has the same WRONGTYPE handling issue as RI.GET: it returns without emitting an error response. Align with the other handlers by writing CmdStrings.RESP_ERR_WRONG_TYPE and returning, rather than calling SendAndReset() alone.
  • libs/server/Resp/RangeIndex/RespServerSessionRangeIndex.cs:1
    RI.CREATE parses numeric options via GetLong() and then casts to ulong/uint. Negative inputs will underflow to very large values and bypass the == 0 validation, potentially causing extreme allocations or undefined behavior in the native layer. Reject negative values explicitly (e.g., parse into long, validate > 0 and within uint/ulong bounds) before casting.
  • libs/server/Resp/RangeIndex/RespServerSessionRangeIndex.cs:1
    RI.CREATE parses numeric options via GetLong() and then casts to ulong/uint. Negative inputs will underflow to very large values and bypass the == 0 validation, potentially causing extreme allocations or undefined behavior in the native layer. Reject negative values explicitly (e.g., parse into long, validate > 0 and within uint/ulong bounds) before casting.
  • libs/server/Resp/RangeIndex/RangeIndexManager.cs:1
    XxHash128.Hash(...) returns a Hash128 struct (System.IO.Hashing), but Guid does not have a constructor that accepts Hash128. This is likely a compile error. Convert the hash to bytes (e.g., via hash.ToByteArray() or writing into a 16-byte span) and pass the resulting 16 bytes to Guid.
  • libs/server/Resp/RangeIndex/RangeIndexManager.Index.cs:1
    The comment says TreeHandle is a “managed object handle”, but throughout the implementation it’s treated as a native pointer (BfTreeService.NativePtr / bftree_* APIs). Please update the comment to reflect that it’s a native pointer to the underlying bf-tree instance.
  • libs/server/Resp/RangeIndex/RangeIndexManager.cs:1
    PurgeOldCheckpointSnapshots does a recursive EnumerateFiles(..., AllDirectories) over the entire rangeindex directory on every checkpoint completion. As the number of RI keys grows, this becomes an O(total files) operation per checkpoint. A more scalable approach is to track per-index directories (e.g., from liveIndexes + known key dirs) and only scan those, or store the last token per key dir to avoid global recursion.
  • libs/native/bftree-garnet/BfTreeService.cs:1
    Using [0] as the “minimum key” will skip an empty key ("") if bf-tree allows it (and the Garnet RI docs/examples indicate empty start keys are valid). If empty keys are allowed, ScanAll should start from an empty span/array (length 0). If empty keys are not supported, the docs/validation should explicitly state that.
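
The RI.CREATE concern in the suppressed comments above (a negative value slipping past an `== 0` check after an unsigned cast) can be reproduced in a few lines. This is a Python model rather than the C# handler: `unchecked_cast_to_u64` imitates an unchecked signed-to-unsigned 64-bit cast by masking, and `parse_positive_u32` is a hypothetical validator, not a Garnet API.

```python
U32_MAX = 2**32 - 1
U64_MASK = 2**64 - 1


def unchecked_cast_to_u64(value):
    """Model an unchecked signed-to-unsigned 64-bit cast (two's complement)."""
    return value & U64_MASK


def parse_positive_u32(text):
    """Parse a decimal option value, accepting only 1 through U32_MAX."""
    value = int(text)  # raises ValueError on non-numeric input
    if value <= 0 or value > U32_MAX:
        raise ValueError(f"value out of range: {text}")
    return value


wrapped = unchecked_cast_to_u64(-1)
print(wrapped)        # a huge value, so a later "== 0" check does not catch it
print(wrapped == 0)
print(parse_positive_u32("4096"))
```

Validating the signed value first and casting afterwards means the range check sees the number the client actually sent.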
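
The ScanAll remark in the suppressed comments above rests on byte ordering: the empty key sorts before a single zero byte, so a scan that starts at [0] never reaches it. A minimal Python sketch, assuming keys are compared lexicographically as raw bytes and that empty keys are permitted (neither is confirmed for bf-tree here); `scan_from` is a hypothetical helper over a sorted list, not the BfTreeService API.

```python
import bisect


def scan_from(sorted_keys, start):
    """Return every key >= start from a list sorted in byte order."""
    return sorted_keys[bisect.bisect_left(sorted_keys, start):]


keys = sorted([b"", b"\x00", b"a", b"b"])
print(scan_from(keys, b"\x00"))  # starts at the single zero byte
print(scan_from(keys, b""))      # starts at the empty key
```

Under those assumptions only the scan that starts from the zero-length key returns all four entries.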

Comment thread libs/server/ACL/ACLParser.cs
badrishc force-pushed the badrishc/bf-tree-integration-plan branch 8 times, most recently from d840e41 to dca2823, on April 16, 2026 23:32
…E support (preview)

Adds Range Index as a new Garnet data type backed by Bf-Tree, a high-performance
B-tree for ordered key-value storage with range scan support. Gated behind
--enable-range-index-preview. Not supported in cluster mode.

Commands: RI.CREATE, RI.SET, RI.GET, RI.DEL, RI.SCAN, RI.RANGE, RI.EXISTS,
RI.CONFIG, RI.METRICS. TYPE returns 'rangeindex'. DEL/UNLINK frees the tree.

Lifecycle (Tsavorite IRecordTriggers):
- OnFlush: snapshot BfTree to flush.bftree, set FlagFlushed for lazy promote
- OnEvict: free BfTree under exclusive lock
- OnDiskRead: zero stale TreeHandle
- OnCheckpoint: barrier at VersionShift, snapshot at FlushBegin (SnapshotPending
  filtering skips v+1 trees), cleanup at CheckpointCompleted
- OnRecovery/OnRecoverySnapshotRead: set recovered token and FlagRecovered
- Lazy promote (RIPROMOTE): CopyUpdater copies stub to tail, PostCopyUpdater
  clears source handle after CAS success
- Lazy restore: exclusive lock, re-read stub, restore from checkpoint or flush
  snapshot, copy to data.bftree working path

Checkpoint consistency:
- Only trees with SnapshotPending=1 are snapshotted; v+1 trees skipped
- Snapshot failures are fatal (propagate to state machine driver)
- MEMORY trees skipped (snapshot not supported by native library)
- Old checkpoint snapshots purged at CheckpointCompleted (gated on removeOutdated)
- RangeIndex paths namespaced by DB ID for multi-database isolation

AOF logging:
- RI.SET/RI.DEL: synthetic no-op RMW triggers AOF entry; replay via
  HandleRangeIndexSetReplay/DelReplay in AofProcessor
- RI.CREATE: stub bytes logged; replay creates fresh BfTree, replaces
  stale handle, then RMW proceeds
- AOF-only recovery (no checkpoint) supported

WRONGTYPE safety:
- ReadMethods/RMWMethods: bidirectional type checks (RI cmd on non-RI key
  and non-RI cmd on RI key)
- UpsertMethods: InPlaceWriter blocks SET on RI/Vector stubs
  (UpsertAction.WrongType + WRONG_TYPE OperationStatus added to Tsavorite)
- All 9 RI RESP handlers check and propagate WRONGTYPE

Tsavorite infrastructure:
- CheckpointTrigger.CheckpointCompleted enum value
- UpsertAction.WrongType + OperationStatus.WRONG_TYPE
- ClearBitsOnPage: OnDiskRead/OnRecoverySnapshotRead scoped to correct
  address ranges (no double-call on boundary page)

55 tests covering CRUD, scan, range, lifecycle, checkpoint recovery,
AOF replay, AOF-only recovery, WRONGTYPE, concurrent stress (4 threads +
blocking SAVE with strict prefix verification), and utility commands.

Documentation: dedicated Range Index commands page, API compatibility
listing, design doc updated with implementation status.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
badrishc force-pushed the badrishc/bf-tree-integration-plan branch from dca2823 to 8290dd2 on April 16, 2026 23:36
@@ -0,0 +1,2 @@
[toolchain]
channel = "stable"
Collaborator


nit: thoughts on pinning a hardcoded version, like the DiskANN repo does?
https://github.com/microsoft/DiskANN/blob/main/rust-toolchain.toml

That way builds and releases stay reproducible as long as the pinned Rust version doesn't change.

- bash: |
    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain stable
    source $HOME/.cargo/env
    cargo build --release --manifest-path libs/native/bftree-garnet/Cargo.toml
Collaborator


nit: should we add --locked so that the Cargo.lock file is used and builds are more reproducible?

4 participants