Add Range Index data type based on Bf-Tree (single instance)#1613
Add Range Index data type based on Bf-Tree (single instance)#1613
Conversation
kevin-montrose
left a comment
There was a problem hiding this comment.
Generally looks good to me - couple questions to poke at, but no blockers that I see.
320af88 to
90fbe55
Compare
8f93b4c to
c5afb94
Compare
There was a problem hiding this comment.
Pull request overview
Adds a new RangeIndex data type to Garnet, backed by the native bf-tree library, including command surface (RI.*), storage/session plumbing, lifecycle (flush/evict/checkpoint/recovery) hooks, and supporting metadata/docs/tests/benchmarks. This integrates RangeIndex into the main store with type-safety enforcement and adds a Rust cdylib + C# interop layer for BfTree.
Changes:
- Introduces RangeIndex runtime support:
RI.CREATE/SET/GET/DEL/SCAN/RANGE, RangeIndexManager, and session ops that execute against a BfTree native pointer stored in a Tsavorite value stub. - Extends Tsavorite trigger surface for checkpoint/recovery callbacks and wires Garnet record triggers to snapshot/evict/restore BfTrees.
- Adds command metadata/docs updates plus new interop tests and benchmarks; updates CI/pipelines to install Rust and build/sign/package the native library.
Reviewed changes
Copilot reviewed 61 out of 66 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| test/Garnet.test/TestUtils.cs | Adds server option flag for enabling RangeIndex preview in tests. |
| test/Garnet.test/RespCommandTests.cs | Marks internal RI RMW commands as “no metadata required”. |
| test/Garnet.test/Resp/ACL/RespCommandTests.cs | Extends ACL coverage logic and adds RI.* ACL tests. |
| test/BfTreeInterop.test/BfTreeInteropTests.cs | New integration tests for C# ↔ native BfTree interop. |
| test/BfTreeInterop.test/BfTreeInterop.test.csproj | New test project for BfTree interop layer. |
| playground/CommandInfoUpdater/SupportedCommand.cs | Adds RI.* commands to command info updater list. |
| playground/CommandInfoUpdater/GarnetCommandsInfo.json | Adds command-info entries for RI.*. |
| playground/CommandInfoUpdater/GarnetCommandsDocs.json | Adds docs entries for RI.* and updates quoting/encoding. |
| libs/storage/Tsavorite/cs/src/core/Index/StoreFunctions/StoreFunctions.cs | Adds trigger passthrough for recovery/checkpoint callbacks. |
| libs/storage/Tsavorite/cs/src/core/Index/StoreFunctions/IStoreFunctions.cs | Extends store functions interface for new callbacks. |
| libs/storage/Tsavorite/cs/src/core/Index/StoreFunctions/IRecordTriggers.cs | Adds recovery and checkpoint trigger APIs to record triggers. |
| libs/storage/Tsavorite/cs/src/core/Index/StoreFunctions/CheckpointTrigger.cs | New enum for checkpoint trigger points. |
| libs/storage/Tsavorite/cs/src/core/Index/Recovery/Recovery.cs | Calls recovery triggers and distinguishes snapshot recovery path. |
| libs/storage/Tsavorite/cs/src/core/Index/Checkpointing/HybridLogCheckpointSMTask.cs | Invokes checkpoint triggers at version-shift/flush-begin. |
| libs/storage/Tsavorite/cs/src/core/Allocator/ObjectAllocatorImpl.cs | Clarifies flush trigger comment for external-resource snapshotting. |
| libs/server/StoreWrapper.cs | Plumbs shared RangeIndexManager through StoreWrapper lifecycle. |
| libs/server/Storage/Session/MainStore/RangeIndexOps.cs | Implements StorageSession RangeIndex operations and RESP writes. |
| libs/server/Storage/Functions/UnifiedStore/DeleteMethods.cs | Minor formatting change. |
| libs/server/Storage/Functions/MainStore/VarLenInputMethods.cs | Adds RI* sizing support for RMW value allocation. |
| libs/server/Storage/Functions/MainStore/ReadMethods.cs | Enforces type-safety for RI keys on read paths. |
| libs/server/Storage/Functions/MainStore/RMWMethods.cs | Adds RI* RMW handling for create/promote/restore stubs. |
| libs/server/Storage/Functions/MainStore/PrivateMethods.cs | Ensures RI.* commands copy output without RESP header. |
| libs/server/Storage/Functions/GarnetRecordTriggers.cs | Adds BfTree lifecycle handling via record triggers. |
| libs/server/Storage/Functions/FunctionsState.cs | Exposes RangeIndexManager via FunctionsState. |
| libs/server/Servers/GarnetServerOptions.cs | Adds EnableRangeIndexPreview server option. |
| libs/server/Resp/RespServerSession.cs | Routes RI.* commands to dedicated handlers. |
| libs/server/Resp/RangeIndex/RespServerSessionRangeIndex.cs | New RESP handlers for RI.* parsing/execution. |
| libs/server/Resp/RangeIndex/RangeIndexResult.cs | New result enum for RangeIndex operations. |
| libs/server/Resp/RangeIndex/RangeIndexManager.cs | New manager for live BfTrees + flush/checkpoint snapshot orchestration. |
| libs/server/Resp/RangeIndex/RangeIndexManager.Locking.cs | Adds RI shared/exclusive locking + lazy restore/promote paths. |
| libs/server/Resp/RangeIndex/RangeIndexManager.Index.cs | Defines stub layout and stub mutation helpers. |
| libs/server/Resp/Parser/RespCommand.cs | Adds RI commands, parsing fast-paths, and RangeIndex legality checks. |
| libs/server/Resp/CmdStrings.cs | Adds RI.* command string constants. |
| libs/server/GarnetDatabase.cs | Stores RangeIndexManager per DB; plumbs through DB construction. |
| libs/server/Garnet.server.csproj | Adds project reference to BfTreeInterop. |
| libs/server/API/IGarnetApi.cs | Adds RangeIndex API surface methods. |
| libs/server/API/GarnetApi.cs | Implements RangeIndex API calls via StorageSession. |
| libs/server/ACL/ACLParser.cs | Supports dot-commands (e.g., RI.CREATE) parsing for ACLs. |
| libs/resources/RespCommandsInfo.json | Adds RI.* command info entries to shipped resources. |
| libs/native/bftree-garnet/src/lib.rs | New Rust FFI wrapper over bf-tree with point ops, scans, snapshot/recovery. |
| libs/native/bftree-garnet/rust-toolchain.toml | Pins Rust toolchain channel to stable. |
| libs/native/bftree-garnet/examples/bench.rs | Adds standalone Rust benchmark example. |
| libs/native/bftree-garnet/NativeBfTreeMethods.cs | Adds P/Invoke declarations for native bftree_garnet. |
| libs/native/bftree-garnet/Cargo.toml | New Rust crate manifest for cdylib build. |
| libs/native/bftree-garnet/Cargo.lock | Locks Rust dependencies (bf-tree, transitive). |
| libs/native/bftree-garnet/BfTreeService.cs | Managed wrapper for BfTree native API and scan helpers. |
| libs/native/bftree-garnet/BfTreeInterop.csproj | Builds/copies native library and packages runtime assets. |
| libs/native/bftree-garnet/.gitignore | Ignores Rust build artifacts. |
| libs/host/defaults.conf | Adds default config for RangeIndex preview flag. |
| libs/host/GarnetServer.cs | Constructs RangeIndexManager and wires triggers into store creation. |
| libs/host/Configuration/Options.cs | Adds CLI option --enable-range-index-preview. |
| benchmark/BDN.benchmark/Operations/RangeIndexOperations.cs | Adds BDN benchmarks for RI.* operations. |
| benchmark/BDN.benchmark/BfTree/BfTreeOperations.cs | Adds BDN benchmarks for BfTree point + scan operations. |
| benchmark/BDN.benchmark/BDN.benchmark.csproj | References BfTreeInterop from benchmarks. |
| Garnet.slnx | Adds new project/folders to solution layout. |
| .github/workflows/ci.yml | Installs Rust toolchain for GitHub Actions CI. |
| .azure/pipelines/createbinaries.ps1 | Copies bftree native binary into publish output. |
| .azure/pipelines/azure-pipelines.yml | Installs Rust toolchain on Windows/Linux builds. |
| .azure/pipelines/azure-pipelines-internal-release.yml | Builds/downloads bftree native and signs additional artifacts. |
| .azure/pipelines/azure-pipelines-external-release.yml | Builds/downloads bftree native and includes in release signing. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
c207468 to
9ac1c64
Compare
9ac1c64 to
3e67835
Compare
89856a2 to
d602ae0
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 72 out of 77 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (8)
libs/server/Resp/RangeIndex/RespServerSessionRangeIndex.cs:1
RI.GETreturns early on WRONGTYPE without writing a WRONGTYPE error payload (it just flushes the buffer). This will produce an empty/incorrect response on the wire. WriteCmdStrings.RESP_ERR_WRONG_TYPE(as other RI handlers do) before returning.
libs/server/Resp/RangeIndex/RespServerSessionRangeIndex.cs:1RI.DELhas the same WRONGTYPE handling issue asRI.GET: it returns without emitting an error response. Align with the other handlers by writingCmdStrings.RESP_ERR_WRONG_TYPEand returning, rather than callingSendAndReset()alone.
libs/server/Resp/RangeIndex/RespServerSessionRangeIndex.cs:1RI.CREATEparses numeric options viaGetLong()and then casts toulong/uint. Negative inputs will underflow to very large values and bypass the== 0validation, potentially causing extreme allocations or undefined behavior in the native layer. Reject negative values explicitly (e.g., parse intolong, validate> 0and withinuint/ulongbounds) before casting.
libs/server/Resp/RangeIndex/RespServerSessionRangeIndex.cs:1RI.CREATEparses numeric options viaGetLong()and then casts toulong/uint. Negative inputs will underflow to very large values and bypass the== 0validation, potentially causing extreme allocations or undefined behavior in the native layer. Reject negative values explicitly (e.g., parse intolong, validate> 0and withinuint/ulongbounds) before casting.
libs/server/Resp/RangeIndex/RangeIndexManager.cs:1XxHash128.Hash(...)returns aHash128struct (System.IO.Hashing), butGuiddoes not have a constructor that acceptsHash128. This is likely a compile error. Convert the hash to bytes (e.g., viahash.ToByteArray()or writing into a 16-byte span) and pass the resulting 16 bytes toGuid.
libs/server/Resp/RangeIndex/RangeIndexManager.Index.cs:1- The comment says
TreeHandleis a “managed object handle”, but throughout the implementation it’s treated as a native pointer (BfTreeService.NativePtr/bftree_*APIs). Please update the comment to reflect that it’s a native pointer to the underlying bf-tree instance.
libs/server/Resp/RangeIndex/RangeIndexManager.cs:1 PurgeOldCheckpointSnapshotsdoes a recursiveEnumerateFiles(..., AllDirectories)over the entirerangeindexdirectory on every checkpoint completion. As the number of RI keys grows, this becomes an O(total files) operation per checkpoint. A more scalable approach is to track per-index directories (e.g., fromliveIndexes+ known key dirs) and only scan those, or store the last token per key dir to avoid global recursion.
libs/native/bftree-garnet/BfTreeService.cs:1- Using
[0]as the “minimum key” will skip an empty key ("") if bf-tree allows it (and the Garnet RI docs/examples indicate empty start keys are valid). If empty keys are allowed,ScanAllshould start from an empty span/array (length 0). If empty keys are not supported, the docs/validation should explicitly state that.
// Copyright (c) Microsoft Corporation.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
d840e41 to
dca2823
Compare
…E support (preview) Adds Range Index as a new Garnet data type backed by Bf-Tree, a high-performance B-tree for ordered key-value storage with range scan support. Gated behind --enable-range-index-preview. Not supported in cluster mode. Commands: RI.CREATE, RI.SET, RI.GET, RI.DEL, RI.SCAN, RI.RANGE, RI.EXISTS, RI.CONFIG, RI.METRICS. TYPE returns 'rangeindex'. DEL/UNLINK frees the tree. Lifecycle (Tsavorite IRecordTriggers): - OnFlush: snapshot BfTree to flush.bftree, set FlagFlushed for lazy promote - OnEvict: free BfTree under exclusive lock - OnDiskRead: zero stale TreeHandle - OnCheckpoint: barrier at VersionShift, snapshot at FlushBegin (SnapshotPending filtering skips v+1 trees), cleanup at CheckpointCompleted - OnRecovery/OnRecoverySnapshotRead: set recovered token and FlagRecovered - Lazy promote (RIPROMOTE): CopyUpdater copies stub to tail, PostCopyUpdater clears source handle after CAS success - Lazy restore: exclusive lock, re-read stub, restore from checkpoint or flush snapshot, copy to data.bftree working path Checkpoint consistency: - Only trees with SnapshotPending=1 are snapshotted; v+1 trees skipped - Snapshot failures are fatal (propagate to state machine driver) - MEMORY trees skipped (snapshot not supported by native library) - Old checkpoint snapshots purged at CheckpointCompleted (gated on removeOutdated) - RangeIndex paths namespaced by DB ID for multi-database isolation AOF logging: - RI.SET/RI.DEL: synthetic no-op RMW triggers AOF entry; replay via HandleRangeIndexSetReplay/DelReplay in AofProcessor - RI.CREATE: stub bytes logged; replay creates fresh BfTree, replaces stale handle, then RMW proceeds - AOF-only recovery (no checkpoint) supported WRONGTYPE safety: - ReadMethods/RMWMethods: bidirectional type checks (RI cmd on non-RI key and non-RI cmd on RI key) - UpsertMethods: InPlaceWriter blocks SET on RI/Vector stubs (UpsertAction.WrongType + WRONG_TYPE OperationStatus added to Tsavorite) - All 9 RI RESP handlers check and propagate WRONGTYPE Tsavorite infrastructure: - CheckpointTrigger.CheckpointCompleted enum value - UpsertAction.WrongType + OperationStatus.WRONG_TYPE - ClearBitsOnPage: OnDiskRead/OnRecoverySnapshotRead scoped to correct address ranges (no double-call on boundary page) 55 tests covering CRUD, scan, range, lifecycle, checkpoint recovery, AOF replay, AOF-only recovery, WRONGTYPE, concurrent stress (4 threads + blocking SAVE with strict prefix verification), and utility commands. Documentation: dedicated Range Index commands page, API compatibility listing, design doc updated with implementation status. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
dca2823 to
8290dd2
Compare
| @@ -0,0 +1,2 @@ | |||
| [toolchain] | |||
| channel = "stable" | |||
There was a problem hiding this comment.
nit: thoughts on using a hardcoded version like diskann repo does?
https://github.com/microsoft/DiskANN/blob/main/rust-toolchain.toml
That way we can have reproducibility between builds/releases that don't change rust version
| - bash: | | ||
| curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain stable | ||
| source $HOME/.cargo/env | ||
| cargo build --release --manifest-path libs/native/bftree-garnet/Cargo.toml |
There was a problem hiding this comment.
nit: should we add --locked so Cargo.lock file is used and then we have more reproducible builds?
Description of Change
Adds Range Index as a new Garnet data type, backed by Bf-Tree — a high-performance B-tree for ordered key-value storage with range scan support. Gated behind
--enable-range-index-preview.Commands implemented (9):
RI.CREATE— Create index with DISK or MEMORY backendRI.SET/RI.GET/RI.DEL— Point operations on entriesRI.SCAN/RI.RANGE— Range queries (DISK backend only)RI.EXISTS/RI.CONFIG/RI.METRICS— Utility commandsTYPEreturns"rangeindex"for RI keysDEL/UNLINKfrees the underlying BfTreeLifecycle infrastructure (Tsavorite IRecordTriggers):
OnFlush— Snapshot BfTree to flush.bftree, set FlagFlushed for lazy promoteOnEvict— Free BfTree under exclusive lockOnDiskRead— Zero stale TreeHandleOnCheckpoint— VersionShift (barrier), FlushBegin (snapshot), CheckpointCompleted (cleanup)OnRecovery/OnRecoverySnapshotRead— Set recovered token / FlagRecoveredCheckpoint consistency:
SnapshotPending=1at barrier time are snapshotted; v+1 trees are skippedAOF logging:
WRONGTYPE safety:
55 tests covering CRUD, scan, range, lifecycle (flush/evict/promote/restore), checkpoint recovery, AOF replay, AOF-only recovery, WRONGTYPE,
concurrent stress (4 threads + blocking SAVE with strict prefix verification), and utility commands.
Documentation:
website/docs/commands/range-index.md)Not in this PR (deferred)