coldkeep

Correctness-first cold storage engine

• Content-addressed • Built-in deduplication • Deterministic restore

• Verifiable integrity • Crash-safe • GC-safe

Branding

Coldkeep uses a visual identity based on an ice cube vault:

🧊 cold storage (ice cube)
🔒 secure data (vault door)
🗄️ structured containers (internal shelves)

Project Status

Status: v1.9 formalizes transform-based storage semantics (logical/compressed/physical layers) with block-level compression and explicit staged verification, while preserving deterministic restore, GC safety, snapshot semantics, and mixed-repository compatibility.

Migration note (v1.9): existing v1.7/v1.8 payloads remain readable through compatibility paths with no forced rewrite or recompression. If the PostgreSQL schema is missing, apply it manually or set COLDKEEP_DB_AUTO_BOOTSTRAP=true. Older existing schemas are auto-upgraded to the required v15 schema at startup.

Current Release Focus

Coldkeep is entering the v1.10 reliability freeze.

The v1.10.x train focuses on correctness burn-down, CI hardening, Codacy/audit triage, restore/recovery safety, GC correctness, packed-storage consistency, validation, and release-gate discipline before engine-boundary work begins in v1.11.

v1.10 is not a feature-expansion train. See docs/release/v1.10/README.md.

coldkeep is a local-first content-addressed storage engine focused on deterministic restore, explicit integrity verification, and safe lifecycle behavior under failure scenarios.

Now with snapshot lineage, diff summaries, and safe deletion insights.

Why coldkeep?

coldkeep is designed for correctness-first cold storage.

Unlike traditional backup tools, it emphasizes:

deterministic, byte-identical restore
content-addressed deduplication
explicit, test-backed integrity checks
safe recovery and reference-safe garbage collection
machine-readable CLI behavior suitable for automation

The goal is confidence and recoverability over maximum throughput.

v1.7 performance work followed the existing execution model: bounded worker-based commands under explicit safety constraints, without turning coldkeep into a fully concurrent daemon or changing on-disk format, chunk layout, or operator-visible schema compatibility. v1.8 introduced packed multi-chunk storage blocks and completed AES-GCM packed-block integration. v1.9 builds on that foundation with formal transform-aware storage semantics, block-level compression, and explicit verification stages while preserving restore determinism, snapshot semantics, and GC safety.

Features

Snapshot lineage (--from)
Snapshot diff summaries
Snapshot tree visualization
Safe deletion preview (--dry-run)
Read-only observability (stats, inspect)
Exact GC simulation with trace support
Built-in deduplication
Deterministic restore

Status

Coldkeep has ten explicit correctness layers:

v1.0: storage correctness (restore determinism, integrity, recovery, GC safety)
v1.1: interaction correctness (CLI orchestration, machine-readable contracts, batch semantics)
v1.2: physical-file graph coherence, explicit repair semantics, audited GC refusal, and invariant-aware batch maintenance reporting
v1.3: snapshot-based retention as a correctness layer (immutable point-in-time captures, snapshot-protected GC, reachability audits)
v1.4: snapshot clarity and lifecycle hardening (explicit lineage semantics, safer dry-run wording, stricter pre-release verification guidance)
v1.5: chunker-evolution compatibility contract clarity (mixed-version repositories, explicit new-writes-only chunker policy)
v1.6: observability and simulation contract hardening (read-only introspection, exact GC simulation parity, trace channel behavior)
v1.7: controlled-execution performance validation (benchmarking, deterministic comparison, and release-readiness safety proof without storage-format or schema-breaking change)
v1.8: packed block abstraction and AES-GCM packed-block integration (multi-chunk storage blocks, dual-compat read path, locked block-size defaults, configurable operator override, release hardening)
v1.9: transform-based storage architecture freeze (block-level compression, logical/compressed/physical hash semantics, metadata-driven read path, and explicit staged verification)

Guarantees are enforced through automated validation and CI gates; see VALIDATION_MATRIX.md for guarantee-to-evidence mapping.

If you are new to the project, start here, then continue to ARCHITECTURE.md for the internal model and VALIDATION_MATRIX.md for the guarantee-to-evidence map.

v1.9 Storage Contract

v1.9 keeps packed storage blocks as the default write path for new data.
The default packed block size is 1 MiB.
COLDKEEP_BLOCK_TARGET_SIZE_MB exists as an advanced operator tuning override for new writes only. Valid values for v1.9: 1, 2, 3 (MiB). Other values log a warning and use the locked default. This override is retained for benchmarking and specialized operator tuning; production deployments should use the default.
COLDKEEP_PACKED_BLOCK_SIZE_MIB is a legacy fallback environment variable checked only if COLDKEEP_BLOCK_TARGET_SIZE_MB is not set. It is accepted for backward compatibility; new configurations should use COLDKEEP_BLOCK_TARGET_SIZE_MB.
v1.9 reads existing v1.7/v1.8 repositories without rewriting historical data.
v1.9 writes packed blocks for new data through storage_blocks and chunk_block_refs.
Mixed repositories containing legacy v1.7/v1.8 data and new v1.9 compressed/encrypted blocks are valid steady-state.
v1.7 is not guaranteed to read repositories that contain v1.8/v1.9 packed-block data.
Both plain and aes-gcm codec settings work end-to-end with packed writes. When COLDKEEP_CODEC=aes-gcm, the full encoded block is AES-GCM encrypted and storage_blocks.codec is set to "aes-gcm"; stored bytes are a 12-byte nonce prefix followed by the ciphertext. When COLDKEEP_CODEC=plain, storage_blocks.codec is "none" and stored bytes are the plaintext encoded block. The read path (StorageBlockReader) handles both layouts transparently using per-block metadata.
Compression settings (none / zstd) affect future writes only and never rewrite historical blocks.
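
The block-size override rules above can be sketched as a small resolver. This is an illustration, not coldkeep's actual code: the function name resolveBlockSizeMiB, the warning text, and the choice to apply the same 1/2/3 validation to the legacy variable are assumptions. Only the two environment variable names, the valid values, and the 1 MiB default come from the contract above.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

const defaultBlockSizeMiB = 1 // locked default from the v1.9 contract

// resolveBlockSizeMiB picks the packed block size for new writes.
// lookup has the same shape as os.LookupEnv so the logic can be tested
// without touching the real environment (hypothetical helper).
func resolveBlockSizeMiB(lookup func(string) (string, bool)) (sizeMiB int, warning string) {
	name := "COLDKEEP_BLOCK_TARGET_SIZE_MB"
	raw, ok := lookup(name)
	if !ok {
		// The legacy variable is consulted only when the primary one is unset.
		name = "COLDKEEP_PACKED_BLOCK_SIZE_MIB"
		raw, ok = lookup(name)
	}
	if !ok {
		return defaultBlockSizeMiB, ""
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n < 1 || n > 3 {
		// Invalid values fall back to the default and produce a warning.
		return defaultBlockSizeMiB, fmt.Sprintf(
			"%s=%q is not one of 1, 2, 3; using default %d MiB", name, raw, defaultBlockSizeMiB)
	}
	return n, ""
}

func main() {
	size, warn := resolveBlockSizeMiB(os.LookupEnv)
	if warn != "" {
		fmt.Fprintln(os.Stderr, "warning:", warn)
	}
	fmt.Printf("packed block size for new writes: %d MiB\n", size)
}
```

Passing the lookup function in keeps the precedence rule (primary variable first, legacy second) easy to exercise in isolation.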

Compression and Integrity Contract (Pre-v1.10 Freeze)

Compression behavior:

Compression is block-level.
Compression happens before encryption.
Compression configuration affects only newly written blocks.
Existing blocks are never recompressed automatically.
Reads and verify use per-block metadata, so mixed repositories (legacy + new transform metadata) are valid steady-state.
Compression is store-if-smaller: some zstd-configured blocks are intentionally stored uncompressed when compression would expand payload size.
Compression does not change dedup identity; dedup remains anchored to logical block content.

Integrity checkpoints:

logical_hash (block_hash) verifies decoded logical block content.
payload_hash is a deprecated lowercase-hex mirror of block_hash retained for compatibility/observability only.
compressed_hash verifies pre-encryption compressed payload.
physical_hash verifies exact persisted bytes in container storage.
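
The three checkpoints can be pictured as a staged check that reports the first stage that fails. The sketch below is illustrative only: SHA-256, the outermost-first check order, and the names blockHashes and verifyStages are assumptions, not details confirmed by this README.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// blockHashes mirrors the three checkpoint names from the contract above.
type blockHashes struct {
	Logical    string // logical_hash (block_hash): decoded logical content
	Compressed string // compressed_hash: pre-encryption compressed payload
	Physical   string // physical_hash: exact persisted bytes
}

func hexSHA256(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// verifyStages checks physical, then compressed, then logical bytes and
// returns the name of the first failing checkpoint, or "" if all match.
func verifyStages(want blockHashes, physical, compressed, logical []byte) string {
	checks := []struct {
		name string
		want string
		data []byte
	}{
		{"physical_hash", want.Physical, physical},
		{"compressed_hash", want.Compressed, compressed},
		{"logical_hash", want.Logical, logical},
	}
	for _, c := range checks {
		if hexSHA256(c.data) != c.want {
			return c.name
		}
	}
	return ""
}

func main() {
	physical, compressed, logical := []byte("on-disk"), []byte("packed"), []byte("plain")
	want := blockHashes{
		Logical:    hexSHA256(logical),
		Compressed: hexSHA256(compressed),
		Physical:   hexSHA256(physical),
	}
	fmt.Printf("clean block: %q\n", verifyStages(want, physical, compressed, logical))
	fmt.Printf("bit rot:     %q\n", verifyStages(want, []byte("on-disc"), compressed, logical))
}
```

Checking the persisted bytes first means on-disk corruption is named as such before any decryption or decompression is attempted.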

Core Guarantees

Summary

deterministic, byte-identical restore
no exposure of partially written or inconsistent data
GC is reference-safe: no reachable chunk is ever deleted
atomic restore replacement (within single-node local filesystem semantics)
safe in-process concurrent storage operations

Core invariants

Guarantee IDs are stable and tracked in VALIDATION_MATRIX.md:

G1: deterministic, byte-identical restore
G2: repeat store does not drift chunk graph
G3: no exposure of partially written or inconsistent data
G4: GC is reference-safe (no reachable chunk is deleted)
G5: atomic restore replacement (single-node local filesystem semantics)
G6: safe in-process concurrent storage operations
G7: deep corruption detection (payload/offset/tail)
G8: corrective health gate contract stability
G9: deterministic batch CLI orchestration and automation-safe contract behavior
G10: current-state physical mapping graph coherence is audited in standard verify
G11: GC executes only on an audited coherent physical-root graph
G12: invariant failures expose stable machine-readable classification and operator guidance
G13: batch maintenance commands expose deterministic execution semantics and invariant-aware per-item reporting
G14: snapshot-retained content is GC-safe and protected by liveness union (current + snapshot roots)
G15: snapshot deletion only changes metadata and future GC eligibility (content preserved)
G16: stats expose snapshot-retention pressure to operators (retained-only-by-current, retained-only-by-snapshot, shared)
G17: verify and doctor audit persisted snapshot reachability integrity and report retention context

Definitions and evidence mapping for G1-G17 are tracked in VALIDATION_MATRIX.md.
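
To make G4 and G14 concrete, here is a toy model of the liveness union: a chunk is reclaimable only if no file in current state and no snapshot-retained file references it. The data shapes and the name reclaimable are hypothetical; the real planner works on database state, not in-memory maps.

```go
package main

import (
	"fmt"
	"sort"
)

// reclaimable returns the chunk IDs that are unreachable from both root
// sets: files in current state and files retained by snapshots.
func reclaimable(allChunks []string, fileChunks map[string][]string, currentFiles, snapshotFiles []string) []string {
	live := map[string]bool{}
	for _, roots := range [][]string{currentFiles, snapshotFiles} {
		for _, f := range roots {
			for _, c := range fileChunks[f] {
				live[c] = true
			}
		}
	}
	var dead []string
	for _, c := range allChunks {
		if !live[c] {
			dead = append(dead, c)
		}
	}
	sort.Strings(dead) // deterministic ordering
	return dead
}

func main() {
	all := []string{"c1", "c2", "c3", "c4"}
	fileChunks := map[string][]string{
		"A": {"c1", "c2"},
		"B": {"c2", "c3"},
	}
	// File A is in current state; file B is retained only by a snapshot.
	fmt.Println("with snapshot:   ", reclaimable(all, fileChunks, []string{"A"}, []string{"B"}))
	// After the snapshot is deleted, B's unshared chunk becomes eligible (G15).
	fmt.Println("snapshot deleted:", reclaimable(all, fileChunks, []string{"A"}, nil))
}
```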

Documentation is split into:

README.md for overview, quickstart, and CLI usage
ARCHITECTURE.md for the internal model, invariants, lifecycle, and trust boundary
COMPATIBILITY.md for version-compatibility, chunker-evolution contract, and explicit non-guarantees
VALIDATION_MATRIX.md for guarantee-to-evidence mapping
CONTRIBUTING.md for contributor workflow, local CI guidance, and stats benchmark commands for observability-sensitive changes
PRE_RELEASE_CHECKLIST.md for release-gate execution
SECURITY.md for the threat model and security limits
docs/internal/storage_compatibility_matrix.md for the formal storage compatibility matrix and benchmark scope split
docs/PATH_IDENTITY.md for current-state path identity policy
CHANGELOG.md for milestone history

For the deeper model (invariants, lifecycle, validity, recovery, trust boundary), see ARCHITECTURE.md.

Chunking at a Glance

coldkeep uses content-defined chunking (CDC).

chunk boundaries depend on data patterns (not fixed-size windows),
different chunker versions can choose different boundary strategies,
stored state is a chunked reconstruction recipe (file_chunk -> chunk -> blocks), not a raw whole-file blob.

Example:

File A (v1):
  [chunk1][chunk2][chunk3]

File B (v2):
  [chunk4][chunk5]

Even with overlapping content, layout can differ across chunker versions.
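
For readers new to CDC, the following toy chunker shows the core idea: a rolling hash over recent bytes decides where a chunk ends, so boundaries follow content rather than offsets. It is a simplified gear-hash sketch, not coldkeep's v1-simple-rolling or v2-fastcdc implementation; the size limits, mask, and table seed are arbitrary assumptions.

```go
package main

import "fmt"

const (
	minSize = 64   // never cut before this many bytes
	maxSize = 1024 // always cut at this many bytes
	mask    = 0xFF // cut when the low 8 hash bits are zero
)

var gear [256]uint64

func init() {
	// Deterministic pseudo-random table (splitmix64-style mixing) so that
	// boundaries are reproducible from run to run.
	x := uint64(0x9E3779B97F4A7C15)
	for i := range gear {
		x += 0x9E3779B97F4A7C15
		z := x
		z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9
		z = (z ^ (z >> 27)) * 0x94D049BB133111EB
		gear[i] = z ^ (z >> 31)
	}
}

// chunk splits data at content-defined boundaries.
func chunk(data []byte) [][]byte {
	var chunks [][]byte
	start := 0
	var h uint64
	for i, b := range data {
		h = (h << 1) + gear[b] // rolling gear hash over recent bytes
		size := i - start + 1
		if (size >= minSize && h&mask == 0) || size >= maxSize {
			chunks = append(chunks, data[start:i+1])
			start = i + 1
			h = 0
		}
	}
	if start < len(data) {
		chunks = append(chunks, data[start:])
	}
	return chunks
}

func main() {
	// Seed-driven input keeps the demo deterministic.
	data := make([]byte, 64*1024)
	s := uint32(1)
	for i := range data {
		s = s*1664525 + 1013904223
		data[i] = byte(s >> 24)
	}
	shifted := append([]byte{0xFF}, data...) // insert one byte at the front

	seen := map[string]bool{}
	for _, c := range chunk(data) {
		seen[string(c)] = true
	}
	reused := 0
	cs := chunk(shifted)
	for _, c := range cs {
		if seen[string(c)] {
			reused++
		}
	}
	fmt.Printf("%d of %d chunks reused after a 1-byte shift\n", reused, len(cs))
}
```

With fixed-size windows, a one-byte insertion would shift every later boundary; with content-defined boundaries, chunking tends to resynchronize after the edit, which is the behavior the shifted-data benchmarks measure.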

Chunker Versions

each committed logical file stores chunker_version metadata,
one repository can contain multiple chunker versions,
chunker version is selected at store time,
fresh v1.5+ repositories default new writes to v2-fastcdc,
upgraded repositories preserve prior write default (v1-simple-rolling unless explicitly changed),
chunks may be reused across chunker versions if their content is identical,
cross-version reuse is opportunistic and not guaranteed for efficiency ratios,
chunker_version on chunk rows is origin metadata, not a reuse constraint,
restore is recipe-driven and does not depend on the active write chunker.
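
The points above about content-identical reuse and recipe-driven restore can be illustrated with a tiny in-memory model. Everything here is a sketch: chunkStore, recipe, and the use of SHA-256 as the content address are assumptions for illustration, and the chunk boundaries are supplied by hand rather than produced by a real chunker.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// chunkStore maps a content hash to chunk bytes (content addressing).
type chunkStore map[string][]byte

// recipe is the ordered list of chunk hashes needed to rebuild one file.
type recipe struct {
	ChunkerVersion string // origin metadata only; not consulted on restore
	ChunkHashes    []string
}

// put stores each chunk once, keyed by content, and returns the recipe.
func (s chunkStore) put(chunks [][]byte, version string) recipe {
	r := recipe{ChunkerVersion: version}
	for _, c := range chunks {
		sum := sha256.Sum256(c)
		h := hex.EncodeToString(sum[:])
		if _, ok := s[h]; !ok {
			s[h] = append([]byte(nil), c...) // first writer stores the bytes
		}
		r.ChunkHashes = append(r.ChunkHashes, h)
	}
	return r
}

// restore concatenates chunks in recipe order; the chunker version is ignored.
func (s chunkStore) restore(r recipe) ([]byte, error) {
	var out []byte
	for _, h := range r.ChunkHashes {
		c, ok := s[h]
		if !ok {
			return nil, fmt.Errorf("missing chunk %s", h)
		}
		out = append(out, c...)
	}
	return out, nil
}

func main() {
	store := chunkStore{}
	a := store.put([][]byte{[]byte("hello "), []byte("cold "), []byte("world")}, "v1-simple-rolling")
	b := store.put([][]byte{[]byte("hello "), []byte("warm world")}, "v2-fastcdc")

	restoredA, _ := store.restore(a)
	restoredB, _ := store.restore(b)
	fmt.Printf("%q %q unique chunks=%d\n", restoredA, restoredB, len(store))
}
```

The "hello " chunk is stored once even though the two files were written under different chunker labels, and neither restore needs to know which chunker is currently active.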

Configure repository write default:

coldkeep config set default-chunker <version>

This affects new writes only and does not rewrite existing data.

Safety Guarantees (High-Level)

restore correctness: stored files restore byte-identically,
snapshot stability: snapshots remain valid across upgrades,
non-destructive evolution: no automatic background re-chunking or silent rewrite,
forward-compatible metadata: unknown but well-formed future chunker labels do not block restore.

For full guarantees, non-guarantees, and upgrade behavior details, see COMPATIBILITY.md.

Legacy compatibility contract (v1.9):

mandatory: old repositories remain readable/restorable
not guaranteed: automatic rewrite, recompression, or eager migration of historical data

When to use coldkeep

Good fit:

cold/backup storage where correctness matters more than speed
environments needing explicit integrity verification
deduplication + deterministic restore use cases

Not a fit (v1.x scope):

hot-path high-throughput storage
distributed/multi-node coordination

Quickstart

A small samples directory is included for local testing.

If you only want the fastest successful first run, use the Local (no Docker) path below, then come back to the later sections as needed.

Local (no Docker)

# 1) Initialize key material (.env)
coldkeep init

# 2) Load environment (set -a exports every variable sourced from .env)
set -a
source .env
set +a

# 3) Configure local PostgreSQL connection (required for local mode)
export DB_HOST=127.0.0.1
export DB_PORT=5432
export DB_USER=coldkeep
export DB_PASSWORD=coldkeep
export DB_NAME=coldkeep
export DB_SSLMODE=disable
export COLDKEEP_DB_AUTO_BOOTSTRAP=true

# 4) Store and inspect
coldkeep store samples/hello.txt
coldkeep stats

# 5) Restore + verify
# restore expects file ID(s), not source filename
coldkeep restore 1 ./restored
coldkeep verify system --standard

Security note: if the encryption key is lost, encrypted data cannot be recovered.

Command form tips:

restore expects logical file IDs (coldkeep restore <fileID> <outputDir>); use --stored-path if you want path-based restore.
verify expects a target: coldkeep verify system ... or coldkeep verify file <fileID> ....

Docker

# 1) Start services
docker compose up -d --build

# 2) Initialize key material on host-mounted workspace
docker compose run --rm -v "$PWD:/app" coldkeep init

# 3) Store a sample file
docker compose run --rm \
  --env-file .env \
  -v "$PWD/samples:/samples" \
  coldkeep store /samples/hello.txt

Smoke Validation (Two Approaches)

If you are preparing a PR, run the smoke gate (scripts/smoke.sh) with either workflow below. Both are valid and both are used by contributors.

PR author tip: use the PR template at .github/pull_request_template.md to summarize invariants and lifecycle-semantics impact for reviewers. For a contributor-oriented local CI path before that, see CONTRIBUTING.md. If your change touches coldkeep stats or stats query shape, the same guide also includes a short stats benchmarking section with small/medium/large benchmark commands.

Approach A: Docker runner

Use the coldkeep service container to run the smoke script.

# 1) Ensure PostgreSQL service is up
docker compose up -d coldkeep_postgres

# 2) Load encryption env from .env generated by coldkeep init
set -a
source .env
set +a

# 3) Run smoke inside the coldkeep container
docker compose run --rm \
  -e COLDKEEP_KEY="$COLDKEEP_KEY" \
  -e COLDKEEP_CODEC="$COLDKEEP_CODEC" \
  -v "$PWD/samples:/samples:ro" \
  --entrypoint sh coldkeep \
  -lc 'apk add --no-cache jq >/dev/null && COLDKEEP_SAMPLES_DIR=/samples scripts/smoke.sh'

Approach B: Host runner

Run the smoke script on host with a local binary, pointing to Docker PostgreSQL.

# 1) Ensure PostgreSQL service is up
docker compose up -d coldkeep_postgres

# 2) Build coldkeep locally and load encryption env
go build -o coldkeep ./cmd/coldkeep
set -a
source .env
set +a

# 3) Run smoke from host against Docker PostgreSQL
DB_HOST=127.0.0.1 \
DB_PORT=5432 \
DB_USER=coldkeep \
DB_PASSWORD=coldkeep \
DB_NAME=coldkeep \
DB_SSLMODE=disable \
PATH="$PWD:$PATH" \
./scripts/smoke.sh

# 4) Optional cleanup of local binary
rm -f coldkeep

Notes:

scripts/smoke.sh requires jq and coldkeep on PATH in the execution environment.
Containerized simulate checks may print a non-fatal warning about sqlite/cgo stubs; smoke continues unless COLDKEEP_SMOKE_STRICT_SIMULATE=1 is set.

CLI Basics

Typical flows:

coldkeep store file.txt
coldkeep store-folder ./data
coldkeep restore 12 ./out
coldkeep restore --stored-path docs/report.txt --destination ./out/report.txt --mode override
coldkeep remove 12
coldkeep gc
coldkeep stats
coldkeep list
coldkeep search report
coldkeep verify system --standard
coldkeep doctor

Simulation (no physical writes):

coldkeep simulate store-folder ./data
coldkeep simulate store file.txt --output json

Observability and GC simulation (read-only):

coldkeep stats
coldkeep stats --json

coldkeep inspect <entity> <id>
coldkeep inspect <entity> <id> --relations
coldkeep inspect <entity> <id> --reverse
coldkeep inspect <entity> <id> --deep --limit N

coldkeep simulate gc
coldkeep simulate gc --delete-snapshot <id>
coldkeep simulate gc --containers

# trace diagnostics are emitted on stderr
coldkeep stats --trace
coldkeep inspect chunk <id> --trace-json
coldkeep simulate gc --trace-json

Supported inspect entities currently include: file (alias: logical-file), chunk, container, and snapshot.

Observability command guarantees (v1.6):

stats, inspect, and simulate gc are read-only command surfaces.
simulate gc is an exact simulation of GC reclaimability under the same integrity gates.
simulate gc previews exact GC reclaimability using the shared GC planning layer (gc.BuildPlan), including fully-dead active containers; it is not legacy gc --dry-run behavior.
GC simulation does not mutate repository state (no database writes and no filesystem writes).
JSON output is intended for tooling/automation contracts.
meta.version is the CLI JSON contract version. It remains v1.7 for additive, backward-compatible fields (including v1.8/v1.9 stats.block_layout additions) and only bumps on breaking JSON contract changes.
Deep inspect output can be large; use --limit N to bound traversal output for operators and CI.
--trace and --trace-json are diagnostics channels; traces are emitted to stderr so stdout data remains stable for piping.
v1.8/v1.9 stats includes block-layout observability for packed storage: storage_blocks_count, chunk_block_refs_count, avg_chunks_per_block, avg_block_plaintext_size, avg_block_stored_size, avg_block_fill_ratio, legacy_block_count, packed_block_count, and codec_distribution when packed blocks are present.

Operator-facing v1.9 delta for common commands:

coldkeep store, restore, verify system --standard, gc --dry-run, gc, stats --json, and inspect keep their existing invocation shape; v1.9 does not add new required flags to these commands.
stats may include packed-block metrics in human and JSON output.
verify may surface packed-block integrity categories such as packed block hash or metadata corruption.
Block abstraction is documented, but remains a compatibility-layer change rather than a new mandatory operator workflow.

Chunker benchmark and interpretation:

coldkeep benchmark chunkers --output json
coldkeep benchmark run --dataset small --repeat 1 --output json
scripts/run_phase8_blocksize_matrix.sh --list-missing

v1.9 supports both CLI and scripted benchmark workflows.

Use coldkeep benchmark chunkers and coldkeep benchmark run for operator-facing repeatable local measurements.
Use scripts/run_phase8_*.sh and scripts/compare_phase8_*.py for release matrix orchestration and historical comparison workflows.

Typical outcomes to expect (informational ranges):

Small modifications: v1 ~92-96% reuse; v2 ~94-98% reuse
Shifted data: v1 ~5-20% reuse; v2 ~25-50% reuse

Interpretation note: the shifted-data reuse gap is the main justification signal for v2 FastCDC boundary stability improvements. Critical insight: this indicates FastCDC improves not only dedup ratio, but dedup stability over time under boundary-shifting changes.

Common mistakes to avoid:

Do not assert exact chunk counts; implementations can vary slightly while preserving correctness.
Do not use non-deterministic input data; keep all generated data seed-driven for CI reliability.
Do not ignore shifted-data comparisons; this is the most important stability signal.
Do not overcomplicate metrics; keep interpretation focused on reuse percentage, chunk count, and coverage invariants.

Batch Operations (v1.2)

Batch restore/remove/repair extends the automation contract with deterministic orchestration and invariant-aware reporting.

coldkeep restore 12 18 24 ./out
coldkeep remove 12 18 24
coldkeep remove --input ids.txt
coldkeep remove --stored-paths /data/a.txt /data/b.txt --input paths.txt
coldkeep repair ref-counts --batch
coldkeep repair --batch --input repair_targets.txt
coldkeep restore 12 18 ./out --dry-run

Current repair --batch scope is target-oriented, not item-oriented:

today the only supported target is ref-counts
input files for repair --batch --input <file> currently contain repeated target names such as ref-counts
they do not contain file IDs or stored paths

Semantics (summary):

per-item isolation by default
optional fail-fast for execution failures
duplicate target skipping
deterministic per-item report ordering
JSON status values are intentionally two-layered:
- overall payload status: ok, partial_failure, error
- per-item result status: success, failed, skipped, planned
JSON execution mode is explicit: continue_on_error (default) or fail_fast
process exit is automation-friendly:
- 0 when no item fails
- 1 when one or more items fail
- 2 for pre-execution validation/usage failures (including empty effective target sets after parsing input)

Example JSON payload:

{
  "status": "partial_failure",
  "operation": "repair",
  "dry_run": false,
  "execution_mode": "continue_on_error",
  "summary": {
    "total": 2,
    "succeeded": 1,
    "failed": 1,
    "skipped": 0
  },
  "results": [
    {
      "id": "ref-counts",
      "status": "success",
      "message": "logical_file ref_count values repaired"
    },
    {
      "id": "ref-counts",
      "status": "failed",
      "message": "repair refused: orphan physical_file rows detected",
      "invariant_code": "REPAIR_REFUSED_ORPHAN_ROWS",
      "recommended_action": "Remove or correct orphan physical_file rows before retrying repair."
    }
  ]
}
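
For automation written in Go, a payload like the one above can be decoded with encoding/json. The struct and function names below are hypothetical, and the field types follow the repair example only (for instance, id is treated as a string); other batch commands may shape some fields differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type batchResult struct {
	ID                string `json:"id"`
	Status            string `json:"status"`
	Message           string `json:"message"`
	InvariantCode     string `json:"invariant_code,omitempty"`
	RecommendedAction string `json:"recommended_action,omitempty"`
}

type batchReport struct {
	Status        string `json:"status"`
	Operation     string `json:"operation"`
	DryRun        bool   `json:"dry_run"`
	ExecutionMode string `json:"execution_mode"`
	Summary       struct {
		Total     int `json:"total"`
		Succeeded int `json:"succeeded"`
		Failed    int `json:"failed"`
		Skipped   int `json:"skipped"`
	} `json:"summary"`
	Results []batchResult `json:"results"`
}

// Abbreviated copy of the example payload, for demonstration.
const samplePayload = `{
  "status": "partial_failure",
  "operation": "repair",
  "dry_run": false,
  "execution_mode": "continue_on_error",
  "summary": {"total": 2, "succeeded": 1, "failed": 1, "skipped": 0},
  "results": [
    {"id": "ref-counts", "status": "success", "message": "logical_file ref_count values repaired"},
    {"id": "ref-counts", "status": "failed",
     "message": "repair refused: orphan physical_file rows detected",
     "invariant_code": "REPAIR_REFUSED_ORPHAN_ROWS",
     "recommended_action": "Remove or correct orphan physical_file rows before retrying repair."}
  ]
}`

// failedItems decodes a batch payload and returns only the failed results.
func failedItems(raw []byte) (batchReport, []batchResult, error) {
	var report batchReport
	if err := json.Unmarshal(raw, &report); err != nil {
		return report, nil, err
	}
	var failed []batchResult
	for _, r := range report.Results {
		if r.Status == "failed" {
			failed = append(failed, r)
		}
	}
	return report, failed, nil
}

func main() {
	report, failed, err := failedItems([]byte(samplePayload))
	if err != nil {
		panic(err)
	}
	fmt.Printf("overall=%s failed=%d\n", report.Status, report.Summary.Failed)
	for _, f := range failed {
		fmt.Printf("- %s: %s (%s)\n", f.ID, f.Message, f.InvariantCode)
	}
}
```

Branching on invariant_code rather than on message text keeps tooling stable if the human-readable wording changes.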

For full batch contract details and examples, see ARCHITECTURE.md and PRE_RELEASE_CHECKLIST.md.

Snapshot Layer (v1.4)

coldkeep snapshots capture an immutable, point-in-time view of your stored files.

Snapshots capture a complete, immutable view of the current system state. Even when using --from, snapshots are always fully self-contained and do not depend on their parent.

Critical clarity:

Snapshots are always self-contained.
--from records lineage metadata for analysis only.
--from does not create dependencies.
A child snapshot restore never requires reading parent snapshot content.

Creating snapshots

v1.4 flow example:

# Create initial snapshot
coldkeep snapshot create --id day1

# Modify files...

# Create snapshot with lineage
coldkeep snapshot create --id day2 --from day1

# Understand changes
coldkeep snapshot diff day1 day2 --summary

# Inspect snapshot reuse
coldkeep snapshot stats day2

# Visualize history
coldkeep snapshot list --tree

# Preview deletion
coldkeep snapshot delete day1 --dry-run

# Full snapshot (all physical_file entries)
coldkeep snapshot create

# Full snapshot with lineage metadata
coldkeep snapshot create --id day2 --from day1

# Partial snapshot (exact paths and/or directory prefixes)
coldkeep snapshot create docs/ report.txt --label release-2026-04

--id <snapshotID>: snapshot_id system identifier. This is the command target for show, restore, stats, diff, and delete.
--label <string>: optional user-facing metadata only. It is not an identifier and is never used for command targeting.
--from <snapshotID>: optional parent snapshot lineage metadata on create. This is informational only and does not create any parent-content dependency during create or restore.

--from <snapshotID> behavior:

snapshot recorded as derived from parent
does not create a dependency
snapshot content is still built from current system state
parent relationship is used for comparison and visualization only

Current lineage scope policy:

--from is currently supported only for full snapshots.
Parent snapshot referenced by --from must also be full.
Filtered parent/child lineage for partial snapshots is intentionally rejected in this phase.

Snapshot command targeting contract:

There is no --snapshot selector flag for snapshot subcommands.
Pass snapshot_id positionally (for example: coldkeep snapshot restore <snapshotID>).

Listing and inspecting

coldkeep snapshot list
coldkeep snapshot list --type full --limit 10 --since 2026-01-01
coldkeep snapshot list --tree
coldkeep snapshot show snap-abc123
coldkeep snapshot show snap-abc123 --limit 50
coldkeep snapshot show snap-abc123 --prefix docs/
coldkeep snapshot show snap-abc123 --pattern "docs/*.txt" --min-size 1024
coldkeep snapshot stats
coldkeep snapshot stats snap-abc123

snapshot list --tree renders a lineage view from snapshot metadata (id, parent_id, created_at). If a parent snapshot was deleted, its children are still shown, as roots, and remain fully usable. The tree represents lineage metadata only; it is not a dependency graph for restore execution.

Conceptual lineage example:

day1
 └── day2
  └── day3

Each snapshot is independent despite this structure.

snapshot list --tree:

displays snapshots as a lineage tree based on parent relationships
reflects metadata lineage only (not restore dependency)

snapshot stats lineage context:

when a parent snapshot is available, stats include reused files, new files, and reuse ratio
if the parent snapshot is missing, stats fall back gracefully with explanatory output

Snapshot file queries are reusable across snapshot show, snapshot restore, and snapshot diff.

Supported query flags:

--path <exact>: exact normalized snapshot path match; repeatable
--prefix <dir/>: normalized directory prefix match; repeatable and must end with /
--pattern <glob>: slash-path glob (path.Match) against the normalized snapshot path
--regex <re>: regular expression against the snapshot path
--min-size <bytes> / --max-size <bytes>: inclusive logical size range
--modified-after <RFC3339|YYYY-MM-DD> / --modified-before <RFC3339|YYYY-MM-DD>: inclusive mtime window

All active criteria are ANDed together. Path and prefix inputs are normalized before matching, and result ordering remains deterministic.

Restoring from a snapshot

# Restore all files to their original paths
coldkeep snapshot restore snap-abc123

# Restore a subdirectory under a new prefix
coldkeep snapshot restore snap-abc123 docs/ --mode prefix --destination ./restored

# Restore a single file to an explicit destination
coldkeep snapshot restore snap-abc123 docs/report.txt --mode override --destination ./out/report.txt

# Restore only matching files from the snapshot query layer
coldkeep snapshot restore snap-abc123 --prefix docs/ --pattern "docs/*.txt" --mode prefix --destination ./restored

Diffing two snapshots

snapshot diff compares two snapshots by path and logical file identity, classifying each change as added, removed, or modified. When query filters include size or mtime constraints, diff evaluates added and modified entries against target-snapshot metadata, and removed entries against base-snapshot metadata. A file is considered modified if its content changes, even when the path stays the same.

# Show all changes between two snapshots
coldkeep snapshot diff snap-1 snap-2

# Show only added files
coldkeep snapshot diff snap-1 snap-2 --filter added

# Restrict the diff view to a path subset
coldkeep snapshot diff snap-1 snap-2 --prefix docs/

# Return summary counts only (no per-entry list)
coldkeep snapshot diff snap-1 snap-2 --summary

# Combine diff classification with snapshot query filters
coldkeep snapshot diff snap-1 snap-2 --filter modified --regex "\\.yaml$"

# Machine-readable JSON output
coldkeep snapshot diff snap-1 snap-2 --output json

Text output example:

[SNAPSHOT DIFF]

Base:    snap-1
Target:  snap-2

+ docs/new.txt
- docs/old.txt
~ docs/config.yaml

Summary:
  added: 1
  removed: 1
  modified: 1

JSON output schema:

{
  "status": "ok",
  "command": "snapshot diff",
  "data": {
    "base": "snap-1",
    "target": "snap-2",
    "summary": { "added": 1, "removed": 1, "modified": 1 },
    "entries": [
      { "path": "docs/new.txt",    "type": "added",    "base_logical_id": null, "target_logical_id": 2 },
      { "path": "docs/old.txt",    "type": "removed",  "base_logical_id": 1,    "target_logical_id": null },
      { "path": "docs/config.yaml","type": "modified", "base_logical_id": 3,    "target_logical_id": 4 }
    ],
    "duration_ms": 12
  }
}

--filter limits output to one change type (added, removed, or modified). Summary counts reflect the filtered set. --summary returns counts only and skips detailed entries output.
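
The classification rule (compare by path and logical file identity) can be sketched in a few lines. This is an illustration rather than the actual implementation: the map-based inputs and the diffSnapshots name are hypothetical, and entries are sorted by path here, which may differ from the CLI's ordering.

```go
package main

import (
	"fmt"
	"sort"
)

type change struct {
	Path string
	Type string // "added", "removed", or "modified"
}

// diffSnapshots compares two path -> logical file ID maps.
// Same path with a different logical ID counts as modified.
func diffSnapshots(base, target map[string]int64) []change {
	var out []change
	for p, tid := range target {
		bid, ok := base[p]
		switch {
		case !ok:
			out = append(out, change{p, "added"})
		case bid != tid:
			out = append(out, change{p, "modified"})
		}
	}
	for p := range base {
		if _, ok := target[p]; !ok {
			out = append(out, change{p, "removed"})
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Path < out[j].Path })
	return out
}

func main() {
	// Same paths and logical IDs as the JSON example above.
	base := map[string]int64{"docs/old.txt": 1, "docs/config.yaml": 3}
	target := map[string]int64{"docs/new.txt": 2, "docs/config.yaml": 4}

	counts := map[string]int{}
	for _, c := range diffSnapshots(base, target) {
		counts[c.Type]++
		fmt.Printf("%-8s %s\n", c.Type, c.Path)
	}
	fmt.Printf("added=%d removed=%d modified=%d\n", counts["added"], counts["removed"], counts["modified"])
}
```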

snapshot diff --summary:

displays a summary of changes
includes added, removed, and modified counts

The JSON contract for snapshot commands is unchanged. Query flags only reduce the returned files or entries collections and the derived counts; field names and envelope structure remain stable.

Deleting a snapshot

coldkeep snapshot delete snap-abc123 --force
coldkeep snapshot delete snap-abc123 --dry-run

Deleting a snapshot removes metadata only: the snapshot row and its snapshot_file entries. The underlying logical files and blocks are not touched, and data remains retained while it is still referenced by other snapshots or current state.

--dry-run is read-only and reports impact details (lineage preview and file-count breakdown) without applying changes. Dry-run impact describes metadata/reference effects and does not guarantee disk-space reclamation. When both --force and --dry-run are passed, --dry-run takes precedence and the command remains read-only.

snapshot delete --dry-run preview includes:

number of files referenced by the snapshot
files unique to this snapshot
files shared with other snapshots
lineage impact

No data is modified in dry-run mode.

Safe lineage workflow (v1.4)

Use this sequence when operating on parent/child snapshots:

# 1) Create baseline and child lineage metadata
coldkeep snapshot create --id day1
coldkeep snapshot create --id day2 --from day1

# 2) Review lineage and impact before delete
coldkeep snapshot list --tree
coldkeep snapshot delete day1 --dry-run

# 3) If approved, delete parent metadata
coldkeep snapshot delete day1 --force

# 4) Verify child remains independently restorable
coldkeep snapshot restore day2

Expected behavior:

Deleting day1 changes lineage metadata and future GC eligibility only.
day2 remains restorable because snapshots are self-contained.
snapshot list --tree may re-root children after parent delete; restore behavior is unchanged.

Snapshot release gate (operator quick checklist)

Before tagging a release, run the dedicated snapshot/retention contract gate in PRE_RELEASE_CHECKLIST.md.

For the focused automated snapshot gate, run:

scripts/run_snapshot_release_gate.sh --count 1

Run the checklist step-by-step and in order. For the manual snapshot lifecycle gate, use a stable snapshot identifier (for example via snapshot create --id pre-gc-gate) and pass snapshot IDs positionally in snapshot restore, snapshot diff, and snapshot delete.

Manual lifecycle expected in the release gate:

create snapshot
remove current mapping
confirm GC dry-run reports snapshot-retained logical files
restore from snapshot
diff two snapshots
delete snapshot
confirm GC eligibility changes only after delete

For the full release criteria, use the snapshot sign-off sections in PRE_RELEASE_CHECKLIST.md:

15) Snapshot sign-off checklist (Phases 1-7)
C. Test surface checklist
D. Documentation / release checklist
15) Verify snapshot / retention contract (manual gate)
16) Final global sign-off

When opening the release PR, use .github/pull_request_template.md to keep impact and validation context explicit.

Future Hardening Backlog (non-blocking)

Add fuzz coverage for snapshot query combinations (--regex, --pattern, --prefix) to further harden parser+matcher edge cases.
This is a future hardening task and is not part of the current release gate.

Doctor (recommended health gate)

coldkeep doctor is the operator health gate:

runs recovery first (corrective)
then schema/version sanity checks
then verification (standard by default; full/deep optional)

Doctor is intentionally corrective, not read-only.

coldkeep doctor
coldkeep doctor --full
coldkeep doctor --deep --output json

Verification

Verification levels:

standard: metadata integrity
full: structural/container integrity
deep: full content read + hash validation

coldkeep verify system --standard
coldkeep verify system --full
coldkeep verify system --deep

Verification checks are observational. In CLI flows, startup recovery may run before verification.

Documentation Map

Architecture and internals: ARCHITECTURE.md
Guarantee mapping and evidence: VALIDATION_MATRIX.md
Contribution workflow: CONTRIBUTING.md
Release readiness flow: PRE_RELEASE_CHECKLIST.md
Security reporting and threat guidance: SECURITY.md
Current-state path identity policy: docs/PATH_IDENTITY.md
Benchmark infrastructure and baseline policy: docs/benchmarking.md
Frozen v1.9 benchmark baseline contract: docs/internal/benchmark_baselines_v1_9.md
Milestone history: CHANGELOG.md

Roadmap note (post-v1.9)

Current status:

v1.2 physical mapping/repair and audited GC root gates are complete.
v1.3/v1.4 snapshot-retention correctness and lifecycle clarity are complete.
v1.5 chunker-evolution compatibility contract is complete.
v1.6 read-only observability and exact GC simulation tooling are complete.
v1.7 controlled-execution performance validation and release-readiness hardening are complete.
v1.8 packed block abstraction, AES-GCM packed-block integration, and release hardening are complete.
v1.9 transform-based storage semantics, block-level compression, and staged verification are complete.

Next focus is v1.10: a reliability freeze on top of frozen v1.9 storage semantics (see Current Release Focus). Engine-boundary (architecture extraction) work begins in v1.11.

Contributing

Contributions and discussions are welcome. See CONTRIBUTING.md.

License

Apache-2.0. See LICENSE.
