vMLX - JANGTQ Uber Compressed MLX Models - L2 Disk Cache (survives restart) + L1 Paged (super fast TTFT) + Hybrid SSM Scheduler + Continuous Batching + more!
Updated May 11, 2026 - Python
Mini LLM Serve is a Go-based LLM serving control plane for token-aware scheduling, streaming, TTFT/TBT metrics, and prefix cache metadata.
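The TTFT/TBT metrics mentioned above are straightforward to derive from per-token emission timestamps: TTFT (time to first token) is the gap from request start to the first token, and TBT (time between tokens) is the gap between consecutive tokens. A minimal sketch, assuming a list of timestamps collected during streaming (the function name `ttft_tbt` and its interface are illustrative, not Mini LLM Serve's actual API):

```python
def ttft_tbt(request_start, token_times):
    """Compute streaming latency metrics from emission timestamps.

    request_start: timestamp when the request arrived.
    token_times:   timestamps (same clock) at which each token was emitted.
    Returns (TTFT, list of inter-token gaps). Times here are in milliseconds,
    but any monotonic unit works.
    """
    if not token_times:
        raise ValueError("no tokens were emitted")
    # TTFT: delay until the first token reaches the client.
    ttft = token_times[0] - request_start
    # TBT: gap between each pair of consecutive tokens.
    tbt = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    return ttft, tbt
```

In a real serving loop the timestamps would come from a monotonic clock (e.g. `time.monotonic()`) captured as each token is flushed to the stream.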
Correctness-fixed Rust/PyO3 flat-array DFA prefix cache — rewrite of BCR-memory v1 with regression tests for four bugs and an SGLang/vLLM head-to-head harness.
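A flat-array DFA prefix cache can be pictured as a trie over token IDs whose states live in growable parallel arrays rather than heap-allocated node objects; a lookup walks the DFA and reports the longest cached prefix. The sketch below is a hypothetical Python illustration of that idea under those assumptions, not the repo's Rust implementation (the class and method names are invented):

```python
class FlatTriePrefixCache:
    """Trie over token IDs stored in flat, index-parallel arrays (illustrative)."""

    def __init__(self):
        # One entry per DFA state; state 0 is the root.
        self.edges = [{}]     # state index -> {token_id: next state index}
        self.refs = [0]       # per-state reference count (illustrative metadata)

    def insert(self, tokens):
        """Add a token sequence, appending new states to the flat arrays."""
        state = 0
        for tok in tokens:
            nxt = self.edges[state].get(tok)
            if nxt is None:
                nxt = len(self.edges)        # next free slot in the arrays
                self.edges[state][tok] = nxt
                self.edges.append({})
                self.refs.append(0)
            state = nxt
            self.refs[state] += 1

    def longest_prefix(self, tokens):
        """Return the length of the longest cached prefix of `tokens`."""
        state, matched = 0, 0
        for tok in tokens:
            nxt = self.edges[state].get(tok)
            if nxt is None:
                break
            state = nxt
            matched += 1
        return matched
```

In a serving engine the matched prefix length would map to reusable KV-cache blocks, so a longer match means less prefill work; the flat-array layout keeps states contiguous in memory instead of chasing pointers.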