vMLX - JANGTQ Uber Compressed MLX Models - L2 Disk Cache (survives restart) + L1 Paged (super fast TTFT) + Hybrid SSM Scheduler + Continuous Batching + more!
Updated May 11, 2026 - Python
Mini LLM Serve is a Go-based LLM serving control plane for token-aware scheduling, streaming, TTFT/TBT metrics, and prefix cache metadata.
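The TTFT/TBT metrics mentioned above are straightforward to derive from per-token emission timestamps: TTFT (time to first token) is the gap from request start to the first token, and TBT (time between tokens) is the gap between consecutive tokens. A minimal sketch, assuming a list of timestamps collected during streaming (the function name `ttft_tbt` and its interface are illustrative, not Mini LLM Serve's actual API):

```python
def ttft_tbt(request_start, token_times):
    """Compute streaming latency metrics from emission timestamps.

    request_start: timestamp when the request arrived.
    token_times:   timestamps (same clock) at which each token was emitted.
    Returns (TTFT, list of inter-token gaps). Times here are in milliseconds,
    but any monotonic unit works.
    """
    if not token_times:
        raise ValueError("no tokens were emitted")
    # TTFT: delay until the first token reaches the client.
    ttft = token_times[0] - request_start
    # TBT: gap between each pair of consecutive tokens.
    tbt = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    return ttft, tbt
```

In a real serving loop the timestamps would come from a monotonic clock (e.g. `time.monotonic()`) captured as each token is flushed to the stream.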
Correctness-fixed Rust/PyO3 flat-array DFA prefix cache — rewrite of BCR-memory v1 with regression tests for four bugs and an SGLang/vLLM head-to-head harness.
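A flat-array DFA prefix cache can be pictured as a trie over token IDs whose states live in growable parallel arrays rather than heap-allocated node objects; a lookup walks the DFA and reports the longest cached prefix. The sketch below is a hypothetical Python illustration of that idea under those assumptions, not the repo's Rust implementation (the class and method names are invented):

```python
class FlatTriePrefixCache:
    """Trie over token IDs stored in flat, index-parallel arrays (illustrative)."""

    def __init__(self):
        # One entry per DFA state; state 0 is the root.
        self.edges = [{}]     # state index -> {token_id: next state index}
        self.refs = [0]       # per-state reference count (illustrative metadata)

    def insert(self, tokens):
        """Add a token sequence, appending new states to the flat arrays."""
        state = 0
        for tok in tokens:
            nxt = self.edges[state].get(tok)
            if nxt is None:
                nxt = len(self.edges)        # next free slot in the arrays
                self.edges[state][tok] = nxt
                self.edges.append({})
                self.refs.append(0)
            state = nxt
            self.refs[state] += 1

    def longest_prefix(self, tokens):
        """Return the length of the longest cached prefix of `tokens`."""
        state, matched = 0, 0
        for tok in tokens:
            nxt = self.edges[state].get(tok)
            if nxt is None:
                break
            state = nxt
            matched += 1
        return matched
```

In a serving engine the matched prefix length would map to reusable KV-cache blocks, so a longer match means less prefill work; the flat-array layout keeps states contiguous in memory instead of chasing pointers.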