⚡ FKVS (Fast Key Value Store) is a tiny, high‑performance key‑value store written in C with a single‑threaded, non‑blocking I/O multiplexed event loop.
It is part of my experiment to understand what it takes to build a Redis-like key-value store in C from first principles, and eventually to bring it to a production-ready state.
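To make "single-threaded, non-blocking, I/O-multiplexed" concrete, here is a minimal sketch of one loop iteration in C. It is not FKVS's actual code: it uses portable poll() as a stand-in (the benchmark section names epoll and io_uring as the backends measured on Linux), and the names set_nonblocking, event_loop_tick, and count_bytes are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode; returns 0 on success, -1 on error. */
static int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical callback type: called with each chunk read from a ready fd. */
typedef void (*on_data_fn)(int fd, const char *buf, size_t len, void *ctx);

/* Example handler: tallies received bytes into the size_t that ctx points to. */
static void count_bytes(int fd, const char *buf, size_t len, void *ctx) {
    (void)fd;
    (void)buf;
    *(size_t *)ctx += len;
}

/*
 * One loop iteration: wait until some descriptor is readable, then drain each
 * ready descriptor until read() would block. Descriptors must already be
 * non-blocking. Returns poll()'s result: the number of descriptors with
 * events, 0 on timeout, or -1 on error.
 */
static int event_loop_tick(struct pollfd *fds, nfds_t nfds, int timeout_ms,
                           on_data_fn on_data, void *ctx) {
    int ready = poll(fds, nfds, timeout_ms);
    if (ready <= 0) return ready;

    for (nfds_t i = 0; i < nfds; i++) {
        if (!(fds[i].revents & POLLIN)) continue;
        char buf[4096];
        for (;;) {
            ssize_t n = read(fds[i].fd, buf, sizeof buf);
            if (n > 0) {
                on_data(fds[i].fd, buf, (size_t)n, ctx);
                continue;
            }
            if (n == -1 && errno == EINTR) continue; /* interrupted, retry */
            break; /* EAGAIN (drained), EOF, or a real error */
        }
    }
    return ready;
}
```

A server would call event_loop_tick in a loop on one thread. The sketch leaves out accept(), write readiness (POLLOUT), and disconnect handling.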
docker build -t fkvs:latest -f Dockerfile .
docker run --rm -it --name fkvs -p 5995:5995 fkvs:latest # if you intend to connect from the host via TCP
docker run --rm -it --name fkvs fkvs:latest
## Connect to server using 127.0.0.1 and port 5995
docker exec -it fkvs /usr/local/bin/fkvs-cli -h 127.0.0.1 -p 5995 -d /home/fkvs/client.conf
## Additional commands for running benchmarks from within the container
docker exec -it fkvs /usr/local/bin/fkvs-benchmark -n 1000000 -t set -c 30 -u # run benchmark using Unix domain sockets
docker exec -it fkvs /usr/local/bin/fkvs-benchmark -n 1000000 -t set -c 30 # run benchmark using TCP

sudo apt remove --purge --auto-remove cmake
sudo apt update
sudo apt install -y software-properties-common lsb-release && \
sudo apt clean all
wget -O - https://apt.kitware.com/keys/kitware-archive-latest.asc 2>/dev/null | gpg --dearmor - | sudo tee /etc/apt/trusted.gpg.d/kitware.gpg >/dev/null
sudo apt-add-repository "deb https://apt.kitware.com/ubuntu/ $(lsb_release -cs) main"
sudo apt update
sudo apt install kitware-archive-keyring
sudo rm /etc/apt/trusted.gpg.d/kitware.gpg
sudo apt update
sudo apt install gcc
sudo apt install cmake
make -f Makefile.fkvs setup-and-build
./fkvs-server -c

brew install cmake
make -f Makefile.fkvs setup-and-build
./fkvs-server -c

The server uses jemalloc by default on Linux when the library is installed and falls back to the system allocator otherwise. jemalloc benefits allocation-heavy workloads (many distinct keys, high write churn).
# Linux: install jemalloc, then build normally — it is picked up automatically
sudo apt install libjemalloc-dev
make -f Makefile.fkvs setup-and-build
# Force on/off anywhere (e.g. opt in on macOS, opt out on Linux)
cmake -S . -B . -DFKVS_ENABLE_JEMALLOC=ON # or =OFF
cmake --build .

On macOS (and other non-Linux platforms), jemalloc is opt-in. See the jemalloc Build Guide for verification, Docker, and when it is worth enabling.
$ ./fkvs-cli
Connected to server on port 5995
Type 'exit' to quit.
127.0.0.1:5995> PING google.com
"google.com"

$ ./fkvs-cli -h 127.0.0.1 -p 5995 --non-interactive
"PONG"

| Command | Usage | Description |
|---|---|---|
| SET | SET key value [EX seconds] | Store a key-value pair, optionally setting a TTL atomically |
| GET | GET key | Retrieve the value of a key |
| DEL | DEL key | Delete a key |
| INCR | INCR key | Increment the integer value of a key by 1 |
| INCRBY | INCRBY key amount | Increment the integer value of a key by a given amount |
| DECR | DECR key | Decrement the integer value of a key by 1 |
| DECRBY | DECRBY key amount | Decrement the integer value of a key by a given amount |
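The four counter commands can be read as one operation with a signed delta (+1, +amount, -1, -amount). The sketch below is a minimal illustration, not FKVS's implementation. It assumes values are stored as decimal text and that non-integer values and overflow are rejected, as Redis does; this README does not state what FKVS does in those cases. apply_delta is a hypothetical helper name.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Apply a signed delta to a value stored as decimal text.
 * On success, writes the new value to *out and returns true.
 * Returns false if the text is not a whole integer or if the result would
 * not fit in a long long.
 */
static bool apply_delta(const char *stored, long long delta, long long *out) {
    char *end = NULL;
    errno = 0;
    long long cur = strtoll(stored, &end, 10);
    if (errno == ERANGE || end == stored || *end != '\0') return false;

    /* Check before adding: signed overflow is undefined behaviour in C. */
    if (delta > 0 && cur > LLONG_MAX - delta) return false;
    if (delta < 0 && cur < LLONG_MIN - delta) return false;

    *out = cur + delta;
    return true;
}
```

Under these assumptions, INCR key maps to a delta of 1 and DECRBY key amount to a delta of -amount. Negating the amount needs its own overflow check when the amount is LLONG_MIN.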
| Command | Usage | Description |
|---|---|---|
| EXPIRE | EXPIRE key seconds | Set a timeout on a key (in seconds) |
| TTL | TTL key | Get the remaining time-to-live of a key (-1 = no TTL, -2 = key missing) |
| PERSIST | PERSIST key | Remove the timeout from a key |
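The TTL return codes are easiest to see in code. The sketch below is an illustrative model of the three commands in the table, not the server's implementation. The struct layout, the function names, and the choice to report an already-expired key as missing are assumptions.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical entry layout: expires_at == 0 means "no TTL set". */
struct entry {
    time_t expires_at;
};

/*
 * TTL key: remaining seconds, -1 if the key has no TTL, -2 if it is missing.
 * e == NULL models a missing key. A key whose deadline has passed is reported
 * as missing too (an assumption, matching a lazy-expiry design).
 */
static long long ttl_reply(const struct entry *e, time_t now) {
    if (e == NULL) return -2;
    if (e->expires_at == 0) return -1;
    if (e->expires_at <= now) return -2;
    return (long long)(e->expires_at - now);
}

/* EXPIRE key seconds: set the deadline relative to the current time. */
static void entry_expire(struct entry *e, time_t now, long long seconds) {
    e->expires_at = now + (time_t)seconds;
}

/* PERSIST key: clear the deadline so the key no longer expires. */
static void entry_persist(struct entry *e) {
    e->expires_at = 0;
}
```

In this model, SET key value EX seconds would store the value and call entry_expire in the same step, which is one way to read "atomically" in the SET description.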
| Command | Usage | Description |
|---|---|---|
| PING | PING or PING value | Test connectivity; returns PONG or echoes the value |
| INFO | INFO | Display server statistics (uptime, memory, connected clients) |
| KEYS | KEYS | List all non-expired stored keys |
These figures measure server-side CPU efficiency — fkvs-benchmark talks to
the server over loopback TCP on a single machine, so they show how many
operations one fkvs event-loop core can push, not end-to-end latency over a real
network.
Environment

| Item | Details |
|---|---|
| Host | Apple M1 Max (8 performance + 2 efficiency cores), macOS 26.5 |
| Linux | Ubuntu 24.04 LTS in an OrbStack VM (10 vCPU, 16 GiB, Linux 6.19, aarch64) |
| Build | -DCMAKE_BUILD_TYPE=Release, branch feat/jemalloc-support (with SET hot-path optimizations) |
| Allocators | jemalloc 5.3.0 vs system (glibc) malloc |
| Workload | fkvs-benchmark, loopback TCP, SET (5-byte value) |
| Method | best-of-2 per cell (best-of-3 for -r); -n 1,000,000 (-n 300,000 at -P 1) |
The Linux numbers come from a VM on Apple Silicon, not bare-metal Linux, and loopback TCP takes the network out of the picture, so treat them as relative, not absolute. Run-to-run variance is ~10–20% on this shared VM.
fkvs-benchmark -n 1000000 -t set -c 128 -P <P> (req/s, higher is better):

| Pipeline (-P) | epoll · jemalloc | epoll · system | io_uring · jemalloc | io_uring · system |
|---|---|---|---|---|
| 1 | 197K | 230K | 231K | 242K |
| 8 | 1.01M | 985K | 984K | 966K |
| 32 | 1.78M | 1.96M | 1.87M | 1.84M |
| 128 | 2.88M | 2.94M | 2.88M | 2.92M |
At -P 128 the single event-loop core sustains ~2.9M req/s; with more
connections and on quieter runs it peaks around 3.1–3.5M. Connection count
(-c) otherwise had only marginal effect (50–256 within ~10%): the
single-threaded loop is already saturated by a handful of pipelined clients.
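Pipelining means a client puts several requests on the wire before reading any reply. One common way to do that is to pack a batch into one buffer and send it with a single write(), so the per-call cost is shared across the batch. The sketch below shows only that packing step. The FKVS wire format is not described in this README, so frames are treated as opaque, already-encoded bytes, and pack_pipeline and writes_needed are hypothetical helper names.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy `depth` copies of an already-encoded request frame into `out`, back to
 * back, so the caller can hand the whole batch to one write()/send() call.
 * Returns the number of bytes packed, or 0 if the arguments are empty or the
 * batch does not fit in `cap` bytes.
 */
static size_t pack_pipeline(char *out, size_t cap,
                            const char *frame, size_t frame_len,
                            size_t depth) {
    if (frame_len == 0 || depth == 0) return 0;
    if (depth > cap / frame_len) return 0; /* batch would not fit */
    for (size_t i = 0; i < depth; i++)
        memcpy(out + i * frame_len, frame, frame_len);
    return depth * frame_len;
}

/* Write calls needed to send n requests at a given depth: ceil(n / depth). */
static size_t writes_needed(size_t n, size_t depth) {
    if (depth == 0) return 0;
    return (n + depth - 1) / depth;
}
```

Assuming one write per batch, one million requests take on the order of 7,813 write calls at -P 128 instead of one million at -P 1. That is consistent with the workload moving from syscall-bound to CPU/parse-bound as the depth grows.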
fkvs-benchmark -n 1000000 -t set -r -c 50 -P 128, fresh server per run — req/s:
| Backend | jemalloc | system |
|---|---|---|
| epoll | 1.64M | 1.69M |
| io_uring | 1.60M | 1.59M |
- Pipelining is the dominant lever: roughly 12–15× from -P 1 to -P 128 in the table above. With deep pipelining the workload is CPU/parse-bound on a single core (~2.9M ops/s, peaks ~3.5M).
- epoll ≈ io_uring: within noise across the board; io_uring only pulls marginally ahead at -P 1, its syscall-bound sweet spot.
- jemalloc ≈ system on the optimized hot path. Earlier builds allocated several small objects per SET, which gave jemalloc an edge; once the hot path was trimmed to near-zero per-op allocation, that edge largely disappeared. jemalloc still does no harm and remains the default on Linux (it can help long-running, fragmentation-prone workloads); see the jemalloc Build Guide.
- -r is lower (~1.6M) because every op inserts a new key (real allocation plus cache-cold access across a large keyspace), versus the cache-hot single-key update path above.
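The pipelining speedup can be recomputed from the pipeline table. The sketch below hard-codes that table's -P 1 and -P 128 rows in thousands of req/s and divides them column by column. The array and function names are illustrative only.

```c
#include <stddef.h>

/*
 * Throughput in thousands of req/s, copied from the pipeline table.
 * Columns: epoll+jemalloc, epoll+system, io_uring+jemalloc, io_uring+system.
 */
static const double P1[4]   = { 197.0, 230.0, 231.0, 242.0 };
static const double P128[4] = { 2880.0, 2940.0, 2880.0, 2920.0 };

/* Speedup from -P 1 to -P 128 for one backend/allocator column. */
static double speedup(size_t col) {
    return P128[col] / P1[col];
}
```

With these figures the ratios run from about 12.1× (io_uring with the system allocator) to about 14.6× (epoll with jemalloc). Given the stated 10–20% run-to-run variance, the differences between columns are not meaningful.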
- Benchmarking Guide - How to benchmark FKVS performance
- jemalloc Build Guide - Building the server with jemalloc
- Profiling Guide - Profiling with Instruments on macOS
- Performance Roadmap - Path to 1M req/s optimization plan
- Event Dispatchers - Event loop implementations
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Copyright (c) 2025 Alexandre Juca - corextechnologies@gmail.com
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.