sortbloom

A repository exploring the cost-benefit of sorting accesses to a Bloom filter for high-selectivity filtering workloads.

To run, you'll need hugepages configured; the number required depends on the size of the filter you plan to test:

echo 400 > /proc/sys/vm/nr_hugepages
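Since the right number of hugepages depends on the filter size, a rough sizing rule can help. This is a minimal sketch assuming the common 2 MiB hugepage size (check `Hugepagesize` in /proc/meminfo); the 1.5x slack factor is an arbitrary cushion for other allocations, not something the repo prescribes:

```python
import math

# Assumption: default 2 MiB hugepages (x86-64 common case).
HUGEPAGE_BYTES = 2 * 1024 * 1024

def hugepages_needed(filter_size_bytes, slack=1.5):
    """Pages needed to back the filter, with slack for other allocations."""
    return math.ceil(filter_size_bytes * slack / HUGEPAGE_BYTES)

# 256 MiB filter (--filter-size=268435456) with 1.5x slack:
print(hugepages_needed(268435456))  # -> 192
```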

To extract LLC misses, run (with -s or -u to measure sorted or unsorted; this invocation might be outdated):

sudo perf stat -e LLC-load-misses ./target/release/sortbloom --filter-size=268435456 --num-entries=16777216 --probe-count=134217728

To estimate misses per lookup, take a number of probes N and calculate:

misses_2N - misses_N
--------------------
       N

which removes the misses incurred in filling the hash table.
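The subtraction above can be sketched in a few lines; the counter values in the example call are made-up placeholders, not real measurements:

```python
def misses_per_lookup(misses_n, misses_2n, n):
    """Estimate LLC misses per probe: (misses_2N - misses_N) / N.

    Running once with N probes and once with 2N, then differencing,
    cancels the constant cost of filling the hash table."""
    return (misses_2n - misses_n) / n

# Hypothetical perf counter values, for illustration only:
print(misses_per_lookup(misses_n=200_000_000, misses_2n=350_000_000, n=134_217_728))
```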

Running Benchmarks (driver.py)

  • Build the binary first: make release (produces ./target/release/sortbloom).

  • Single run via Rust binary:

    • ./target/release/sortbloom --filter-size=268435456 --num-entries=16777216 --probe-count=134217728 --hasher fast --sort full-sort
  • Create an output directory: mkdir -p results.

  • Sweep via driver:

    • python3 driver.py --experiment "cmp fast+mod" --out-dir results --low-bits 21 --high-bits 22 --binary ./target/release/sortbloom --hasher fast,mod
  • What it does:

    • Creates results-XXXX.jsonl in --out-dir (next available number).
    • Writes a first-line run header with experiment, run_id, ts, git, host, and rustc_version.
    • Appends one JSON object per configuration from the binary (args + bf_results).
    • Updates latest.jsonl symlink to point to the newest results file.

Governing Processor frequency

On modern processors, the frequency at which a core runs has a large impact on performance. For our experiments to be fair, it is important that all implementations run at the same CPU frequency.

Without pinning the CPU frequency to a fixed value, we observed that doing more AVX2 work clocks the CPU higher than doing less work.

As a concrete example, partial-lut2-0-0-16 runs at ~4.1GHz and no-sort runs at ~3.6GHz on Dave's office machine.

Our benchmark is usually memory-bottlenecked. Since the work required for no-sort is very short (~1.7 ns per 8 memory probes), we believe the CPU keeps the frequency low: from its perspective, calculating faster will not help because memory remains the bottleneck. In practice, calculating faster still has some benefit.

On Linux, the following commands may be useful for controlling CPU frequencies.

# check the current state of CPUs
sudo cpupower frequency-info

# Set all CPUs to a fixed frequency
# -g selects the governor; the same governor may not exist on your system.
sudo cpupower frequency-set -g performance --min 4.0G --max 4.0G

# reset the CPU frequencies
sudo cpupower frequency-set -g powersave --min 1.2G --max 4.3G

FPR for different designs

The output for the FPR runs is produced by:

# This runs for 1024 iterations; the iteration count can be changed in src/fpr.rs
# Filter size is 16 * M bytes
# Total items are 8 * M
# Total queries are 128 * M
./target/release/sortbloom --filter-size=134217728 --num-entries=8388608 --probe-count=134217728 --l3=8 --fpr > fpr.txt

# Run this to get the FPR (%) for each design
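As a cross-check on the measured rates, the textbook Bloom-filter formula (1 - e^(-kn/m))^k gives the expected FPR for m bits, n inserted items, and k hash functions. The k values below are illustrative assumptions; the actual number of hashes depends on each design:

```python
import math

def bloom_fpr(m_bits, n_items, k_hashes):
    """Expected false-positive rate of a standard Bloom filter:
    (1 - e^(-kn/m))^k."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes

m_bits = 134217728 * 8   # --filter-size is given in bytes
n_items = 8388608        # --num-entries
for k in (4, 8, 12):     # illustrative k values; actual k depends on the design
    print(f"k={k:2d}: {100 * bloom_fpr(m_bits, n_items, k):.8f}%")
```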

Get the plots for the paper

./prepare_paper_plots.sh results/plots/final_variants_2/ results/plots/final_variants_2_cache results/plots/final_workload_2
