Benchmark tools
Simple benchmark tools, intentionally written in C to get faster feedback loops (no need to wait for Rust builds).
Some scripts require numpy. You can install it globally or in a virtual environment:
$> python -m venv .env
$> source .env/bin/activate
$> pip install -r requirements.txt
benchtest
A simple generic tool that takes an SQL file and a DB file and runs all queries from the SQL file against the provided DB file.
To generate SQL files, you can use or extend the workload.py script.
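To illustrate the kind of data a workload generator produces, here is a minimal Python sketch that prints literal SQL with uniformly random vectors written as libSQL vector text literals. The helper names (vector_literal, uniform_inserts) are hypothetical, and the sketch assumes vector() accepts a bracketed, comma-separated text list. It does not reproduce the file format workload.py actually emits for benchtest's prepared statements; it only shows the shape of the data.

```python
import random


def vector_literal(values):
    """Format floats as a libSQL vector text literal, e.g. '[0.500000,0.250000]'."""
    return "[" + ",".join(f"{v:.6f}" for v in values) + "]"


def uniform_inserts(table, n, dim, seed=0):
    """Yield hypothetical literal INSERT statements with uniformly random vectors.

    Not the format workload.py emits for benchtest; illustration only.
    """
    rng = random.Random(seed)  # seeded so repeated runs produce identical SQL
    for row_id in range(1, n + 1):
        vec = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        yield f"INSERT INTO {table} VALUES ({row_id}, vector('{vector_literal(vec)}'));"


if __name__ == "__main__":
    print("CREATE TABLE t ( id INTEGER PRIMARY KEY, emb FLOAT32(4) );")
    for stmt in uniform_inserts("t", 4, 4):
        print(stmt)
```

Running it prints one CREATE TABLE statement followed by four INSERT statements; the fixed seed keeps the output identical between runs.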
Example:
$> LD_LIBRARY_PATH=../.libs/ ./benchtest queries.sql data.db
open queries file at queries.sql
open sqlite db at 'data.db'
executed simple statement: 'CREATE TABLE t ( id INTEGER PRIMARY KEY, emb FLOAT32(4) );'
executed simple statement: 'CREATE INDEX t_idx ON t ( libsql_vector_idx(emb) );'
prepared statement: 'INSERT INTO t VALUES ( ?, vector(?) );'
inserts (queries.sql):
insert: 710.25 micros (avg.), 4 (count)
size : 0.2695 MB
reads : 1.00 (avg.), 4 (total)
writes: 1.00 (avg.), 4 (total)
prepared statement: 'SELECT * FROM vector_top_k('t_idx', vector(?), ?);'
search (queries.sql):
select: 63.25 micros (avg.), 4 (count)
size : 0.2695 MB
reads : 1.00 (avg.), 4 (total)
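Each summary line is an average over all statements executed in that phase, i.e. avg = total / count; above, 4 reads over 4 statements gives 1.00 (avg.). A minimal Python sketch of such an accumulator follows. PhaseStats is a hypothetical name, not benchtest's C implementation, and since what exactly the reads counter measures is not documented here, the sketch treats it as an opaque per-statement number. The timings fed in are made-up sample values.

```python
from dataclasses import dataclass, field


@dataclass
class PhaseStats:
    """Hypothetical per-phase accumulator (e.g. 'inserts' or 'search')."""

    micros: list = field(default_factory=list)  # latency of each statement
    reads: list = field(default_factory=list)   # opaque read counter per statement

    def record(self, elapsed_micros, read_count):
        self.micros.append(elapsed_micros)
        self.reads.append(read_count)

    def summary(self, label):
        """Format averages and totals in the same shape as benchtest's output."""
        count = len(self.micros)
        total_reads = sum(self.reads)
        return (
            f"{label}: {sum(self.micros) / count:.2f} micros (avg.), {count} (count)\n"
            f"reads : {total_reads / count:.2f} (avg.), {total_reads} (total)"
        )


stats = PhaseStats()
# Made-up sample values: (latency in micros, reads) for four statements.
for elapsed, reads in [(100.0, 1), (200.0, 1), (300.0, 1), (400.0, 1)]:
    stats.record(elapsed, reads)
print(stats.summary("insert"))
```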
It is linked against liblibsql.so, which resides in the ../.libs/ directory and must be built explicitly from the libsql-sqlite3 sources:
$> basename $(pwd)
libsql-sqlite3
$> make # this command will generate libs in the .libs directory
$> cd benchmark
$> make bruteforce
open queries file at bruteforce.sql
open sqlite db at 'test.db'
executed simple statement: 'PRAGMA journal_mode=WAL;'
executed simple statement: 'CREATE TABLE x ( id INTEGER PRIMARY KEY, embedding FLOAT32(64) );'
prepared statement: 'INSERT INTO x VALUES (?, vector(?));'
inserts (bruteforce.sql):
insert: 46.27 micros (avg.), 1000 (count)
size : 0.2695 MB
reads : 1.00 (avg.), 1000 (total)
writes: 1.00 (avg.), 1000 (total)
prepared statement: 'SELECT id FROM x ORDER BY vector_distance_cos(embedding, vector(?)) LIMIT ?;'
search (bruteforce.sql):
select: 329.32 micros (avg.), 1000 (count)
size : 0.2695 MB
reads : 2000.00 (avg.), 2000000 (total)
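The brute-force query has no vector index to use, so every search has to score all rows of x, which is consistent with the reads counter averaging 2000.00 per search over a 1000-row table. The Python sketch below mirrors what ORDER BY vector_distance_cos(embedding, vector(?)) LIMIT k computes, assuming vector_distance_cos is the cosine distance 1 - cosine similarity. The function names are hypothetical and this is not libSQL's implementation.

```python
import heapq
import math


def cosine_distance(a, b):
    """1 - cosine similarity (assumed to match vector_distance_cos).

    Zero-length vectors would divide by zero; not handled in this sketch.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def brute_force_top_k(rows, query, k):
    """Score every (id, embedding) row and return the ids of the k nearest.

    Like the ORDER BY ... LIMIT query, every row is visited, so the cost
    grows linearly with the number of rows.
    """
    scored = ((cosine_distance(emb, query), row_id) for row_id, emb in rows)
    return [row_id for _, row_id in heapq.nsmallest(k, scored)]


rows = [(1, [1.0, 0.0]), (2, [0.0, 1.0]), (3, [1.0, 1.0])]
print(brute_force_top_k(rows, [1.0, 0.1], 2))  # [1, 3]
```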
anntest
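A simple tool that takes a DB file with two tables, data (id INTEGER PRIMARY KEY, emb FLOAT32(n)) and queries (emb FLOAT32(n)), and, for every vector in the queries table, runs a vector search against the data table using the provided SQL statements (an ANN query and an exact query). It then reports the average recall of the ANN results relative to the exact ones.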
To generate the DB file, you can use benchtest together with the workload.py script. Example:
$> python workload.py recall_uniform 64 1000 1000 > recall_uniform.sql
$> LD_LIBRARY_PATH=../.libs/ ./benchtest recall_uniform.sql recall_uniform.db
$> # ./anntest [db path] [test name (used only for printed stats)] [ann query] [exact query]
$> LD_LIBRARY_PATH=../.libs/ ./anntest recall_uniform.db 10-recall@10 "SELECT rowid FROM vector_top_k('data_idx', ?, 10)" "SELECT id FROM data ORDER BY vector_distance_cos(emb, ?) LIMIT 10"
open sqlite db at 'recall_uniform.db'
ready to perform 1000 queries with SELECT rowid FROM vector_top_k('data_idx', ?, 10) ann query and SELECT id FROM data ORDER BY vector_distance_cos(emb, ?) LIMIT 10 exact query
88.91% 10-recall@10 (avg.)
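Under the standard definition, 10-recall@10 is computed per query as the fraction of the exact top-10 ids (from the exact query) that also appear in the ANN top-10 (from the vector_top_k query), averaged over all queries; read that way, 88.91% means roughly 8.9 of the 10 true neighbours were found per query on average. A minimal Python sketch of that computation, assuming anntest uses this definition (its C code is not shown here and may differ in details such as duplicate handling); the function names are hypothetical:

```python
def recall_at_k(ann_ids, exact_ids, k):
    """Fraction of the exact top-k ids that also appear in the ANN top-k."""
    return len(set(ann_ids[:k]) & set(exact_ids[:k])) / k


def average_recall(pairs, k):
    """Mean recall over (ann_ids, exact_ids) pairs, one pair per query."""
    return sum(recall_at_k(ann, exact, k) for ann, exact in pairs) / len(pairs)


# Two toy queries: the first ANN result is perfect, the second misses one id.
pairs = [([1, 2, 3], [1, 2, 3]), ([1, 2, 9], [1, 2, 3])]
print(f"{average_recall(pairs, 3):.2%} 3-recall@3 (avg.)")  # 83.33% 3-recall@3 (avg.)
```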
blobtest
A simple tool that aims to show that the sqlite3_blob_reopen API can substantially speed up reads. Example comparing the simple and reopen modes:
$> LD_LIBRARY_PATH=../.libs/ ./blobtest blob-read-simple.db read simple 1000 1000
open sqlite db at 'blob-read-simple.db'
blob table: ready to prepare
blob table: prepared
time: 3.76 micros (avg.), 1000 (count)
$> LD_LIBRARY_PATH=../.libs/ ./blobtest blob-read-reopen.db read reopen 1000 1000
open sqlite db at 'blob-read-reopen.db'
blob table: ready to prepare
blob table: prepared
time: 0.31 micros (avg.), 1000 (count)
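In these runs the reopen variant averages 0.31 micros per read against 3.76 for the simple variant, roughly a 12x difference (3.76 / 0.31 ≈ 12.1). Each printed figure is an average over the iteration count; a minimal Python sketch of that kind of measurement follows. average_micros and report are hypothetical helpers, not blobtest's code, and the sketch times the whole loop and divides by the count, which may differ from how blobtest measures.

```python
import time


def average_micros(op, count):
    """Call op() count times; return mean wall-clock time per call in microseconds."""
    start = time.perf_counter_ns()
    for _ in range(count):
        op()
    elapsed_ns = time.perf_counter_ns() - start
    return elapsed_ns / count / 1000.0


def report(avg_micros, count):
    """Format a result line in the same shape as blobtest's output."""
    return f"time: {avg_micros:.2f} micros (avg.), {count} (count)"


data = bytes(4096)
# Stand-in operation: slice 64 bytes out of an in-memory buffer.
print(report(average_micros(lambda: data[:64], 1000), 1000))
```

Comparing two variants this way only makes sense with the same count and the same data, as in the two blobtest invocations above.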