mirror of https://github.com/tursodatabase/libsql.git synced 2025-01-05 22:07:56 +00:00
Sivukhin Nikita 4b21878187 add benchmark scripts (#1546)
* add benchmark scripts

* remove unnecessary srand
2024-07-12 10:04:59 +00:00


benchmark tools

Simple benchmark tools, intentionally written in C to keep the feedback loop fast (no need to wait for Rust builds).

Some scripts require numpy. You can install it globally or in a virtual environment:

$> python -m venv .env
$> source .env/bin/activate
$> pip install -r requirements.txt

benchtest

A simple generic tool that takes an SQL file and a DB file and runs all queries against the provided DB file. To generate SQL files, you can use or extend the workload.py script.

Take a look at the example:

$> LD_LIBRARY_PATH=../.libs/ ./benchtest queries.sql data.db
open queries file at queries.sql
open sqlite db at 'data.db'
executed simple statement: 'CREATE TABLE t ( id INTEGER PRIMARY KEY, emb FLOAT32(4) );'
executed simple statement: 'CREATE INDEX t_idx ON t ( libsql_vector_idx(emb) );'
prepared statement: 'INSERT INTO t VALUES ( ?, vector(?) );'
inserts (queries.sql):
  insert: 710.25 micros (avg.), 4 (count)
  size  : 0.2695 MB
  reads : 1.00 (avg.), 4 (total)
  writes: 1.00 (avg.), 4 (total)
prepared statement: 'SELECT * FROM vector_top_k('t_idx', vector(?), ?);'
search (queries.sql):
  select: 63.25 micros (avg.), 4 (count)
  size  : 0.2695 MB
  reads : 1.00 (avg.), 4 (total)
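The "micros (avg.)" and "(count)" figures in the output above are a mean over timed statement executions. The C sketch below shows one plausible way to collect such numbers with a monotonic clock; the names timing_stats, now_micros, stats_add, stats_avg and stats_print are hypothetical, and this is not benchtest's actual source.

```c
#define _POSIX_C_SOURCE 199309L /* expose clock_gettime under strict C modes */
#include <stdio.h>
#include <time.h>

/* Hypothetical accumulator for one kind of statement (e.g. "insert"). */
typedef struct {
    double total_micros; /* sum of all measured durations */
    long count;          /* number of measured executions */
} timing_stats;

/* Current time in microseconds from a monotonic clock. */
static double now_micros(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e6 + (double)ts.tv_nsec / 1e3;
}

/* Record one measured duration. */
static void stats_add(timing_stats *s, double micros) {
    s->total_micros += micros;
    s->count += 1;
}

/* Mean duration; 0 when nothing has been recorded yet. */
static double stats_avg(const timing_stats *s) {
    return s->count > 0 ? s->total_micros / (double)s->count : 0.0;
}

/* Print a line shaped like benchtest's report lines. */
static void stats_print(const char *label, const timing_stats *s) {
    printf("  %s: %.2f micros (avg.), %ld (count)\n", label, stats_avg(s), s->count);
}
```

In a benchmark loop you would call now_micros() before and after executing each statement and pass the difference to stats_add. A monotonic clock is used so that wall-clock adjustments cannot produce negative durations.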

benchtest is linked against liblibsql.so, which resides in the ../.libs/ directory and must be built explicitly from the libsql-sqlite3 sources:

$> basename $(pwd)
libsql-sqlite3
$> make # this command will generate libs in the .libs directory
$> cd benchmark
$> make bruteforce
open queries file at bruteforce.sql
open sqlite db at 'test.db'
executed simple statement: 'PRAGMA journal_mode=WAL;'
executed simple statement: 'CREATE TABLE x ( id INTEGER PRIMARY KEY, embedding FLOAT32(64) );'
prepared statement: 'INSERT INTO x VALUES (?, vector(?));'
inserts (bruteforce.sql):
  insert: 46.27 micros (avg.), 1000 (count)
  size  : 0.2695 MB
  reads : 1.00 (avg.), 1000 (total)
  writes: 1.00 (avg.), 1000 (total)
prepared statement: 'SELECT id FROM x ORDER BY vector_distance_cos(embedding, vector(?)) LIMIT ?;'
search (bruteforce.sql):
  select: 329.32 micros (avg.), 1000 (count)
  size  : 0.2695 MB
  reads : 2000.00 (avg.), 2000000 (total)

anntest

A simple tool that takes a DB file with two tables, data (id INTEGER PRIMARY KEY, emb FLOAT32(n)) and queries (emb FLOAT32(n)), runs a vector search against the data table for every vector in the queries table using the provided SQL statements, and reports how well the ANN query results match the exact query results (average recall).

To generate the DB file, you can use benchtest together with workload.py. Take a look at the example:

$> python workload.py recall_uniform 64 1000 1000 > recall_uniform.sql
$> LD_LIBRARY_PATH=../.libs/ ./benchtest recall_uniform.sql recall_uniform.db
$> # ./anntest [db path] [test name (used only for printed stats)] [ann query] [exact query]
$> LD_LIBRARY_PATH=../.libs/ ./anntest recall_uniform.db 10-recall@10 "SELECT rowid FROM vector_top_k('data_idx', ?, 10)" "SELECT id FROM data ORDER BY vector_distance_cos(emb, ?) LIMIT 10"
open sqlite db at 'recall_uniform.db'
ready to perform 1000 queries with SELECT rowid FROM vector_top_k('data_idx', ?, 10) ann query and SELECT id FROM data ORDER BY vector_distance_cos(emb, ?) LIMIT 10 exact query
88.91% 10-recall@10 (avg.)
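Under the usual definition, 10-recall@10 is the fraction of the exact top-10 ids that also appear in the ANN top-10, averaged over all queries. A minimal C sketch of that computation follows; recall_at_k and mean_recall are hypothetical helpers, and we are assuming anntest computes recall this way.

```c
#include <stddef.h>

/* k-recall@k for one query: the share of the exact (ground-truth) ids
 * that also occur in the approximate result. Returns 0.0 when there is
 * no ground truth, to avoid dividing by zero (an arbitrary choice). */
static double recall_at_k(const long *ann_ids, size_t ann_len,
                          const long *exact_ids, size_t exact_len) {
    size_t hits = 0;
    if (exact_len == 0) return 0.0;
    for (size_t i = 0; i < exact_len; i++) {
        /* O(k^2) membership test; fine for small k such as 10. */
        for (size_t j = 0; j < ann_len; j++) {
            if (exact_ids[i] == ann_ids[j]) {
                hits++;
                break;
            }
        }
    }
    return (double)hits / (double)exact_len;
}

/* Average of the per-query recalls, i.e. the "(avg.)" figure. */
static double mean_recall(const double *recalls, size_t n) {
    double sum = 0.0;
    if (n == 0) return 0.0;
    for (size_t i = 0; i < n; i++) sum += recalls[i];
    return sum / (double)n;
}
```

anntest prints the mean as a percentage, so a mean recall of 0.8891 corresponds to the 88.91% shown above.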

blobtest

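A simple tool that aims to show that the sqlite3_blob_reopen API can substantially improve read performance. In the example below, the reopen mode averages 0.31 micros per read versus 3.76 micros for the simple mode, roughly 12x faster.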

Take a look at the example:

$> LD_LIBRARY_PATH=../.libs/ ./blobtest blob-read-simple.db read simple 1000 1000
open sqlite db at 'blob-read-simple.db'
blob table: ready to prepare
blob table: prepared
time: 3.76 micros (avg.), 1000 (count)
$> LD_LIBRARY_PATH=../.libs/ ./blobtest blob-read-reopen.db read reopen 1000 1000
open sqlite db at 'blob-read-reopen.db'
blob table: ready to prepare
blob table: prepared
time: 0.31 micros (avg.), 1000 (count)