
# benchmarks tools

Simple benchmark tools, intentionally written in C to keep feedback loops fast (no need to wait for Rust builds).

You need to install numpy for some scripts to work. You can do it globally or using a virtual env:

```
$> python -m venv .env
$> source .env/bin/activate
$> pip install -r requirements.txt
```

## benchtest

A simple, generic tool which takes an SQL file and a DB file and runs all queries from the SQL file against the provided DB file. For SQL file generation you can use/extend the workload.py script.
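
Internally this boils down to a prepare/bind/step loop over the sqlite3 C API, timing each statement and averaging. A minimal sketch of that loop (hypothetical, not benchtest's actual code):

```c
#include <stdio.h>
#include <time.h>
#include <sqlite3.h>

/* Hypothetical sketch: average execution time of a prepared statement,
   in microseconds. Assumes `sql` has one bindable parameter; benchtest
   itself binds ids and vectors parsed from the SQL file. */
static double bench_avg_micros(sqlite3 *db, const char *sql, int count) {
  sqlite3_stmt *stmt;
  if (sqlite3_prepare_v2(db, sql, -1, &stmt, NULL) != SQLITE_OK) {
    fprintf(stderr, "prepare failed: %s\n", sqlite3_errmsg(db));
    return -1;
  }
  struct timespec start, end;
  clock_gettime(CLOCK_MONOTONIC, &start);
  for (int i = 0; i < count; i++) {
    sqlite3_bind_int(stmt, 1, i);                 /* bind parameters per query */
    while (sqlite3_step(stmt) == SQLITE_ROW) { }  /* drain all result rows */
    sqlite3_reset(stmt);
  }
  clock_gettime(CLOCK_MONOTONIC, &end);
  sqlite3_finalize(stmt);
  double micros = (end.tv_sec - start.tv_sec) * 1e6 +
                  (end.tv_nsec - start.tv_nsec) / 1e3;
  return micros / count;
}
```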

Take a look at the example:

```
$> LD_LIBRARY_PATH=../.libs/ ./benchtest queries.sql data.db
open queries file at queries.sql
open sqlite db at 'data.db'
executed simple statement: 'CREATE TABLE t ( id INTEGER PRIMARY KEY, emb FLOAT32(4) );'
executed simple statement: 'CREATE INDEX t_idx ON t ( libsql_vector_idx(emb) );'
prepared statement: 'INSERT INTO t VALUES ( ?, vector(?) );'
inserts (queries.sql):
  insert: 710.25 micros (avg.), 4 (count)
  size  : 0.2695 MB
  reads : 1.00 (avg.), 4 (total)
  writes: 1.00 (avg.), 4 (total)
prepared statement: 'SELECT * FROM vector_top_k('t_idx', vector(?), ?);'
search (queries.sql):
  select: 63.25 micros (avg.), 4 (count)
  size  : 0.2695 MB
  reads : 1.00 (avg.), 4 (total)
```

It is linked against liblibsql.so, which resides in the ../.libs/ directory and must be explicitly built from the libsql-sqlite3 sources first:

```
$> basename $(pwd)
libsql-sqlite3
$> make # this command will generate libs in the .libs directory
$> cd benchmark
$> make bruteforce
open queries file at bruteforce.sql
open sqlite db at 'test.db'
executed simple statement: 'PRAGMA journal_mode=WAL;'
executed simple statement: 'CREATE TABLE x ( id INTEGER PRIMARY KEY, embedding FLOAT32(64) );'
prepared statement: 'INSERT INTO x VALUES (?, vector(?));'
inserts (bruteforce.sql):
  insert: 46.27 micros (avg.), 1000 (count)
  size  : 0.2695 MB
  reads : 1.00 (avg.), 1000 (total)
  writes: 1.00 (avg.), 1000 (total)
prepared statement: 'SELECT id FROM x ORDER BY vector_distance_cos(embedding, vector(?)) LIMIT ?;'
search (bruteforce.sql):
  select: 329.32 micros (avg.), 1000 (count)
  size  : 0.2695 MB
  reads : 2000.00 (avg.), 2000000 (total)
```

## anntest

A simple tool which takes a DB file with two tables, `data (id INTEGER PRIMARY KEY, emb FLOAT32(n))` and `queries (emb FLOAT32(n))`, and executes a vector search for every vector in the queries table against the data table using the provided SQL statements.
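
The reported k-recall@k metric is the fraction of ids from the exact query's top-k result that also appear in the ANN query's top-k result, averaged over all query vectors. A minimal sketch of the per-query computation (hypothetical helper, not anntest's actual code):

```c
#include <stdint.h>

/* Hypothetical sketch: k-recall@k for a single query vector, i.e. how many
   of the ids returned by the exact search also appear in the ANN result.
   `ann` and `exact` each hold the k row ids returned by the two queries. */
static double recall_at_k(const int64_t *ann, const int64_t *exact, int k) {
  int hits = 0;
  for (int i = 0; i < k; i++) {
    for (int j = 0; j < k; j++) {
      if (exact[i] == ann[j]) { hits++; break; }
    }
  }
  return (double)hits / k;
}
```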

To generate the DB file you can use the benchtest and workload.py tools. Take a look at the example:

```
$> python workload.py recall_uniform 64 1000 1000 > recall_uniform.sql
$> LD_LIBRARY_PATH=../.libs/ ./benchtest recall_uniform.sql recall_uniform.db
$> # ./anntest [db path] [test name (used only for printed stats)] [ann query] [exact query]
$> LD_LIBRARY_PATH=../.libs/ ./anntest recall_uniform.db 10-recall@10 "SELECT rowid FROM vector_top_k('data_idx', ?, 10)" "SELECT id FROM data ORDER BY vector_distance_cos(emb, ?) LIMIT 10"
open sqlite db at 'recall_uniform.db'
ready to perform 1000 queries with SELECT rowid FROM vector_top_k('data_idx', ?, 10) ann query and SELECT id FROM data ORDER BY vector_distance_cos(emb, ?) LIMIT 10 exact query
88.91% 10-recall@10 (avg.)
```

## blobtest

A simple tool which aims to show that the `sqlite3_blob_reopen` API can substantially speed up repeated blob reads (a sketch of the two read strategies follows the example below).

Take a look at the example:

```
$> LD_LIBRARY_PATH=../.libs/ ./blobtest blob-read-simple.db read simple 1000 1000
open sqlite db at 'blob-read-simple.db'
blob table: ready to prepare
blob table: prepared
time: 3.76 micros (avg.), 1000 (count)
$> LD_LIBRARY_PATH=../.libs/ ./blobtest blob-read-reopen.db read reopen 1000 1000
open sqlite db at 'blob-read-reopen.db'
blob table: ready to prepare
blob table: prepared
time: 0.31 micros (avg.), 1000 (count)
```
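
For reference, the difference between the two modes boils down to the following. This is a minimal sketch assuming a hypothetical table `t (id INTEGER PRIMARY KEY, data BLOB)`; blobtest's actual code may differ:

```c
#include <sqlite3.h>

/* Sketch: read `count` blobs from a hypothetical table t(id, data).
   In "simple" mode every row pays the full cost of sqlite3_blob_open;
   in "reopen" mode the handle is opened once and repointed per row. */
static int read_blobs(sqlite3 *db, int count, int use_reopen, char *buf, int n) {
  sqlite3_blob *blob = NULL;
  for (sqlite3_int64 id = 1; id <= count; id++) {
    if (blob == NULL) {
      if (sqlite3_blob_open(db, "main", "t", "data", id,
                            /*read-only*/0, &blob) != SQLITE_OK)
        return SQLITE_ERROR;
    } else {
      /* repoint the existing handle at a new row of the same column */
      if (sqlite3_blob_reopen(blob, id) != SQLITE_OK) return SQLITE_ERROR;
    }
    if (sqlite3_blob_read(blob, buf, n, 0) != SQLITE_OK) return SQLITE_ERROR;
    if (!use_reopen) { sqlite3_blob_close(blob); blob = NULL; }
  }
  if (blob) sqlite3_blob_close(blob);
  return SQLITE_OK;
}
```

The roughly 10x gap in the timings above matches the intent of `sqlite3_blob_reopen`: it skips re-resolving the table and column and reuses the already-open blob handle, leaving only the per-row seek and read.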