mirror of https://github.com/tursodatabase/libsql.git synced 2025-01-18 09:31:51 +00:00
Lucio Franco d95fe5d4ee sqld: disable checkpoint on primary conn create
This commit changes our primary connection initialization code in two
ways so that checkpointing the WAL can be disabled.

1) We skip the initial checkpoint that we call directly into sqlite3
   before we restore from bottomless. There is a FIXME above it that
   explains why we need this checkpoint, but right now it is not totally
   clear to me why without digging deeper into the internals of
   bottomless. We should look into this, but for the moment skipping it
   unblocks us, and from the FIXME comment, skipping it does not sound
   unsafe; at worst it may mean doing some extra work.

2) When bottomless needs to get the local change counter, it creates a
   sqlite connection. When this connection is dropped, it seems to
   checkpoint the WAL. I took a brief look at the `sqlite3_close` code
   and did not find anything obvious; I have been told in the past that
   sqlite3 likes to checkpoint at weird points, so this could be one of
   those. For the moment, the temporary fix, as above, is to
   `std::mem::forget` the connection so that `Drop` never runs and thus
   `sqlite3_close` is never called.
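Item 2 relies on the fact that `std::mem::forget` takes ownership of a value without running its `Drop` implementation. The minimal sketch below demonstrates this with a hypothetical `FakeConnection` type standing in for the real connection; it is not code from sqld or bottomless. For context, SQLite's WAL documentation states that when the last connection to a WAL-mode database closes, SQLite runs a checkpoint (and then deletes the WAL file), and `SQLITE_DBCONFIG_NO_CKPT_ON_CLOSE` is the documented switch to suppress that. This may explain the behavior observed here.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for a connection whose `Drop` impl does cleanup,
/// the way dropping a real SQLite connection ends up in `sqlite3_close`.
struct FakeConnection {
    closed: Rc<Cell<bool>>,
}

impl Drop for FakeConnection {
    fn drop(&mut self) {
        // Runs automatically when the value goes out of scope or is dropped.
        self.closed.set(true);
    }
}

/// Creates a connection, then either drops or forgets it.
/// Returns whether the `Drop` cleanup ran.
fn close_or_forget(forget: bool) -> bool {
    let closed = Rc::new(Cell::new(false));
    let conn = FakeConnection { closed: Rc::clone(&closed) };
    if forget {
        // Takes ownership without running the destructor; the value
        // (and anything it owns) is leaked.
        std::mem::forget(conn);
    } else {
        drop(conn);
    }
    closed.get()
}
```

Forgetting the connection also leaks its memory and file handle, which is the trade-off behind calling this a temporary fix.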

With both of these changes, we no longer checkpoint the WAL unless we hit
the max size or interval (which I have set very high for testing). These
changes are not enabled by default; they must be enabled by setting the
following env vars:

```
LIBSQL_DISABLE_INIT_CHECKPOINTING=1
LIBSQL_BOTTOMLESS_DISABLE_INIT_CHECKPOINTING=1
```
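The commit message does not say how these variables are parsed. As a minimal sketch, assuming that `1` (and, here, `true`) means "enabled" and that each variable is read independently, such an opt-in check could look like this; `flag_enabled` and `env_flag_enabled` are hypothetical names, not sqld or bottomless APIs.

```rust
use std::env;

/// Returns true when the variable `name` is set to "1" or "true"
/// (case-insensitive). The accepted values are an assumption of this sketch.
fn flag_enabled(lookup: impl Fn(&str) -> Option<String>, name: &str) -> bool {
    match lookup(name) {
        Some(value) => {
            let value = value.trim();
            value == "1" || value.eq_ignore_ascii_case("true")
        }
        // Unset means the default behavior (checkpointing stays enabled).
        None => false,
    }
}

/// Same check against the real process environment.
fn env_flag_enabled(name: &str) -> bool {
    flag_enabled(|key| env::var(key).ok(), name)
}
```

Taking a `lookup` closure instead of reading `std::env` directly keeps the parsing testable without mutating the process environment.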
2025-01-06 11:59:57 -05:00

Bottomless S3-compatible virtual WAL for libSQL

Work in heavy progress!

This project implements a virtual write-ahead log (WAL) which continuously backs up the data to S3-compatible storage and is able to restore it later.
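To make "continuously backs up" concrete, here is a minimal sketch of one common shape for it: the writer hands each committed frame to a background thread over a channel and does not wait for the upload to finish. `Frame` and `spawn_uploader` are hypothetical names; this is not bottomless's actual replicator, and the upload itself is only a placeholder.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical unit of work: one committed WAL frame (page number + bytes).
struct Frame {
    page_no: u32,
    data: Vec<u8>,
}

/// Spawns a background "uploader" that receives committed frames over a
/// channel. Here it only records page numbers; a real replicator would
/// batch the frames and send them to S3-compatible storage.
fn spawn_uploader() -> (mpsc::Sender<Frame>, thread::JoinHandle<Vec<u32>>) {
    let (tx, rx) = mpsc::channel::<Frame>();
    let handle = thread::spawn(move || {
        let mut uploaded = Vec::new();
        // The loop ends once every Sender has been dropped.
        for frame in rx {
            // Placeholder for the network upload of `frame.data`.
            let _bytes = frame.data.len();
            uploaded.push(frame.page_no);
        }
        uploaded
    });
    (tx, handle)
}
```

The writer only pays for a channel send per frame, which is what lets replication proceed asynchronously with respect to commits.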

How to build

LIBSQL_DIR=/path/to/your/libsql/directory make

will produce a loadable .so libSQL extension with the bottomless WAL implementation.

LIBSQL_DIR=/path/to/your/libsql/directory make release

will do the same, but in release mode.

Configuration

By default, the S3 storage is expected to be available at http://localhost:9000 (e.g. a local development MinIO server), and credentials are obtained via the regular AWS SDK mechanisms, i.e. environment variables and the ~/.aws/credentials file, if present. Ref: https://docs.aws.amazon.com/sdk-for-php/v3/developer-guide/guide_credentials_environment.html

The default endpoint can also be overridden with an environment variable; in the future it will be available directly from libSQL as a URI parameter:

export LIBSQL_BOTTOMLESS_ENDPOINT='http://localhost:9042'

The bucket used for replication can be configured with:

export LIBSQL_BOTTOMLESS_BUCKET='custom-bucket'
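The lookup order described above can be sketched as "environment variable first, default otherwise". The endpoint default comes from the text; the bucket default `bottomless` is an assumption inferred from the restore log further down. `BottomlessConfig` and `resolve_config` are hypothetical names, and credentials are deliberately left to the AWS SDK.

```rust
/// Default endpoint documented above (a local development MinIO server).
const DEFAULT_ENDPOINT: &str = "http://localhost:9000";
/// Assumed default bucket name (inferred, not documented).
const DEFAULT_BUCKET: &str = "bottomless";

#[derive(Debug, PartialEq)]
struct BottomlessConfig {
    endpoint: String,
    bucket: String,
}

/// Resolves endpoint and bucket: an environment variable wins, otherwise the
/// default applies. `lookup` abstracts the environment for testability.
fn resolve_config(lookup: impl Fn(&str) -> Option<String>) -> BottomlessConfig {
    BottomlessConfig {
        endpoint: lookup("LIBSQL_BOTTOMLESS_ENDPOINT")
            .unwrap_or_else(|| DEFAULT_ENDPOINT.to_string()),
        bucket: lookup("LIBSQL_BOTTOMLESS_BUCKET")
            .unwrap_or_else(|| DEFAULT_BUCKET.to_string()),
    }
}
```

In a real process, `resolve_config(|k| std::env::var(k).ok())` reads the variables shown above.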

Since bottomless is built on the official Rust SDK for S3, all AWS-specific environment variables such as AWS_DEFAULT_REGION, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY work as well, as does the ~/.aws/credentials file.

How to use

From libSQL shell, load the extension and open a database file with bottomless WAL, e.g.:

.load ../target/debug/bottomless
.open file:test.db?wal=bottomless
PRAGMA journal_mode=wal;

Remember to set the journaling mode to WAL at least once, before writing any content; otherwise, the custom WAL implementation will not be used.

To customize logging, use the RUST_LOG environment variable, e.g. RUST_LOG=info ./libsql.

A short demo script is in test/smoke_test.sh, and can be executed with:

LIBSQL_DIR=/path/to/your/libsql/directory make test

CLI

The command-line interface supports browsing, restoring and removing snapshot generations. It can be installed as a standalone executable with:

RUSTFLAGS="--cfg uuid_unstable" cargo install bottomless-cli

Alternatively, bottomless-cli can be run from the repository with cargo run. Available commands:

$ bottomless-cli --help
Bottomless CLI

Usage: bottomless-cli [OPTIONS] <COMMAND>

Commands:
  ls       List available generations
  restore  Restore the database
  rm       Remove given generation from remote storage
  help     Print this message or the help of the given subcommand(s)

Options:
  -e, --endpoint <ENDPOINT>  
  -b, --bucket <BUCKET>      
  -d, --database <DATABASE>  
  -h, --help                 Print help information

Examples

Listing generations

[sarna@sarna-pc test]$ bottomless-cli -e http://localhost:9000 ls -v -l3
e4eb3c21-ff53-7b2e-a6ea-ca396f4df9b1
	created at (UTC):     2022-12-23 08:24:52.500
	change counter:       [0, 0, 0, 51]
	consistent WAL frame: 0
	WAL frame checksum:   0
	main database snapshot:
		object size:   408
		last modified: 2022-12-23T08:24:53Z

e4eb3c22-0359-7af6-9acb-285ed7b6ed59
	created at (UTC):     2022-12-23 08:24:51.470
	change counter:       [0, 0, 0, 51]
	consistent WAL frame: 1
	WAL frame checksum:   5335f2a044d2f455
	main database snapshot:
		object size:   399
		last modified: 2022-12-23T08:24:52Z

e4eb3c22-0941-73eb-85df-4e8552a0e88c
	created at (UTC):     2022-12-23 08:24:49.958
	change counter:       [0, 0, 0, 50]
	consistent WAL frame: 10
	WAL frame checksum:   6ac65882f9a2dba7
	main database snapshot:
		object size:   401
		last modified: 2022-12-23T08:24:51Z
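The `change counter` field looks like SQLite's file change counter, which the SQLite file format stores as a 4-byte big-endian integer at offset 24 of the database header; `ls -v` prints its raw bytes, so `[0, 0, 0, 51]` is the value 51. A minimal decoding sketch (the function names are illustrative, not bottomless APIs):

```rust
/// Offset of the 4-byte file change counter in the 100-byte SQLite header.
const CHANGE_COUNTER_OFFSET: usize = 24;

/// Raw counter bytes, in the form `ls -v` prints them.
/// Returns None if the header is too short.
fn change_counter_bytes(header: &[u8]) -> Option<[u8; 4]> {
    let b = header.get(CHANGE_COUNTER_OFFSET..CHANGE_COUNTER_OFFSET + 4)?;
    Some([b[0], b[1], b[2], b[3]])
}

/// The same counter as the big-endian integer SQLite stores.
fn change_counter(header: &[u8]) -> Option<u32> {
    change_counter_bytes(header).map(u32::from_be_bytes)
}
```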

Restoring the database

$ RUST_LOG=info bottomless-cli -e http://localhost:9000 restore
2022-12-23T10:16:10.703557Z  INFO bottomless::replicator: Bucket bottomless exists and is accessible
2022-12-23T10:16:10.709526Z  INFO bottomless_cli: Database: test.db
2022-12-23T10:16:10.713070Z  INFO bottomless::replicator: Restoring from generation e4eb3c29-fe84-7347-a0c0-b9a3a71d0fc2
2022-12-23T10:16:10.727646Z  INFO bottomless::replicator: Restored the main database file

Removing old snapshots

$ bottomless-cli -e http://localhost:9000 rm -v --older-than 2022-12-15
Removed 4 generations
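Conceptually, `rm --older-than` selects generations by creation time. A minimal sketch, assuming the cutoff is converted to a Unix timestamp and treated as exclusive (both are assumptions; the CLI's exact semantics are not documented here). `Generation` and `generations_older_than` are hypothetical names.

```rust
/// A generation as listed by `ls`: its ID and creation time in Unix seconds.
struct Generation {
    id: String,
    created_at: u64,
}

/// Returns the IDs of generations created strictly before `cutoff`.
fn generations_older_than(gens: &[Generation], cutoff: u64) -> Vec<&str> {
    gens.iter()
        .filter(|g| g.created_at < cutoff)
        .map(|g| g.id.as_str())
        .collect()
}
```

For the command above, 2022-12-15 at 00:00 UTC corresponds to the Unix timestamp 1671062400.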

Details

All page writes committed to the database end up being asynchronously replicated to S3-compatible storage. On boot, if the main database file is empty, it will be restored with data coming from the remote storage. If the local database file is newer than the remote data, it will be uploaded to the remote location with a new generation number. If a local WAL file is present and detected to be newer than remote data, it will be uploaded as well.
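The boot-time rules for the main database file can be summarized as a small decision function. This is a sketch under assumptions: "newer" is judged by comparing SQLite change counters (the CLI output above shows one per generation), the WAL-upload case is left out, and `BootAction` and `decide_on_boot` are hypothetical names rather than bottomless code.

```rust
/// What to do with the local database at startup (hypothetical type).
#[derive(Debug, PartialEq)]
enum BootAction {
    /// Local file is empty and a remote generation exists: restore it.
    RestoreFromRemote,
    /// Local file is newer than the remote data: upload a new generation.
    UploadNewGeneration,
    /// Nothing to restore or upload.
    Nothing,
}

/// `local_db_len` is the size of the main database file in bytes; the
/// counters are SQLite file change counters (remote = newest generation).
fn decide_on_boot(local_db_len: u64, local_counter: u32, remote_counter: Option<u32>) -> BootAction {
    if local_db_len == 0 {
        // Empty local file: restore if there is anything to restore.
        return match remote_counter {
            Some(_) => BootAction::RestoreFromRemote,
            None => BootAction::Nothing,
        };
    }
    match remote_counter {
        // No remote generation yet: the local file is the only copy.
        None => BootAction::UploadNewGeneration,
        // Local file has seen more changes than the remote snapshot.
        Some(remote) if local_counter > remote => BootAction::UploadNewGeneration,
        // Remote is as new or newer; this sketch leaves that case alone.
        Some(_) => BootAction::Nothing,
    }
}
```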

Tests

A fully local test can be performed by using a local S3-compatible server, e.g. MinIO. Assuming the server is available at HTTP port 9000, you can use the following scripts:

cd test/
export LIBSQL_BOTTOMLESS_ENDPOINT=http://localhost:9000
./smoke_test.sh
./restore_test.sh

The smoke_test script sets up a new database, test.db, in WAL mode with a 64 KiB page size, and then inserts a few records into it. The restore_test script syncs with the replication server and fetches the newest database if necessary. Once smoke_test has run at least once, restore_test should always be able to fetch the database data, even if the local test.db file is removed.
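Whether `test.db` ended up in WAL mode with 64 KiB pages can be checked from its 100-byte SQLite header alone. Per the SQLite file format, offset 16 holds the page size as a 2-byte big-endian value in which `1` stands for 65536, and bytes 18 and 19 (write and read format versions) are `2` for a WAL-mode database; this persistent flag is also why `PRAGMA journal_mode=wal` only needs to be issued once. A minimal sketch with illustrative function names:

```rust
/// Page size from the header: 2-byte big-endian value at offset 16.
/// Returns None if the header is too short.
fn page_size(header: &[u8]) -> Option<u32> {
    let raw = u16::from_be_bytes([*header.get(16)?, *header.get(17)?]);
    // The value 1 is the special encoding for 65536, which does not fit in u16.
    Some(if raw == 1 { 65_536 } else { u32::from(raw) })
}

/// Bytes 18 and 19 are the write/read format versions: 2 means WAL mode.
fn is_wal_mode(header: &[u8]) -> bool {
    header.get(18) == Some(&2) && header.get(19) == Some(&2)
}
```

To try it on the smoke test's output, read the first 100 bytes of `test/test.db` with `std::fs::File::open` and `Read::read_exact`, then pass the buffer to both functions.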

The same set of tests also works with remote servers. For AWS S3, just make sure that the AWS SDK credentials are valid and that the user has permission to manage the chosen bucket.