0
0
mirror of https://github.com/tursodatabase/libsql.git synced 2025-01-08 18:49:04 +00:00
libsql/docs/USER_GUIDE.md
2024-11-13 10:59:29 -05:00

261 lines
9.4 KiB
Markdown

# `sqld` User Guide
Welcome to the `sqld` user guide!
## Table of Contents
* [Overview](#overview)
* [Replication](#replication)
* [TLS configuration](#tls-configuration)
* [Launching a primary server](#launching-a-primary-server)
* [Launching a replica server](#launching-a-replica-server)
* [Client Authentication](#clientauthentication)
* [Deployment](#deployment)
* [Deploying with Docker](#deploying-with-docker)
* [Deploying on Fly](#deploying-on-fly)
## Overview
The `sqld` program provides libsql over HTTP and supports transparent replication.
![libsql cluster overview.](sqld-overview.png)
<p align="center">
Figure 1. Overview of libsql clustering.
</p>
Figure 1 shows an overview of libsql cluster where clients execute SQL remotely over HTTP against `sqld` instances.
In the middle, there is the _primary_ instance, which is responsible for accepting writes and servicing replicas for write-ahead log (WAL) updates.
If a client performs a write operation such as `INSERT` statement in SQL, replicas delegate the write to a primary node.
Read operations, such as `SELECT` statements, however, are executed on the replica directly.
The replicas poll the primary instance for WAL updates periodically over a gRPC connection.
## Replication
In this section, we will walk you through how to set up a libsql cluster.
### TLS configuration
The nodes in a `sqld` cluster communicate over gRPC with TLS. To set up a `sqld` cluster, you need the following TLS configuration:
* Certificate authority (CA) certificate and private key
* Primary server certificate and private key
* Replica server certificates and private keys
In TLS speak, the primary server is the server and the replica servers are the clients.
For *development and testing* purposes, you can generate TLS keys and certificates with:
```console
python scripts/gen_certs.py
```
The script generates the following files:
* `ca_cert.pem` -- certificate authority certificate
* `ca_key.pem` -- certificate authority private key
* `server_cert.pem` -- primary server certificate
* `server_key.pem` -- primary server private key
* `client_cert.pem` -- replica server certificate
* `client_key.pem ` -- replica server private key
### Launching a primary server
To start a `sqld` server in primary mode, run:
```console
sqld \
--http-listen-addr 127.0.0.1:8081 \
--grpc-listen-addr 127.0.0.1:5001 \
--grpc-tls \
--grpc-ca-cert-file ca_cert.pem \
--grpc-cert-file server_cert.pem \
--grpc-key-file server_key.pem
```
You now have a `sqld` primary server listening to SQL over HTTP at `127.0.0.1:8081` and gRPC with TLS at `127.0.0.1:5001`.
### Launching a replica server
To start a a `sqld` server in replica mode, run:
```console
sqld \
--http-listen-addr 127.0.0.1:8082 \
--primary-grpc-url https://127.0.0.1:5001 \
--primary-grpc-tls \
--primary-grpc-ca-cert-file ca_cert.pem \
--primary-grpc-cert-file client_cert.pem \
--primary-grpc-key-file client_key.pem
```
You now have a `sqld` replica server listening to SQL over HTTP at `127.0.0.1:8082`, which is connected to a primary server at `127.0.0.1:5001`.
You can add more replicas to the cluster by just starting more `sqld` processes. However, it's recommended that you generate a different TLS configuration for every replica.
To test the cluster, you can, for example, create a table and insert rows in the replica:
```console
curl -d '{"statements": ["CREATE TABLE IF NOT EXISTS users (username)", "INSERT INTO users VALUES (\"alice\")"]}' 127.0.0.1:8082
```
and query the results from the primary:
```console
curl -d '{"statements": ["SELECT * FROM users"]}' 127.0.0.1:8081
```
## Client Authentication
You can configure client authentication by passing the `--auth-jwt-key-file FILENAME` command line option to `sqld`.
The key is either a PKCS#8-encoded Ed25519 public key in PEM, or just plain bytes of the Ed25519 public key in URL-safe base64.
## Deployment
### Deploying with Docker
You can pull the official Docker image for `sqld` with
```console
docker pull ghcr.io/libsql/sqld:main
```
You can find more information about the Docker image [here](./DOCKER.md).
### Deploying on Fly
You can use the existing `fly.toml` file from this repository.
Just run
```console
flyctl launch
```
... then pick a name and respond "Yes" when the prompt asks you to deploy.
You now have `sqld` running on Fly listening for HTTP connections.
Give it a try with this snippet, replacing `$YOUR_APP` with your app name:
```
curl -X POST -d '{"statements": ["create table testme(a,b,c)"]}' $YOUR_APP.fly.dev
curl -X POST -d '{"statements": ["insert into testme values(1,2,3)"]}' $YOUR_APP.fly.dev
curl -X POST -d '{"statements": ["select * from testme"]}' $YOUR_APP.fly.dev
```
```
[{"b":2,"a":1,"c":3}]
```
## Incremental snapshots
The `sqld` generates incremental snapshots of the database file, which you can apply to a local libSQL replica.
For example, suppose you have an application that is not always connected over the network and can't rely on the `sqld` gRPC replication method. In that case, you can configure `sqld` to notify of generated incremental snapshots, sync the snapshot files to another machine, and apply them.
You can use the `--snapshot-exec` command line option to specify a file, such as a shell script, to execute on snapshot generation. You can also use the `--max-log-duration SECS` command line option
on to control how often `sqld` generates the snapshot files to ensure the freshness of the data on local replicas.
To use incremental snapshots, first, create a shell script with the name `snapshot.sh`:
```bash
#!/bin/bash
SNAPSHOT_FILE="$1"
NAMESPACE="$2"
echo "Generated incremental snapshot $SNAPSHOT_FILE for namespace $NAMESPACE"
# At this point we can ship the snapshot file to whereever we would like but we
# must delete it from its location on disk or else sqld will panic.
rm $SNAPSHOT_FILE
```
and then configure `sqld` to generate an incremental snapshot every 5 seconds and invoke the shell script when `sqld` generates a snapshot:
```console
sqld --snapshot-exec ./snapshot.sh --max-log-duration 5
```
When you write to the `sqld` database, you will eventually see log line such as:
```console
2023-08-11T08:21:04.183564Z INFO sqld::replication::snapshot: snapshot `e126f594-90f4-45be-9350-bc8a01160de9-0-2.snap` successfully created
Generated incremental snapshot data.sqld/dbs/default/snapshots/e126f594-90f4-45be-9350-bc8a01160de9-0-2.snap
```
The first line is logging from `sqld` and the second line is `sqld` executing `snapshot.sh` script.
You can now, for example, `rsync` the snapshot file to another machine, to apply the changes to a local replica with the `Database::sync_frames()` method of the `libsql` crate:
```rust
use libsql::Database;
use libsql_replication::{Frames, TempSnapshot};
#[tokio::main]
async fn main() {
tracing_subscriber::fmt::init();
let opts = libsql::Opts::with_sync();
let db = Database::open_with_opts("test.db", opts).await.unwrap();
let conn = db.connect().unwrap();
let args = std::env::args().collect::<Vec<String>>();
if args.len() < 2 {
println!("Usage: {} <snapshot path>", args[0]);
return;
}
let snapshot_path = args.get(1).unwrap();
let snapshot = TempSnapshot::from_snapshot_file(snapshot_path.as_ref()).unwrap();
db.sync_frames(Frames::Snapshot(snapshot)).unwrap();
let rows = conn
.query("SELECT * FROM sqlite_master", ())
.unwrap()
.unwrap();
while let Ok(Some(row)) = rows.next() {
println!(
"| {:024} | {:024} | {:024} | {:024} |",
row.get::<&str>(0).unwrap(),
row.get::<&str>(1).unwrap(),
row.get::<&str>(2).unwrap(),
row.get::<&str>(3).unwrap(),
);
}
}
```
When applying snapshots the format of the file name gives certain information.
The format is `{namespace}:{log_id}:{start_frame_no:020x}-{end_frame_no:020x}.snap` where log_id represents the unqiue write ahead log and then
for each unique log_id there will be snapshots starting at frame `0` up until
the end. Snapshots must be applied sequentially for each log_id starting at
frame 0.
## Multitenancy
The `sqld` server supports more than one database. To create a database, send a create namespace request to the [admin API](ADMIN_API.md).
For example, to create a database named `db1`, send the following HTTP request:
```shell
curl -X POST http://localhost:8080/v1/namespaces/db1/create
```
The name of the database is determined from the `Host` header in the HTTP request.
For example, if you have the following entries in your `/etc/hosts` file:
```console
127.0.0.1 db1.local
127.0.0.1 db2.local
```
You can access `db1` with the `http://db1.local:8080`URL and `db2` with `http://db2.local:8080`.
The database files for the databases are stored in `<data dir>/dbs/db1` and `<data dir/dbs/db2`, respectively.
### Path based routing
For local development and testing, you can also connect to your databases by specifying the database namespace in the URL path.
You can access namespace `db1` with the `http://local:8080/dev/db1` URL and `db2` with `http://local:8080/dev/db2`, respectively.
### Wildcard domain for development
If you don't feel like editing `/etc/hosts` each time you want to test a new namespace, you can use any domain
that returns `127.0.0.1` for all subdomains. One of such domains is `*.db.sarna.dev`, which you can use to access
your local `db1` and `db2` as `http://db1.db.sarna.dev` and `http://db2.db.sarna.dev` respectively.