dbx

Reference documentation for dbx — database backup, encryption, restore, scheduling, and cloud upload.

Every database backup workflow I've used eventually breaks down in the same place: you need the database client tools installed locally, you need to remember the right flags, and if the database is behind a bastion host, you're stitching together SSH tunnels by hand. Then you compress the dump, maybe encrypt it, upload it somewhere, and hope you can reverse the process six months later when you actually need it.

I built dbx to make this boring. It's a single Bash CLI that handles the full lifecycle: backup, encrypt, upload, restore, schedule, notify. No local database client tools are required: every database operation runs inside a Docker container, so your machine never needs pg_dump or mysqldump installed.

The core idea

dbx runs database tools inside Docker instead of requiring them on the host. When you back up a PostgreSQL database, it spins up a postgres:17-alpine container, runs pg_dump inside it, and pipes the output through compression and encryption to a local file. When you restore, it creates a fresh database inside a local Docker container and loads the backup into it. You get a working copy of a production database on your laptop without installing anything beyond Docker.
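A minimal sketch of the containerised dump step. The helper name pg_dump_in_docker and the DOCKER override variable are hypothetical, not taken from dbx; the point is that pg_dump runs in a throwaway postgres:17-alpine container while the custom-format archive streams to the host's stdout.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not dbx's actual code: run pg_dump inside a throwaway
# postgres:17-alpine container and stream the custom-format archive to stdout.
DOCKER=${DOCKER:-docker}   # override hook (assumption), e.g. DOCKER=echo for a dry run

pg_dump_in_docker() {
  local host=$1 port=$2 user=$3 db=$4
  # --rm removes the container on exit. No -t: a pseudo-TTY would rewrite line
  # endings and corrupt the binary archive written to stdout.
  # "-e PGPASSWORD" without a value copies the caller's PGPASSWORD into the
  # container, so the password is not part of the docker command line.
  "$DOCKER" run --rm -e PGPASSWORD postgres:17-alpine \
    pg_dump -Fc -h "$host" -p "$port" -U "$user" "$db"
}
```

With PGPASSWORD exported, pg_dump_in_docker db.internal 5432 backup_user production > production.dump would write the archive locally. When the database is only reachable through a local SSH tunnel, the host argument has to be an address the container can reach, which is what the bridge-aware host resolution in the SSH tunnel description addresses.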

This matters most when you're dealing with remote databases. Take an RDS instance inside a VPC, reachable only through a bastion host, with credentials in a vault. dbx opens the SSH tunnel, pulls credentials from your keychain, runs the dump through Docker, compresses it with zstd, encrypts it with age, and writes a verified backup with checksums. One command.

What it supports

Databases: PostgreSQL and MySQL/MariaDB. Postgres backups use pg_dump's custom format. MySQL uses a two-pass strategy: schema first (with routines, triggers, and events), then data separately. This lets you exclude specific tables from the data dump while preserving their schema, which is useful for skipping session tables or audit logs that would otherwise bloat the backup.

Compression: zstd in multi-threaded mode (-T0, which tells zstd to use one worker per detected physical CPU core), with a configurable compression level. Fast on modern hardware, with significantly better ratios than gzip on database dumps.

Encryption: Two backends, age (asymmetric, recommended) and GPG (symmetric, legacy). The pipeline streams end to end (pg_dump | zstd | age > file), so no unencrypted intermediate file is ever written to disk. The decompress function handles every combination of compression and encryption by matching file extensions, falling back to magic-byte detection.
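A magic-byte fallback can be sketched as below. The function name detect_outer_layer is hypothetical; the signatures are the published ones (zstd frames start with 28 b5 2f fd, gzip with 1f 8b, binary age files with the text "age-encryption.org/v1", armored age and GPG files with their BEGIN lines). Binary, non-armored GPG output is not recognised by this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical fallback: guess a backup's outermost layer from its leading bytes.
detect_outer_layer() {
  local file=$1 hex prefix
  # First four bytes as lowercase hex without spaces, e.g. "28b52ffd".
  hex=$(head -c 4 "$file" | od -An -tx1 | tr -d ' \n')
  case $hex in
    28b52ffd) echo zstd; return ;;   # zstd frame magic
    1f8b*)    echo gzip; return ;;   # gzip member header
  esac
  # Text headers: 34 bytes covers the longest BEGIN line checked below.
  prefix=$(head -c 34 "$file")
  case $prefix in
    age-encryption.org/v1*)                echo age ;;
    "-----BEGIN AGE ENCRYPTED FILE-----"*) echo age ;;
    "-----BEGIN PGP MESSAGE-----"*)        echo gpg ;;
    *)                                     echo unknown ;;
  esac
}
```

Checking the binary compression signatures first keeps the text comparison from reading bytes that may contain NULs.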

SSH Tunnels: Automatic creation, reuse detection (won't create a duplicate if one already exists to the same target), Docker bridge-aware host resolution (host.docker.internal on macOS, 172.17.0.1 on Linux), and cleanup on exit. Only kills tunnels it created, not pre-existing ones.
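The bridge-aware host resolution amounts to a small platform switch. This is a hypothetical sketch (docker_host_addr is not a dbx function) using the two addresses the text names: host.docker.internal, which Docker Desktop provides on macOS, and 172.17.0.1, the default docker0 bridge gateway on Linux.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the address a container should use to reach a tunnel
# endpoint listening on the host. The OS name can be passed in for testing;
# otherwise it comes from uname.
docker_host_addr() {
  local os=${1:-$(uname -s)}
  case $os in
    Darwin) echo host.docker.internal ;;   # provided by Docker Desktop
    Linux)  echo 172.17.0.1 ;;             # default docker0 bridge gateway
    *)      echo "unsupported OS: $os" >&2; return 1 ;;
  esac
}
```

One caveat on Linux: ssh -L binds to the loopback address by default, so a container using 172.17.0.1 only reaches the tunnel if the forward binds that address explicitly, for example ssh -f -N -L 172.17.0.1:15432:db.internal:5432 bastion.example.com (the local port here is arbitrary). The text does not say how dbx binds its tunnels.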

Credential storage: Four vault backends with auto-detection: macOS Keychain, libsecret (Linux desktop), pass (headless Linux), and GPG-encrypted file as fallback. Passwords can also come from external commands like the 1Password CLI. Plaintext passwords in config are supported but produce a warning.
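Auto-detection across the four backends could look roughly like this. The probes, the "dbx" service and account naming, and the KEY=VALUE layout of the GPG fallback file are all assumptions for illustration; the lookup commands themselves (security find-generic-password, secret-tool lookup, pass show, gpg --decrypt) are the standard CLIs for each store.

```shell
#!/usr/bin/env bash
# Hypothetical vault auto-detection, in the priority order the text lists.
detect_vault() {
  if [ "$(uname -s)" = Darwin ] && command -v security >/dev/null 2>&1; then
    echo keychain    # macOS Keychain via the security CLI
  elif [ -n "${DBUS_SESSION_BUS_ADDRESS:-}" ] && command -v secret-tool >/dev/null 2>&1; then
    echo libsecret   # desktop Linux with a Secret Service session
  elif command -v pass >/dev/null 2>&1 && [ -d "${PASSWORD_STORE_DIR:-$HOME/.password-store}" ]; then
    echo pass        # headless Linux with an initialised password store
  else
    echo gpg-file    # fallback: GPG-encrypted file
  fi
}

# Fetch one secret from the detected backend (naming scheme is hypothetical).
vault_get() {
  local key=$1
  case $(detect_vault) in
    keychain)  security find-generic-password -s dbx -a "$key" -w ;;
    libsecret) secret-tool lookup service dbx key "$key" ;;
    pass)      pass show "dbx/$key" | head -n 1 ;;   # first line holds the password
    gpg-file)  gpg --quiet --decrypt "${XDG_CONFIG_HOME:-$HOME/.config}/dbx/secrets.gpg" \
                 | awk -F= -v k="$key" '$1 == k { sub(/^[^=]*=/, ""); print; exit }' ;;
  esac
}
```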

Cloud storage: S3-compatible uploads via MinIO Client or AWS CLI. Metadata sidecars (.meta.json) are uploaded alongside each backup with checksums, timestamps, and encryption info.

Scheduled backups: Platform-native scheduling. launchd plists on macOS, systemd timers on Linux. Supports human-readable schedules (daily@3, weekly@mon:6) and raw cron expressions.
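Translating the human-readable forms into a five-field cron expression is a small parsing job. This sketch is hypothetical: it assumes daily@H and weekly@DAY:H are the accepted shapes (inferred from the two examples) and passes anything else through as a raw cron expression.

```shell
#!/usr/bin/env bash
# Hypothetical schedule parser: output is "minute hour day-of-month month day-of-week".
valid_hour() {
  case $1 in ''|*[!0-9]*) return 1 ;; esac   # digits only
  [ "$1" -le 23 ]
}

schedule_to_cron() {
  local spec=$1 day hour dow
  case $spec in
    daily@*)
      hour=${spec#daily@} ;;
    weekly@*:*)
      day=${spec#weekly@}; day=${day%%:*}; hour=${spec##*:}
      case $day in
        sun) dow=0 ;; mon) dow=1 ;; tue) dow=2 ;; wed) dow=3 ;;
        thu) dow=4 ;; fri) dow=5 ;; sat) dow=6 ;;
        *) echo "unknown day: $day" >&2; return 1 ;;
      esac ;;
    daily*|weekly*)
      echo "malformed schedule: $spec" >&2; return 1 ;;
    *)
      echo "$spec"; return 0 ;;   # assume a raw cron expression
  esac
  valid_hour "$hour" || { echo "bad hour: $hour" >&2; return 1; }
  echo "0 $hour * * ${dow:-*}"
}
```

schedule_to_cron weekly@mon:6 prints 0 6 * * 1. The platform schedulers need their own syntax: a systemd timer would use OnCalendar=Mon *-*-* 06:00:00, and a launchd plist would use StartCalendarInterval with Hour and Weekday keys.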

Notifications: Slack webhooks, desktop notifications (terminal-notifier/notify-send), SMTP email, or arbitrary shell commands. Configurable to fire on failure only, success only, or both.
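The failure/success/both setting reduces to a filter on the backup's exit status. In this hypothetical sketch the mode names (failure, success, always) and the SLACK_WEBHOOK_URL variable are placeholders; the Slack body uses the incoming-webhook {"text": ...} shape, built with jq so the message is escaped correctly.

```shell
#!/usr/bin/env bash
# Hypothetical notification filter. STATUS is the exit code of the backup run.
should_notify() {
  local mode=$1 status=$2
  case $mode in
    always)  return 0 ;;
    failure) [ "$status" -ne 0 ] ;;
    success) [ "$status" -eq 0 ] ;;
    *)       echo "unknown notify mode: $mode" >&2; return 2 ;;
  esac
}

# Post a message to a Slack incoming webhook; jq escapes quotes and backslashes.
notify_slack() {
  curl -fsS -X POST -H 'Content-Type: application/json' \
    --data "$(jq -n --arg text "$1" '{text: $text}')" \
    "$SLACK_WEBHOOK_URL" >/dev/null
}

# Desktop notification: terminal-notifier on macOS, notify-send on Linux.
notify_desktop() {
  if command -v terminal-notifier >/dev/null 2>&1; then
    terminal-notifier -title dbx -message "$1"
  elif command -v notify-send >/dev/null 2>&1; then
    notify-send dbx "$1"
  fi
}
```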

Some technical decisions worth explaining

Why Bash

The primary consumers of dbx are developers and ops people who already have Bash. No runtime to install, no package manager, no virtual environment. curl | bash and it works. The trade-off is obvious: Bash is terrible for complex data structures, error handling is manual, and testing is primitive. But for a tool that orchestrates other CLI tools (docker, ssh, jq, zstd, age), Bash is the natural glue language. The entire project is about 5,500 lines across 10 source files.

MySQL's two-pass dump

MySQL's mysqldump doesn't support excluding table data while keeping the schema in a single invocation. So dbx runs two passes: first with --no-data --routines --triggers --events to get the full schema, then with --no-create-info --skip-triggers for just the data rows, minus excluded tables. The two outputs are concatenated and piped through compression. There's also a DEFINER stripping pass that removes or replaces MySQL DEFINER clauses, which would otherwise cause permission errors when restoring to a local container where the original user doesn't exist.
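A sketch of the two passes and a removal-only DEFINER filter. It is hypothetical: connection flags (-h, -P, -u) are omitted, MYSQLDUMP is an assumed override hook (for instance a docker run wrapper), and the text says dbx may also replace DEFINER clauses, which this version does not attempt. The mysqldump flags themselves are the ones the paragraph names, plus --ignore-table, which mysqldump requires in db.table form.

```shell
#!/usr/bin/env bash
# Hypothetical two-pass dump, not dbx's code.
MYSQLDUMP=${MYSQLDUMP:-mysqldump}

# mysql_two_pass DB [TABLE...]: the tables listed keep their schema but lose their rows.
mysql_two_pass() {
  local db=$1 t; shift
  local ignores=()
  for t in "$@"; do ignores+=("--ignore-table=${db}.${t}"); done
  # Pass 1: schema for every table, plus stored routines, triggers and events.
  "$MYSQLDUMP" --no-data --routines --triggers --events "$db"
  # Pass 2: data rows only, minus excluded tables; triggers came with pass 1.
  "$MYSQLDUMP" --no-create-info --skip-triggers "${ignores[@]}" "$db"
}

# Remove DEFINER=`user`@`host` clauses so objects restore under the local user.
strip_definers() {
  sed -E 's/DEFINER=`[^`]*`@`[^`]*` ?//g'
}
```

Concatenated and compressed, this becomes mysql_two_pass shop sessions audit_log | strip_definers | zstd -T0 -c > shop.sql.zst; with set -o pipefail, a mysqldump failure is not masked by a successful zstd.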

Restore naming

When you restore a backup, dbx creates a new database inside the local Docker container with an auto-generated name: <database>_v<N>_<YYYYMMDD>. It increments the version number until there's no collision. This means you can restore multiple snapshots of the same database side by side for comparison. You can override with --name if you want a specific name.
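The naming rule fits in a short loop. In this hypothetical sketch the existing database names are passed as arguments; against a local Postgres container they could come from psql -Atc 'SELECT datname FROM pg_database'.

```shell
#!/usr/bin/env bash
# Hypothetical: next free <database>_v<N>_<YYYYMMDD> name.
# Usage: next_restore_name DB YYYYMMDD [EXISTING_NAME...]
next_restore_name() {
  local db=$1 day=$2 n=1 candidate existing taken; shift 2
  while :; do
    candidate="${db}_v${n}_${day}"
    taken=0
    for existing in "$@"; do
      [ "$existing" = "$candidate" ] && { taken=1; break; }
    done
    [ "$taken" -eq 0 ] && { echo "$candidate"; return 0; }
    n=$((n + 1))   # collision: try the next version number
  done
}
```

Because only the full name has to be unique, production_v1_20260314 does not block production_v1_20260315.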

Verification

Every backup produces a .meta.json sidecar containing a SHA-256 checksum, the dbx version that created it, encryption type, host, database name, and timestamp. The verify command recomputes the checksum and compares. If the metadata is missing (e.g., a backup from an older version), it falls back to a readability check: decompress a chunk and confirm it looks like SQL.
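Writing and checking the sidecar can be sketched as follows. The field names are illustrative, not dbx's actual .meta.json schema, and the values are assumed to need no JSON escaping (jq -n --arg would handle that properly). sha256sum is used where available, shasum -a 256 on macOS.

```shell
#!/usr/bin/env bash
# Hypothetical sidecar writer and verifier.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d' ' -f1
  else
    shasum -a 256 "$1" | cut -d' ' -f1
  fi
}

# write_meta FILE HOST DATABASE ENCRYPTION VERSION  ->  FILE.meta.json
write_meta() {
  local file=$1 host=$2 db=$3 enc=$4 version=$5
  printf '{"file": "%s", "sha256": "%s", "dbx_version": "%s", "encryption": "%s", "host": "%s", "database": "%s", "created_at": "%s"}\n' \
    "${file##*/}" "$(sha256_of "$file")" "$version" "$enc" "$host" "$db" \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" > "$file.meta.json"
}

# verify_backup FILE: 0 = checksum matches, 1 = mismatch, 2 = no sidecar.
verify_backup() {
  local file=$1 meta="$1.meta.json" expected actual
  if [ ! -f "$meta" ]; then
    echo "no metadata for $file; a readability check is needed instead" >&2
    return 2
  fi
  expected=$(sed -n 's/.*"sha256": *"\([0-9a-f]\{64\}\)".*/\1/p' "$meta")
  actual=$(sha256_of "$file")
  if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
    echo "OK $file"
  else
    echo "checksum mismatch: $file" >&2
    return 1
  fi
}
```

The distinct exit code for a missing sidecar gives a caller a clean place to branch into the readability fallback.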

The TUI

There's an interactive mode built with gum (Charmbracelet) that shows a dashboard of configured hosts, recent backups, storage usage, and encryption status. It's entirely optional. The CLI works fine without it. But for developers who prefer menus over memorizing flags, it makes the tool more approachable.

Configuration

Everything lives in ~/.config/dbx/config.json. Per-host configuration defines the database type, connection details, SSH tunnel settings, and per-database exclusions. Global defaults cover compression level, encryption type, retention count, and notification preferences. Environment variables (DBX_*) override everything for CI/CD use.
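The precedence described here (environment first, then config file, then built-in default) can be sketched as one lookup function. Hypothetical details: the DBX_&lt;KEY&gt; naming follows the DBX_* convention in the text, but the .defaults key inside config.json and the optional third argument for the config path are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical setting lookup: DBX_<KEY> env var > config file > fallback.
# Usage: dbx_setting KEY FALLBACK [CONFIG_PATH]
dbx_setting() {
  local key=$1 fallback=$2 config=${3:-$HOME/.config/dbx/config.json}
  local env_name val
  env_name="DBX_$(printf '%s' "$key" | tr '[:lower:]' '[:upper:]')"
  val=${!env_name:-}   # bash indirect expansion (works on bash 3.2 too)
  if [ -z "$val" ] && [ -r "$config" ] && command -v jq >/dev/null 2>&1; then
    val=$(jq -r --arg k "$key" '.defaults[$k] // empty' "$config")
  fi
  echo "${val:-$fallback}"
}
```

In CI, exporting DBX_COMPRESSION_LEVEL=19 would then win over whatever the config file says for compression_level.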

A typical host entry looks like:

{
  "type": "postgres",
  "host": "db.internal",
  "port": 5432,
  "user": "backup_user",
  "ssh_tunnel": {
    "jump_host": "bastion.example.com",
    "target_host": "db.internal",
    "target_port": 5432
  },
  "databases": {
    "production": {
      "exclude_data": ["sessions", "audit_log", "cache_entries"],
      "parallel_jobs": 4
    }
  }
}
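For a Postgres host, an exclude_data list like the one above maps naturally onto pg_dump's --exclude-table-data, which keeps a table's definition in the dump but skips its rows. Both helpers are hypothetical, and exclude_tables assumes the host entry above has been saved on its own as a JSON file.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not dbx's code.

# One --exclude-table-data flag per table name.
pg_exclude_flags() {
  local t
  for t in "$@"; do printf '%s\n' "--exclude-table-data=$t"; done
}

# Print a database's exclude_data entries, one per line (empty if none).
exclude_tables() {
  local host_file=$1 db=$2
  jq -r --arg db "$db" '.databases[$db].exclude_data[]?' "$host_file"
}

# Example wiring (bash 3.2 compatible, so no mapfile):
#   flags=()
#   while IFS= read -r t; do
#     flags+=("$(pg_exclude_flags "$t")")
#   done < <(exclude_tables host.json production)
#   pg_dump -Fc "${flags[@]}" production > production.dump
```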

The backup file layout

~/.data/dbx/
  <host>/
    <database>/
      production_20260315_143000.sql.zst.age
      production_20260315_143000.sql.zst.age.meta.json
      production_20260314_030000.sql.zst.age
      production_20260314_030000.sql.zst.age.meta.json

The file extension tells you exactly what happened: .sql.zst.age means SQL dump, zstd compressed, age encrypted. During a restore, the decompress function parses this chain in reverse, and it handles every permutation: .sql.zst, .sql.gz, .sql.zst.age, .sql.zst.gpg, .sql.gz.age, and so on.
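Parsing the chain in reverse can be sketched as peeling one extension at a time off the right-hand end, then applying the resulting steps as a streaming pipeline. Both function names and the AGE_IDENTITY variable are hypothetical; the decode commands (age -d -i, gpg --decrypt, zstd -dc, gzip -dc) are the standard ones for each layer.

```shell
#!/usr/bin/env bash
# Hypothetical: turn the extension chain into decode steps, outermost first.
decode_steps() {
  local name=${1##*/} ext steps=""
  while :; do
    case $name in *.*) ;; *) echo "no .sql payload in $1" >&2; return 1 ;; esac
    ext=${name##*.}
    case $ext in
      age) steps="$steps age-decrypt" ;;
      gpg) steps="$steps gpg-decrypt" ;;
      zst) steps="$steps zstd-decompress" ;;
      gz)  steps="$steps gunzip" ;;
      sql) break ;;                                  # reached the payload
      *)   echo "unrecognised extension: .$ext" >&2; return 1 ;;
    esac
    name=${name%.*}                                  # peel off one extension
  done
  echo "${steps# }"
}

# Apply steps to stdin, one process per layer, without temporary files.
# AGE_IDENTITY (path to an age identity file) is a hypothetical variable.
apply_steps() {
  if [ $# -eq 0 ]; then cat; return; fi
  local step=$1; shift
  case $step in
    age-decrypt)     age -d -i "$AGE_IDENTITY" ;;
    gpg-decrypt)     gpg --quiet --decrypt ;;
    zstd-decompress) zstd -dc ;;
    gunzip)          gzip -dc ;;
  esac | apply_steps "$@"
}
```

For production_20260315_143000.sql.zst.age, decode_steps prints age-decrypt zstd-decompress, and apply_steps $(decode_steps "$f") < "$f" streams the SQL to stdout.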

What it doesn't do

dbx is for development and operations workflows. It's not a production backup system. It doesn't do point-in-time recovery, WAL archiving, incremental backups, or replication. If you need those, use pgBackRest or WAL-G for Postgres, or native replication for MySQL.

What dbx is good at: getting a snapshot of a remote production database onto your local machine, encrypted and verified, without installing database client tools, and being able to restore it into a disposable Docker container for debugging, testing, or data analysis. That's a workflow that most teams do manually with a collection of shell scripts and tribal knowledge. This is the organized version.

Status

Version 0.6.0. MIT licensed. CI runs ShellCheck and syntax validation. All recent code review findings have been applied. It's been my daily driver for database backup workflows, and it's at the point where I trust it with production data.

The source is on GitHub: github.com/toms-io/dbx