Backup

Native systems are harder to back up than virtual systems; that's just the way it goes. With VMs, your hypervisor usually has a built-in backup feature that pauses the machine, makes a snapshot, resumes the machine, and continues backing up the snapshot in the background. If you have the luxury of running your server as a VM (e.g., Proxmox PVE), backups become much easier.

However, in this book we are dealing with a physical Raspberry Pi, not a virtual machine, so backups must be managed another way.

Restic

Restic is a modern open source backup tool with many great features, including incremental backups, encryption, and offsite upload. It scales well to very large backup sets.

You can install restic to back up any directory on a Linux host. One of the best ways to do that is with the script found in this blog post:

Daily backups to S3 with Restic and systemd timers

The one caveat with backing up files this way is that, for containers that are always running, you need to make sure the files are flushed to disk before the backup starts; otherwise your backup could be corrupted. For media storage that doesn't change very often (photos, videos, etc.) this may not be a big deal, but for databases it's a problem.
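As a rough sketch of that workflow (the endpoint, bucket, credentials, and paths below are all placeholders, and S3 is just one of restic's supported backends), a restic setup boils down to initializing an encrypted repository once, then running periodic incremental backups:

## One-time: create the encrypted repository (placeholder endpoint and bucket):
export AWS_ACCESS_KEY_ID=my-access-key
export AWS_SECRET_ACCESS_KEY=my-secret-key
export RESTIC_PASSWORD=choose-a-strong-passphrase
restic -r s3:s3.example.com/my-backup-bucket init

## Recurring (e.g., from a systemd timer): incremental backup, then prune old snapshots:
restic -r s3:s3.example.com/my-backup-bucket backup /srv/data
restic -r s3:s3.example.com/my-backup-bucket forget --keep-daily 30 --prune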

Backup-Volume

Backup-Volume is another backup tool, specifically designed to back up Docker volumes and upload the archives to offsite storage (S3, SSH, Dropbox). This tool is much simpler than restic, with the most important difference being that Backup-Volume can only make complete backups (no incremental storage). For small datasets this is ideal, because each backup is stored as a separate backup-XXXX.tar.gz, and it's easy to restore from a single file. For larger datasets, the duplication of backup files would be prohibitively expensive and wasteful (although you can tune the retention and pruning parameters to save some space, it won't compare to the efficiency of restic).

Backup-Volume has one trick in its favor: it can automatically stop and start containers before and after the backup runs. This makes this style of backup much safer for write-intensive volumes (e.g., databases) and ensures that the data is flushed to disk before the backup starts.
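To illustrate the principle by hand (Backup-Volume automates all of these steps; the container and volume names here are placeholders), the stop-backup-start pattern looks like this:

## Stop the container so the database flushes and closes its files:
docker stop my-database
## Archive the volume's data directory (named volumes live under /var/lib/docker/volumes):
sudo tar -czf backup.tar.gz -C /var/lib/docker/volumes/my-database_data/_data .
## Start the container again:
docker start my-database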

You will have to analyze your own situation and weigh the value of data integrity against the cost of data duplication to decide which kind of backup to deploy. A future version of Backup-Volume may integrate restic to make this choice a non-issue.

Set up Backup-Volume

Prepare an S3 bucket offsite

You may want to use your own MinIO S3 service (preferably installed on a separate, offsite server) or a third-party provider (AWS S3, DigitalOcean Spaces, Wasabi, etc.).

You will need to provide the S3 bucket and credentials that the backup process will use when uploading archives:

  • S3 endpoint domain, e.g., s3.example.com
  • S3 bucket name, e.g., test
  • S3 access key ID, e.g., test
  • S3 secret key, e.g., xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
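If you self-host MinIO, one way to prepare the bucket and a dedicated user is with MinIO's mc client (the alias, bucket, user, and secret below are placeholders; older mc versions use mc admin policy set instead of attach):

## Register the MinIO server under a local alias:
mc alias set myminio https://s3.example.com ROOT_USER ROOT_PASSWORD
## Create the bucket:
mc mb myminio/backup-test-1
## Create a dedicated user and grant it read/write access:
mc admin user add myminio backup-test-1 a-long-random-secret
mc admin policy attach myminio readwrite --user backup-test-1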

Configure Backup-Volume

Run this on your Raspberry Pi
## Configures the default backup-volume instance:
pi make backup-volume config

Select multiple existing volumes to back up together as one archive:

(stdout)
? Select all the volumes to backup
> [x] test1_data
  [ ] forgejo_data
  [x] icecast_config
  [ ] icecast_logs
  [ ] mosquitto_mosquitto
  [ ] traefik_geoip_database
v [ ] traefik_traefik

Choose the backup schedule in cron format:

(stdout)
BACKUP_CRON_EXPRESSION: Enter the cron expression (eg. @daily)

: @every 24h
Tip

Other example schedules:

  • @daily : once a day, at midnight
  • @hourly : once an hour, on the hour
  • @every 6h : every 6 hours
  • 0 4 * * * : every day at 4:00 AM (standard five-field cron syntax)

Choose the retention length (number of days) to keep backup archives before automatic pruning happens:

(stdout)
BACKUP_RETENTION_DAYS: Rotate backups older than how many days? (eg. 30)

: 30

You can choose any of the supported storage mechanisms. For demo purposes, choose S3:

(stdout)
> Which remote storage do you want to use? s3

BACKUP_AWS_ENDPOINT: Enter the S3 endpoint (e.g., s3.example.com)

: s3.d.example.com

BACKUP_AWS_S3_BUCKET_NAME: Enter the S3 bucket name (e.g., my-bucket)

: backup-test-1

BACKUP_AWS_ACCESS_KEY_ID: Enter the S3 access key id (e.g., my-access-key)

: backup-test-1

BACKUP_AWS_SECRET_ACCESS_KEY: Enter the S3 secret access key

: OEuL3lMSdvdoFyVjEQTM4Trj/7VhHq7Q7cOFEpQPuxMHxsTVK3Hxne7st6Ty

BACKUP_AWS_S3_PATH: Choose a directory inside the bucket (blank for root)

:
Tip

You should use a dedicated bucket for each backup instance. Alternatively, you can share the same bucket between several instances, as long as you configure a unique BACKUP_AWS_S3_PATH sub-directory for each instance.
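For example, two instances sharing one bucket could be kept separate with path values like these (the names are arbitrary):

## Instance "default":
BACKUP_AWS_S3_PATH=default
## Instance "test":
BACKUP_AWS_S3_PATH=test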

You may optionally preserve an additional copy of the archive in a local volume:

(stdout)
> Do you want to keep a local backup in addition to the remote one? No

Install

Run this on your Raspberry Pi
## Installs the default backup instance:
pi make backup-volume install
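
Once installed, you can verify that the backup container is running (the name filter is an assumption; adjust it to match your instance name):

docker ps --filter name=backup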

Instances

All volume selections back up to the same archive on the same schedule. To back up different volumes on different schedules, create more than one instance of Backup-Volume, each with its own separate config:

Run this on your Raspberry Pi
## Creates a new backup instance named test:
pi make backup-volume instance instance=test
pi make backup-volume install instance=test

Verify backup schedule

Run this on your Raspberry Pi
pi make backup-volume logs
(stdout)
backup-1  | 2024-10-16T02:37:00.263838944Z time=2024-10-16T02:37:00.262Z level=INFO msg="Successfully scheduled backup from environment with expression @daily"
backup-1  | 2024-10-16T02:37:00.266773318Z time=2024-10-16T02:37:00.266Z level=INFO msg="The backup will start at 12:00 AM"
Tip

You should see a plain text log message describing when the backup will occur (The backup will start at 12:00 AM); however, this message is omitted if you use the @every syntax.
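
Once the first backup has run, you can also confirm that the archive landed in the bucket, for example with the AWS CLI pointed at your custom endpoint (bucket and endpoint as configured above; any S3-compatible client works):

aws s3 ls s3://backup-test-1/ --endpoint-url https://s3.d.example.com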

Restore

To restore a volume from a backup, stop the container(s) using the volume, untar the archive into the appropriate directory under /var/lib/docker/volumes, and start the container(s) again.
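As a sketch (the archive and volume names are placeholders, and the directory layout inside the archive depends on how it was created, so inspect it first):

## Stop the container that uses the volume:
docker stop my-database
## List the archive contents to find the right paths:
tar -tzf backup-XXXX.tar.gz | head
## Extract into the volume's data directory (adjust paths to match the archive layout):
sudo tar -xzf backup-XXXX.tar.gz -C /var/lib/docker/volumes/my-database_data/_data
## Start the container again:
docker start my-database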

Notifications

TODO