Backing up Linux with Duply and Duplicity
Backing up your server's critical data is essential in a cloud environment, where servers are considered ephemeral.
In this tutorial we will look at utilizing Object Storage in combination with open source projects to add encrypted, trusted and unlimited backup to your application platform.
Duplicity is a great backup tool written in Python that supports a wide array of remote storage systems.
Duply is a frontend for Duplicity that simplifies the process of automating backups.
The target system for this tutorial is Ubuntu Server 14.04, so not all instructions here will apply to your distribution. Duplicity should work on most Unix-like systems, and has been known to also work under Windows.
Installation
First we need to install all required software and dependencies.
Warning
These instructions add third-party package repositories to your servers, install third-party binaries without packaging and use PIP to install Python packages.
Duplicity
Support for Swift with Identity v3 was just recently added to Duplicity in version 0.7.04. To get the latest version, we add the Duplicity Team Stable PPA.
user@server$ sudo add-apt-repository ppa:duplicity-team/ppa
user@server$ sudo apt-get update
user@server$ sudo apt-get install duplicity
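To confirm that the installed version is new enough, it can be compared against 0.7.04. The following is a minimal sketch: version_ge is a hypothetical helper name, and it assumes that GNU sort with the -V option is available and that "duplicity --version" prints the version number as its second word.

```shell
# Hypothetical helper: succeeds if version $1 is greater than or equal to $2.
# Relies on the -V (version sort) option of GNU sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Only run the check when duplicity is actually installed.
if command -v duplicity >/dev/null 2>&1; then
    # Assumes output of the form "duplicity 0.7.04".
    installed=$(duplicity --version 2>/dev/null | awk '{print $2}')
    if version_ge "$installed" "0.7.04"; then
        echo "duplicity $installed supports Swift with Identity v3"
    else
        echo "duplicity $installed is too old, 0.7.04 or later is needed" >&2
    fi
fi
```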
Duply
We need an up-to-date version of Duply (>=1.10.1). Since Duply consists of a single script, we opt for the easy solution of installing the development version directly:
user@server$ wget http://duply.net/tmp/duply.sh -O duply
user@server$ chmod 755 duply
user@server$ sudo chown root:root duply
user@server$ sudo mv duply /usr/local/bin/
Openstack Swift
We also need recent copies of various Openstack Python libraries on the system. Since these are not generally available via a PPA, we opt for installing them directly using PIP:
user@server$ sudo apt-get install python-pip python-dev libffi-dev libssl-dev
user@server$ sudo pip install 'requests[security]' python-swiftclient python-keystoneclient
Encryption and Signing
Duplicity supports the use of PGP keys for encryption and signing of backups. We encrypt all backups to two sets of PGP keys, one key for the server itself, and one centralized key for emergencies. This centralized key can be used to decrypt all backups, while the local server key can only decrypt its own backups.
First we create our centralized key. This should only be created once, preferably on your local workstation. Remember to set a secure passphrase.
Use "RSA and RSA" as Type, "2048" bits and no expiration ("0"). Set "Real Name" to "Master Backup Key", "Email" to "support@test.com" (or similar) and "Comment" to your domain, here we use "test.com".
user@workstation$ gpg --gen-key
<snip>
pub 2048R/545DDA1E 2015-08-19
Key fingerprint = BBF4 1F21 1FED 8FAF 0D15 A02C A19C D056 545D DA1E
uid Master Backup Key (test.com) <support@test.com>
sub 2048R/A1A13C72 2015-08-19
We've now created a master keypair. The ID is "545DDA1E".
You should store a copy of the private key somewhere safe, preferably on multiple secure locations. To export it, use the following command:
user@workstation$ gpg --armor --export-secret-keys MASTER_KEY_ID
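The command above prints the private key to the terminal. It can be safer to write it straight to a file that only your own user can read. A minimal sketch, where export_secret_key and the output file name are hypothetical:

```shell
# Hypothetical helper: export secret key $1 to file $2, readable by the owner only.
# The umask only affects newly created files, so $2 should not exist beforehand.
export_secret_key() {
    local key_id=$1 outfile=$2
    (
        umask 077   # files created in this subshell get mode 600
        gpg --armor --export-secret-keys "$key_id" > "$outfile"
    )
}

# Usage (MASTER_KEY_ID as above):
#   export_secret_key MASTER_KEY_ID master-backup-key.sec.asc
```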
The public key needs to be added to the server we are backing up. Export it from the workstation's keyring:
user@workstation$ gpg --armor --export MASTER_KEY_ID
-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: GnuPG v1
mQENBFXUcPoBCADQaQPbL5PASdg/6v5f4te8E81gt2KFs2OLAbJzWK1pEXCwpXRR
2xgsMnHaTfr4XWc5XX4LkkaO7oJ8DDTHrY50GJPPzZei5BpKxPA1R023b+OS1Z9A
<snip>
9r9ceM+qEB1eUAL0jG7nwVktbOlio6QYLjTRD4lVt3YXk7X961PdaPG9pNapI76y
iI4Tzi1xM2WSQNl2p76wL6j5D+hdcnE/29VjnP+ThyiZJEV6baRLVxIkkgMaxQ==
=2Ghe
-----END PGP PUBLIC KEY BLOCK-----
Copy the output into your clipboard, and add it to your server:
user@server$ sudo -i gpg --import
gpg: keyring `/root/.gnupg/secring.gpg' created
Paste the public key into the window, followed by Control-D.
-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: GnuPG v1
mQENBFXUcPoBCADQaQPbL5PASdg/6v5f4te8E81gt2KFs2OLAbJzWK1pEXCwpXRR
2xgsMnHaTfr4XWc5XX4LkkaO7oJ8DDTHrY50GJPPzZei5BpKxPA1R023b+OS1Z9A
<snip>
9r9ceM+qEB1eUAL0jG7nwVktbOlio6QYLjTRD4lVt3YXk7X961PdaPG9pNapI76y
iI4Tzi1xM2WSQNl2p76wL6j5D+hdcnE/29VjnP+ThyiZJEV6baRLVxIkkgMaxQ==
=2Ghe
-----END PGP PUBLIC KEY BLOCK-----
gpg: key 545DDA1E: public key "Master Backup Key (test.com) <support@test.com>" imported
gpg: Total number processed: 1
gpg: imported: 1 (RSA: 1)
Create a keypair for the server. This is done on the server itself. Since the server is headless and virtual, it might not have enough entropy to generate a keypair in a timely manner. We can work around this by installing "haveged" first:
user@server$ sudo apt-get install haveged
Create the keypair. Again we use "RSA and RSA" as type, "2048" bits and no expiration ("0"). We set the "Real Name" to "Host Backup Key", Email to "support@test.com" and Comment to the FQDN of the server, "server.test.com".
user@server$ sudo -i gpg --gen-key
<snip>
pub 2048R/B42A8F0E 2015-08-19
Key fingerprint = BEB5 3FC7 363C 2DE7 6BF7 900A 2319 5DC5 B42A 8F0E
uid Host Backup Key (server.test.com) <support@test.com>
sub 2048R/C6A9286C 2015-08-19
Now sign the Master Key using the Host Key:
user@server$ sudo -i gpg -u HOST_KEY_ID --sign-key MASTER_KEY_ID
Configuring Duply
Create a directory for system-wide profiles and a directory for logs:
user@server$ sudo mkdir -p /etc/duply /var/log/duply
Create a duply profile. We use the hostname as the profile name, as it is a system-wide profile.
user@server$ sudo -i duply $HOSTNAME create
Edit the default profile configuration using your preferred editor:
user@server$ sudo editor /etc/duply/$HOSTNAME/conf
You should end up with a configuration file looking something like this:
# GPG
GPG_KEYS_ENC='HOST_KEY_ID,MASTER_KEY_ID'
GPG_KEY_SIGN='HOST_KEY_ID'
GPG_PW='PASSPHRASE'
# Base directory to backup
SOURCE='/'
# Target for backup
TARGET='swift:///backup-server.test.com'
# Openstack Swift
export SWIFT_AUTHURL='https://identity.api.zetta.io/v3'
export SWIFT_USERNAME='OPENSTACK_USERNAME'
export SWIFT_PASSWORD='OPENSTACK_PASSWORD'
export SWIFT_REGION_NAME='no-osl1'
export SWIFT_USER_DOMAIN_NAME='test.com'
export SWIFT_PROJECT_DOMAIN_NAME='test.com'
export SWIFT_TENANTNAME='Standard'
export SWIFT_AUTHVERSION='3'
# Openstack Swift Object Size
VOLSIZE=100
DUPL_PARAMS="$DUPL_PARAMS --volsize $VOLSIZE "
# Backup retention
MAX_AGE=2M
MAX_FULL_BACKUPS=2
MAX_FULLS_WITH_INCRS=1
MAX_FULLBKP_AGE=1M
DUPL_PARAMS="$DUPL_PARAMS --full-if-older-than $MAX_FULLBKP_AGE "
Let's go over some of the options:
GPG_KEYS_ENC
is a comma-separated list of public key IDs to use as target for encryption. We use the KEY ID of the server and the centralized master key.
GPG_KEY_SIGN
is the key we use to sign the backup, to verify that files have not been modified. We use the KEY ID of the server.
GPG_PW
is the passphrase for the host's private key.
SOURCE
is the base path for your backup. Here we have set it to the root of the filesystem.
TARGET
is the destination for the backup. "swift://" indicates that we want to use the Openstack Swift backend, and the remainder is the name of the container. You should use a separate container per backup profile. Here we use a container name consisting of the prefix "backup-" followed by the FQDN, which is "server.test.com" in our case. If you use more than one profile per server, you can suffix the container name with the profile name.
SWIFT_AUTHURL
is the URL for the Zetta.IO Keystone Identity server.
SWIFT_USERNAME, SWIFT_PASSWORD and SWIFT_USER_DOMAIN_NAME
are the credentials you use to authenticate to Openstack.
SWIFT_TENANTNAME
is the name of the project/tenant you want to scope to. At Zetta.IO the default project name is "Standard".
SWIFT_PROJECT_DOMAIN_NAME
is usually the same as SWIFT_USER_DOMAIN_NAME.
MAX_AGE
is how far back in time you want backups to be kept. Here we use "2M", which is two months.
MAX_FULL_BACKUPS
is how many full backups we want to keep. Here we use "2" to have two full backup sets.
MAX_FULLS_WITH_INCRS
is how many full backups with incremental backups we want to keep. Here we use "1" to indicate that we only want incremental granularity for the last full backup.
MAX_FULLBKP_AGE
is how long we should do incremental backups before we need to do a full backup. Here we use "1M" to indicate that we take a new full backup every month.
Please refer to the Duply documentation for more information about these options.
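The conf file holds GPG_PW and SWIFT_PASSWORD in plain text, so it is worth checking that nobody but root can read it. A minimal sketch: is_private is a hypothetical helper name, and it assumes GNU find, which supports the "-perm /mode" syntax.

```shell
# Hypothetical helper: succeeds if file $1 grants no permissions to group or others.
is_private() {
    [ -e "$1" ] || return 2
    # find prints the path only if any group/other permission bit is set.
    [ -z "$(find "$1" -maxdepth 0 -perm /077)" ]
}

# Usage, run as root:
#   is_private "/etc/duply/$HOSTNAME/conf" || chmod 600 "/etc/duply/$HOSTNAME/conf"
```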
Since SOURCE is set to '/' we need to set up an exclude-file to limit what we back up:
user@server$ sudo editor /etc/duply/$HOSTNAME/exclude
Although the file is called "exclude", it is actually a globbing file list. You should end up with a configuration file looking something like this:
+ /etc
+ /root
+ /var/mail
+ /var/spool/cron
+ /var/www
+ /opt
+ /srv
+ /home
**
This file will INCLUDE (prefixed with +) the directories /etc, /root, /var/mail, /var/spool/cron, /var/www, /opt, /srv and /home in the backup.
It will EXCLUDE (**) everything else.
Modify it so that it suits your setup.
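The rules in the list are evaluated top to bottom, and the first rule that matches a path decides whether it is backed up. The following sketch models that behaviour for the simple rules used above. It is a simplified illustration, not Duplicity's actual matcher: would_backup is a hypothetical helper that only understands "+ /dir", "- /dir" and a catch-all "**" line.

```shell
# Simplified model of a globbing file list: rules are read top to bottom and the
# first one that matches decides. Only "+ /dir", "- /dir" and a catch-all "**"
# are handled here; duplicity itself understands many more patterns.
would_backup() {
    local path=$1 list=$2 rule dir verdict
    while IFS= read -r rule; do
        case "$rule" in
            '**')  echo exclude; return ;;
            '+ '*) dir=${rule#+ }; verdict=include ;;
            '- '*) dir=${rule#- }; verdict=exclude ;;
            *)     continue ;;
        esac
        case "$path" in
            "$dir"|"$dir"/*) echo "$verdict"; return ;;
        esac
    done < "$list"
    echo include   # nothing matched: duplicity includes the file by default
}

# Usage:
#   would_backup /etc/passwd "/etc/duply/$HOSTNAME/exclude"
```

With the list above, a path below /etc is reported as included, while a path such as /usr/bin/env falls through to the final "**" line and is reported as excluded.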
Testing
Now we should be able to do our first full backup:
user@server$ sudo -i duply $HOSTNAME backup
You should now get an output similar to this, without any errors:
Start duply v1.10.1, time is 2015-08-19 18:29:04.
Using profile '/etc/duply/server'.
Using installed duplicity version 0.7.04, python 2.7.6, gpg 1.4.16 (Home: ~/.gnupg), awk 'GNU Awk 4.0.1', grep 'grep (GNU grep) 2.16', bash '4.3.11(1)-release (x86_64-pc-linux-gnu)'.
Checking TEMP_DIR '/tmp' is a folder (OK)
Checking TEMP_DIR '/tmp' is writable (OK)
Test - Encrypt to 'B42A8F0E','545DDA1E' & Sign with 'B42A8F0E' (OK)
Test - Decrypt (OK)
Test - Compare (OK)
Cleanup - Delete '/tmp/duply.23938.1440001744_*'(OK)
--- Start running command PRE at 18:29:04.619 ---
Skipping n/a script '/etc/duply/server/pre'.
--- Finished state OK at 18:29:04.635 - Runtime 00:00:00.016 ---
--- Start running command BKP at 18:29:04.649 ---
Reading globbing filelist /etc/duply/server/exclude
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: none
Last full backup is too old, forcing full backup
Reuse configured PASSPHRASE as SIGN_PASSPHRASE
--------------[ Backup Statistics ]--------------
StartTime 1440001745.84 (Wed Aug 19 18:29:05 2015)
EndTime 1440001866.41 (Wed Aug 19 18:31:06 2015)
ElapsedTime 120.56 (2 minutes)
SourceFiles 33875
SourceFileSize 939478826 (896 MB)
NewFiles 33875
NewFileSize 939478826 (896 MB)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 33875
RawDeltaSize 916639093 (874 MB)
TotalDestinationSizeChange 242336109 (231 MB)
Errors 0
-------------------------------------------------
--- Finished state OK at 18:31:11.600 - Runtime 00:02:06.950 ---
--- Start running command POST at 18:31:11.615 ---
Skipping n/a script '/etc/duply/server/post'.
--- Finished state OK at 18:31:11.634 - Runtime 00:00:00.018 ---
You can now check the status of the backup:
user@server:~$ sudo -i duply $HOSTNAME status
Start duply v1.10.1, time is 2015-08-19 18:44:32.
Using profile '/etc/duply/server'.
Using installed duplicity version 0.7.04, python 2.7.6, gpg 1.4.16 (Home: ~/.gnupg), awk 'GNU Awk 4.0.1', grep 'grep (GNU grep) 2.16', bash '4.3.11(1)-release (x86_64-pc-linux-gnu)'.
Checking TEMP_DIR '/tmp' is a folder (OK)
Checking TEMP_DIR '/tmp' is writable (OK)
Test - Encrypt to 'B42A8F0E','545DDA1E' & Sign with 'B42A8F0E' (OK)
Test - Decrypt (OK)
Test - Compare (OK)
Cleanup - Delete '/tmp/duply.24554.1440002672_*'(OK)
--- Start running command STATUS at 18:44:32.575 ---
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Wed Aug 19 18:29:05 2015
Collection Status
-----------------
Connecting with backend: BackendWrapper
Archive dir: /root/.cache/duplicity/duply_server
Found 0 secondary backup chains.
Found primary backup chain with matching signature chain:
-------------------------
Chain start time: Wed Aug 19 18:29:05 2015
Chain end time: Wed Aug 19 18:29:05 2015
Number of contained backup sets: 1
Total number of contained volumes: 3
Type of backup set: Time: Num volumes:
Full Wed Aug 19 18:29:05 2015 3
-------------------------
No orphaned or incomplete backup sets found.
--- Finished state OK at 18:44:34.249 - Runtime 00:00:01.674 ---
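A backup is only useful if it can be restored, so it may be worth fetching a single file back and comparing it with the original. A minimal sketch using duply's "fetch" command: restore_check is a hypothetical helper name, and the file path is given relative to SOURCE.

```shell
# Hypothetical helper: fetch file $2 (path relative to SOURCE='/') from the
# latest backup of profile $1 and compare it with the copy on disk.
restore_check() {
    local profile=$1 relpath=$2 tmpdir rc=0
    tmpdir=$(mktemp -d) || return 2
    # Fetch into a fresh directory so no existing file has to be overwritten.
    if duply "$profile" fetch "$relpath" "$tmpdir/restored" >/dev/null 2>&1 &&
       cmp -s "/$relpath" "$tmpdir/restored"; then
        echo "OK: $relpath matches the backup"
    else
        echo "FAILED: $relpath could not be restored or differs" >&2
        rc=1
    fi
    rm -rf "$tmpdir"
    return "$rc"
}

# Usage, run as root:
#   restore_check "$HOSTNAME" etc/hostname
```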
Now would be a good time to copy your duply profiles folder to a remote secure location:
user@server$ sudo tar zcvf duply-server.test.com.tgz /etc/duply
user@server$ scp duply-server.test.com.tgz user@secure.test.com:
This archive includes your server's PGP keys and all settings required for a full restore.
Automating the backup job
Now that we know that the backup profile works, we set it up to run daily. Add the following to your existing /etc/crontab or create a new /etc/cron.d/duply file:
user@server$ sudo editor /etc/cron.d/duply
MAILTO=support@test.com
SHELL=/bin/bash
PATH=/bin:/usr/bin:/usr/local/bin
0 1 * * * root : Duply Backup Error ; P=server ; LOGFILE="/var/log/duply/${P}.$(date +\%Y\%m\%d).log"; duply ${P} backup 1>>"$LOGFILE" 2>&1 || cat "$LOGFILE"
Be sure to change "P=server" to "P=PROFILENAME" above. The profile name is your hostname.
If Duply/Duplicity exits with an error, the logfile will be sent by email to whoever is set as recipient in MAILTO. This requires that system mail delivery is working, using either a local MTA or something like nullmailer.
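The cron entry packs several steps into one line. Spelled out as a shell function, it looks roughly like the sketch below. The name run_backup is hypothetical, and the log directory defaults to the /var/log/duply directory created earlier.

```shell
# Hypothetical wrapper doing what the cron one-liner does: append duply's output
# to a dated log file and print the log only if the backup fails. cron mails
# whatever a job prints to MAILTO, so a successful run sends no mail.
run_backup() {
    local profile=$1 logdir=${2:-/var/log/duply} logfile
    logfile="$logdir/$profile.$(date +%Y%m%d).log"
    if ! duply "$profile" backup >>"$logfile" 2>&1; then
        cat "$logfile"
        return 1
    fi
}

# Usage, run as root:
#   run_backup "$HOSTNAME"
```

In a crontab the "%" character has a special meaning and must be escaped as "\%", which is why the cron line contains "\%Y\%m\%d". Inside a script no escaping is needed.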
Deleting old backups
Old backup sets are not deleted automatically. To delete these and free up space, use the "purge" command to list outdated backup sets, and the "purge --force" command to delete them.
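With the retention settings from the profile, a cleanup run could look like the following sketch. It relies on duply's "purge" command acting on MAX_AGE and its "purgeFull" command acting on MAX_FULL_BACKUPS. The name purge_old is hypothetical.

```shell
# Hypothetical helper: delete outdated backup sets for profile $1.
# Without --force, duply only lists what would be deleted.
purge_old() {
    local profile=$1
    duply "$profile" purge --force &&
    duply "$profile" purgeFull --force
}

# Usage, run as root:
#   purge_old "$HOSTNAME"
```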
Info
We recently started a project to create a service, written in Golang, that configures, schedules and runs duply backup and purge. Find the project on GitHub. Feedback and contributions are welcome.
Future
This is by no means a complete backup solution yet. Things to look into to expand on the setup could be:
- Databases
- Restores of database files copied from a running server will probably fail. MySQL, PostgreSQL, ZooKeeper, Cassandra etc. should first be backed up to disk using the vendor-supplied tools in a PRE-script before the file backup is run.
- Security
- If you authenticate to Swift with an Openstack user with full privileges, a compromised server could ultimately leak your full credentials. Setting up a separate account/project/container ACL might be a good idea and needs to be investigated.
- Large static datasets
- If you have large amounts of data that seldom change, it might be a good idea to not rotate the full backup every month, but instead do incremental backups forever. This will increase the recovery time, but will prevent a complete data dump every month. This can be achieved by using multiple profiles.
- Configuration Management
- Automatic deployment of Duply and Duplicity using configuration management tools like SaltStack, Puppet, Ansible and Chef.
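For the database item, a PRE-script could dump the database to disk before the file backup starts. The log output earlier shows that duply looks for a script named "pre" in the profile directory. The following sketch assumes MySQL with mysqldump available and credentials that root can use without a prompt. The name dump_mysql and the /var/backups/mysql path are hypothetical, and the dump directory would have to be added to the include list.

```shell
# Sketch of a pre script body (assumption: MySQL, credentials usable by root).
# Hypothetical helper: dump all databases into directory $1.
dump_mysql() {
    local outdir=$1
    mkdir -p "$outdir" || return 1
    # Write to a temporary name first so a failed dump never replaces a good one.
    # --single-transaction gives a consistent snapshot for InnoDB tables.
    if mysqldump --all-databases --single-transaction > "$outdir/all-databases.sql.tmp"; then
        mv "$outdir/all-databases.sql.tmp" "$outdir/all-databases.sql"
    else
        rm -f "$outdir/all-databases.sql.tmp"
        return 1
    fi
}

# In the pre script itself:
#   dump_mysql /var/backups/mysql
```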
If you have any feedback or tips to share, feel free to post in our forum.