Automatic Website Backups with Tar, Cron, and Rsync on Ubuntu 20.04
Introduction
This guide describes how to create automated backups of a web server's document root and Apache configuration files using the utilities tar, rsync, and cron. The example schedule creates a daily backup that is kept for one week and a monthly backup that is kept for one year.
Prerequisites
- A fully-updated Ubuntu Linux 20.04 server running Apache, referred to as the "web server".
- A fully-updated Ubuntu Linux 20.04 server for offsite backups, referred to as the "offsite server".
- A non-root sudo user and SSH access to both servers.
You can adapt this guide for other web servers and Linux distributions with slight modifications. The commands used are standard for most Linux distros.
The 3-2-1 Rule
The 3-2-1 backup rule states that you need three copies of your data. In addition to the primary working copy, you need two backups, stored on different types of storage media, with one of the copies offsite. With this system, you're highly unlikely to lose your data.
1. Backup Preparation
Before creating a backup, ensure you have sufficient disk space on the web server.
$ df -h
Compare this to the size of your website root and Apache configuration files and any other directories you wish to back up:
$ du -sh /var/www
$ du -sh /etc/apache2
Once you have confirmed enough space is available to perform the backup, create a directory to contain the local backup.
$ sudo mkdir /var/web_backup
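The size comparison above can also be scripted. The following is a minimal sketch rather than part of the original procedure: the function name enough_space is hypothetical, and it assumes a du that accepts -c and a df that accepts the POSIX -P and -k options, as the Ubuntu versions do.

```shell
#!/bin/sh
# enough_space DEST SRC...  (hypothetical helper)
# Succeeds when the filesystem holding DEST has more free space (in KiB)
# than the combined size of the SRC directories. The uncompressed size is a
# conservative upper bound for a gzip-compressed archive.
enough_space() {
    dest=$1
    shift
    # Total size of all sources in KiB; the last line of "du -c" is the sum.
    needed=$(du -skc "$@" | tail -n 1 | awk '{print $1}')
    # Column 4 of "df -Pk" is the available space in KiB.
    available=$(df -Pk "$dest" | awk 'NR == 2 {print $4}')
    echo "needed: ${needed} KiB, available: ${available} KiB"
    [ "$available" -gt "$needed" ]
}

# Demonstration against a scratch directory instead of /var/www.
demo=$(mktemp -d)
echo "<h1>Hello</h1>" > "$demo/index.html"
if enough_space "$demo" "$demo"; then
    echo "enough space"
else
    echo "not enough space"
fi
rm -rf "$demo"
```

On the web server, the equivalent call would be enough_space /var/web_backup /var/www /etc/apache2, run with sufficient privileges to read both directories.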
2. Perform a Backup
Create a backup of the web server's document root and Apache configuration directory with tar. Name this file web-server.tar.gz.
$ sudo tar -cvzpf /var/web_backup/web-server.tar.gz -C /var/ www -C /etc/ apache2
The arguments (-cvpzf -C) in the command specify the following:
c - Create archive
v - Verbose
z - Compress the file using gzip
p - Preserve permissions of the files
f - Path and filename of the archive
C - Change directory
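Note the space after /var/ and /etc/ in the command, before the name of the directory to back up. Each -C changes to the given parent directory first, so the archive stores the paths www and apache2 rather than absolute paths. Include any other directories you wish to back up with additional -C arguments in the command.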
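To see the effect of -C on the stored paths, the following sketch builds a small stand-in for /var/www and /etc/apache2 in a scratch directory and lists the resulting archive. The directory layout and file names are placeholders for illustration only.

```shell
#!/bin/sh
# Build a small stand-in for /var/www and /etc/apache2 under a scratch directory.
demo=$(mktemp -d)
mkdir -p "$demo/var/www/example.com" "$demo/etc/apache2"
echo "<h1>Hello</h1>" > "$demo/var/www/example.com/index.html"
echo "# demo config" > "$demo/etc/apache2/apache2.conf"

# Same shape as the backup command: each -C changes directory before the
# name that follows it is added to the archive.
tar -czpf "$demo/web-server.tar.gz" -C "$demo/var/" www -C "$demo/etc/" apache2

# List the archive. Member names start at www/ and apache2/, not at /var or /etc.
listing=$(tar -tzf "$demo/web-server.tar.gz")
echo "$listing"
```

Because the member names are relative, the archive can be extracted into any directory without overwriting the live /var/www or /etc/apache2.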
How to exclude files
To exclude a file or directory from your archive, use --exclude=path/to/file/. For example, if you want to exclude a directory /var/www/example.com/junk/, use:
$ sudo tar -cvpzf /var/web_backup/web-server.tar.gz --exclude=www/example.com/junk -C /var/ www -C /etc/ apache2
Note: Don't include the full path when specifying what to exclude; use the path relative to the directory given with -C. For instance, use www/example.com/junk rather than /var/www/example.com/junk. Use multiple --exclude=path/to/file/ directives when excluding multiple files and directories. You may also use an exclude file if you have many directories to exclude.
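An exclude file can be passed to tar with the --exclude-from option. The sketch below is an illustration under assumptions: the file name exclude.txt and the junk and cache directories are hypothetical, and the demonstration runs in a scratch directory rather than against /var/www.

```shell
#!/bin/sh
demo=$(mktemp -d)
mkdir -p "$demo/var/www/example.com/junk" "$demo/var/www/example.com/cache"
echo "keep" > "$demo/var/www/example.com/index.html"
echo "skip" > "$demo/var/www/example.com/junk/old.log"
echo "skip" > "$demo/var/www/example.com/cache/page.tmp"

# One pattern per line, relative to the directory given with -C.
cat > "$demo/exclude.txt" <<'EOF'
www/example.com/junk
www/example.com/cache
EOF

# --exclude-from reads the exclusion patterns from the file.
tar -czpf "$demo/web-server.tar.gz" --exclude-from="$demo/exclude.txt" -C "$demo/var/" www

# The listing should contain index.html but nothing under junk/ or cache/.
listing=$(tar -tzf "$demo/web-server.tar.gz")
echo "$listing"
```

Keeping the patterns in a file shortens the tar command and makes the exclusion list easier to review.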
3. Test a Restore
Before automating the process, make sure you can restore the files. First, create a test directory for the restore.
$ sudo mkdir /var/web_backup/restored/
Restore the archive to the directory.
$ sudo tar -xvzpf /var/web_backup/web-server.tar.gz -C /var/web_backup/restored/
The extract command differs in two arguments:
x - Extract the archive
z - Decompress the archive
Verify the restore was successful by listing the contents.
$ ls -lh /var/web_backup/restored/
Once verified, remove the test directory:
$ sudo rm -r /var/web_backup/restored/
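Listing the restored files confirms they exist, but not that their contents match. As a sketch of a stricter check, the example below compares the original and restored trees with diff -r. It runs in a scratch directory with a placeholder file, not against the live site.

```shell
#!/bin/sh
demo=$(mktemp -d)
mkdir -p "$demo/var/www" "$demo/restored"
echo "<h1>Hello</h1>" > "$demo/var/www/index.html"

# Back up, then restore into a separate directory.
tar -czpf "$demo/web-server.tar.gz" -C "$demo/var/" www
tar -xzpf "$demo/web-server.tar.gz" -C "$demo/restored/"

# diff -r exits with status 0 only when both trees have identical contents.
if diff -r "$demo/var/www" "$demo/restored/www"; then
    restore_status="match"
else
    restore_status="differ"
fi
echo "restore check: $restore_status"
rm -rf "$demo"
```

On the web server, the comparable check would be sudo diff -r /var/www /var/web_backup/restored/www, run before removing the test directory. Differences are expected if the site changed after the archive was created.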
4. Automate with Cron
The cron utility schedules commands to run at specific times. This example schedules daily backups at 02:00 and keeps one backup for each day of the week. It uses the date command to name the archive with the day of the week it was created.
$ sudo tar -cvpzf /var/web_backup/web-server.`date +\%a`.tar.gz -C /var/ www -C /etc/ apache2
The +\%a argument is a format specifier that tells date to return the abbreviated day of the week, for example "Mon" or "Tue". The backslash escapes the percent sign, which has a special meaning in a crontab entry. The tar command will overwrite the daily archive from the previous week.
For a full list of format characters supported by the date command, refer to its man page.
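The following sketch prints the file names that a few date formats produce. In a script or interactive shell the percent sign needs no backslash; the escaping is only required on a crontab line. Setting LC_ALL=C is an assumption added here so that the names stay in English regardless of the server's locale, and the %F variant is shown only for comparison.

```shell
#!/bin/sh
# LC_ALL=C forces English names regardless of the server's locale.
day=$(LC_ALL=C date +%a)    # abbreviated weekday, e.g. Mon
month=$(LC_ALL=C date +%b)  # abbreviated month, e.g. Jan
stamp=$(date +%F)           # full date as YYYY-MM-DD

echo "web-server.$day.tar.gz"
echo "web-server.$month.tar.gz"
echo "web-server.$stamp.tar.gz"
```

The weekday and month formats repeat, so old archives are overwritten automatically. A full date such as %F never repeats and would need a separate cleanup step.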
Add a job to crontab
The cron schedules are stored in a file named crontab. A typical crontab entry begins with five values (or asterisks), followed by a command. The values tell cron when to execute the command, and an asterisk means "all".
* * * * * command to be executed
- - - - -
| | | | |
| | | | ----- Day of week (0 - 7) (Sunday=0 or 7)
| | | ------- Month (1 - 12)
| | --------- Day of month (1 - 31)
| ----------- Hour (0 - 23)
------------- Minute (0 - 59)
To edit crontab, run:
$ sudo crontab -e
Add the tar command to the bottom of the file, including the schedule at the beginning:
00 02 * * * sudo tar -cvpzf /var/web_backup/web-server.`date +\%a`.tar.gz -C /var/ www -C /etc/ apache2
This schedules cron to run the command at 02:00 every day.
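An alternative to a long crontab line is a small script that cron calls. The following is a sketch under assumptions, not the method used in this guide: the function name web_backup is hypothetical, and the demonstration runs against a scratch directory so it can be tried without root privileges.

```shell
#!/bin/sh
# web_backup DEST TAR_ARGS...  (hypothetical helper)
# Archives the directories named in TAR_ARGS into
# DEST/web-server.<weekday>.tar.gz and prints the archive path.
# Because this runs in a script rather than on a crontab line, the % in
# the date format needs no backslash.
web_backup() {
    dest=$1
    shift
    archive="$dest/web-server.$(LC_ALL=C date +%a).tar.gz"
    tar -czpf "$archive" "$@" && echo "$archive"
}

# Demonstration against a scratch directory instead of /var/www.
demo=$(mktemp -d)
mkdir -p "$demo/var/www" "$demo/web_backup"
echo "<h1>Hello</h1>" > "$demo/var/www/index.html"
created=$(web_backup "$demo/web_backup" -C "$demo/var/" www)
echo "created: $created"
```

On the web server, the call would be web_backup /var/web_backup -C /var/ www -C /etc/ apache2. The crontab entry then only needs the schedule followed by the path where the script is saved.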
12 month retention
To keep monthly backups, add another schedule that names the archive with an abbreviated month name, using the +\%b format specifier. To test date commands before inserting them into the filename, use echo to print them:
$ echo `date +\%b`
Add this to a new line in crontab:
0 0 1 * * sudo tar -cvpzf /var/web_backup/web-server.`date +\%b`.tar.gz -C /var/ www -C /etc/ apache2
This schedules cron to run the monthly backup at 00:00 on the first day of the month. After a year, each month's archive is overwritten by the new one for that month.
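The overwrite-based retention can be illustrated with a short simulation. This sketch writes placeholder text files rather than real archives, and the two-year loop stands in for 24 monthly cron runs.

```shell
#!/bin/sh
demo=$(mktemp -d)

# Simulate 24 monthly runs (two years). Each run writes to a name built from
# the month abbreviation, so the second year overwrites the first.
for year in 1 2; do
    for month in Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec; do
        echo "year $year" > "$demo/web-server.$month.tar.gz"
    done
done

# Count the files that remain and check which run the January file is from.
count=$(ls "$demo" | wc -l | tr -d ' ')
jan=$(cat "$demo/web-server.Jan.tar.gz")
echo "archives kept: $count"
echo "January archive comes from: $jan"
```

The directory never holds more than twelve monthly files, which is what gives the one-year retention without a separate cleanup job.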
5. Back up Offsite
To complete the 3-2-1 backup rule, the offsite server will use rsync to download the backups from the web server. Ensure both the web server and the offsite server have rsync installed. It is installed by default on Ubuntu systems, and can be manually installed with:
$ sudo apt install rsync
Log into your offsite server and create a directory to store the backups:
$ mkdir server_backup
Run rsync to download the backups from the web server. Include a trailing slash after the remote directory to only transfer the contents of the directory.
$ rsync -azP --delete username@remote_host:/var/web_backup/ /path/to/offsite/server_backup
rsync takes a number of arguments:
a - Archive, sync recursively while preserving symbolic links, file permissions, etc.
z - Compress file data during transfer
P - Short for --partial and --progress, keep partially transferred files if interrupted and display the progress of the transfer
--delete - Delete extraneous files from the destination directory
Use caution with --delete. Test your command with the --dry-run option first to prevent data loss.
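The effect of --dry-run and --delete can be tried safely between two local directories before pointing rsync at the real servers. The sketch below uses placeholder files in a scratch directory. The directory names mirror the ones in this guide, but nothing is transferred over the network.

```shell
#!/bin/sh
# Skip the demonstration cleanly when rsync is not installed.
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed"; exit 0; }

demo=$(mktemp -d)
mkdir -p "$demo/web_backup" "$demo/server_backup"
echo "new" > "$demo/web_backup/web-server.Mon.tar.gz"
echo "stale" > "$demo/server_backup/web-server.old.tar.gz"

# Dry run: report what would be copied and deleted without changing anything.
rsync -a --delete --dry-run -v "$demo/web_backup/" "$demo/server_backup"
after_dry=$(ls "$demo/server_backup")
echo "after dry run: $after_dry"

# Real run: the trailing slash copies the contents of web_backup, and
# --delete removes the file that does not exist in the source.
rsync -a --delete "$demo/web_backup/" "$demo/server_backup"
after=$(ls "$demo/server_backup")
echo "after real run: $after"
```

The dry run leaves the stale file in place, while the real run replaces it with the contents of the source directory. The same check applies to the remote command by adding --dry-run to it.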
Excluding files
You may exclude files from rsync. To exclude a file from being transferred, use the --exclude=relative/path/to/file/ argument in your command. For example, if you want to exclude Wednesday's backup from being transferred:
$ rsync -azP --delete --exclude=web-server.Wed.tar.gz username@remote_host:/var/web_backup/ /path/to/offsite/server_backup
Repeat --exclude=relative/path/to/file/ for each file or directory you wish to exclude.
Note: Don't include the full path when specifying what to exclude; exclude paths need to be relative to the source path. For instance, to exclude Wednesday's backup stored in /var/web_backup/daily/, use daily/web-server.Wed.tar.gz, not /var/web_backup/daily/web-server.Wed.tar.gz.
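The relative exclude path can be verified locally in the same way. This sketch assumes a daily subdirectory like the one in the note and uses placeholder files in a scratch directory.

```shell
#!/bin/sh
# Skip the demonstration cleanly when rsync is not installed.
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed"; exit 0; }

demo=$(mktemp -d)
mkdir -p "$demo/web_backup/daily" "$demo/server_backup"
for day in Tue Wed Thu; do
    echo "$day" > "$demo/web_backup/daily/web-server.$day.tar.gz"
done

# The exclude path is relative to the source directory (web_backup/).
rsync -a --exclude=daily/web-server.Wed.tar.gz "$demo/web_backup/" "$demo/server_backup"

# Tuesday and Thursday are copied; Wednesday is skipped.
copied=$(ls "$demo/server_backup/daily")
echo "$copied"
```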
6. Passwordless Rsync Login
To automate the process, configure rsync to connect to the web server from the offsite server without a password.
On the offsite server, create a key pair for rsync:
$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/your_home/.ssh/id_rsa):
At the prompt, enter the full path /your_home/.ssh/id_rsa_rsync to name the key id_rsa_rsync, and do not enter a passphrase.
Add the new public key to the web server.
$ ssh-copy-id -i ~/.ssh/id_rsa_rsync username@remote_host
The offsite server should now be able to download files from the web server without a password. On the offsite server, test the command.
$ rsync -azP --delete -e 'ssh -i ~/.ssh/id_rsa_rsync' username@remote_host:/var/web_backup/ /path/to/offsite/server_backup
If someone gains access to the offsite server, the key pair will grant them login access to the web server. It's possible to limit the key and prevent remote login.
Find the exact command being executed on the web server by adding the verbose -v flag to ssh and filtering the output with grep.
$ rsync -azP --delete -e 'ssh -vi ~/.ssh/id_rsa_rsync' username@remote_host:/var/web_backup/ /path/to/offsite/server_backup 2>&1 | grep "Sending command"
Output:
debug1: Sending command: rsync --server --sender -vlogDtprz . /var/web_backup/
SSH to the web server, and edit the authorized_keys file:
$ sudo nano ~/.ssh/authorized_keys
Find the public key you added earlier. For example:
ssh-rsa AAAAB3Nza...[many characters]...LiPk== user@example.com
Insert a command directive before ssh-rsa, using the command from the grep output above. The line will look like this, for example:
command="rsync --server --sender -vlogDtprz . /var/web_backup/",no-pty,no-agent-forwarding,no-port-forwarding ssh-rsa AAAAB3Nza...[many characters]...LiPk== user@example.com
This directive limits the key to one command, reducing the risk a malicious user could gain access to the web server.
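Assembling the restricted line by hand is error-prone because of the nested quotes. As a sketch, the hypothetical helper below builds the line from the captured command and the public key. The key string is the elided placeholder from the example above and must be replaced with the real public key.

```shell
#!/bin/sh
# restrict_key COMMAND PUBLIC_KEY  (hypothetical helper)
# Prints an authorized_keys line that limits PUBLIC_KEY to COMMAND.
restrict_key() {
    printf 'command="%s",no-pty,no-agent-forwarding,no-port-forwarding %s\n' "$1" "$2"
}

# Placeholder values copied from the examples above; substitute the real
# command from the grep output and the real public key.
cmd='rsync --server --sender -vlogDtprz . /var/web_backup/'
key='ssh-rsa AAAAB3Nza...[many characters]...LiPk== user@example.com'
line=$(restrict_key "$cmd" "$key")
echo "$line"
```

The printed line replaces the existing entry for that key in authorized_keys. It must stay on a single line.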
Test the rsync command using the key:
$ rsync -azP --delete -e 'ssh -i ~/.ssh/id_rsa_rsync' username@remote_host:/var/web_backup/ /path/to/offsite/server_backup
Attempt to log in with the key and verify access is denied for interactive login.
$ ssh -i ~/.ssh/id_rsa_rsync username@remote_host
7. Automate the Process
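Open crontab on the offsite server. Edit the crontab of the user who created the key pair, without sudo, so that ~/.ssh/id_rsa_rsync resolves to that user's home directory.
$ crontab -e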
Append the rsync command to the bottom of the file.
00 03 * * * rsync -azP --delete -e 'ssh -i ~/.ssh/id_rsa_rsync' username@remote_host:/var/web_backup/ /path/to/offsite/server_backup
This schedules cron to run the rsync command at 03:00 each day.
Conclusion
You have a working 3-2-1 backup strategy! Your website's automatic backups are made daily and sent to an offsite location with retention for one year.