Automatic Website Backups with Tar, Cron, and Rsync on Ubuntu 20.04

Updated on November 21, 2023

Introduction

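This guide describes how to create automated backups of a web server's document root and Apache configuration files using the utilities tar, rsync, and cron. The example schedule creates a backup every day, keeps one daily archive for each day of the week and one monthly archive for each month of the year, and copies the archives to an offsite server.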

Prerequisites

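This guide uses an Ubuntu 20.04 web server running Apache and a second, offsite server that can connect to the web server over SSH. You can adapt it for other web servers and Linux distributions with slight modifications, because the commands used are standard on most Linux distributions.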

The 3-2-1 Rule

The 3-2-1 backup rule states that you should keep three copies of your data: the primary working copy plus two backups. The copies should be stored on at least two different types of storage media, and at least one copy should be offsite. With this system, a single failure, such as a disk crash or the loss of one site, cannot destroy every copy of your data.

1. Backup Preparation

Before creating a backup, ensure you have sufficient disk space on the web server.

$ df -h

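Compare the available space with the size of your website root, your Apache configuration files, and any other directories you want to back up. The compressed archive is usually smaller than these totals, so they serve as a conservative estimate: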

$ du -sh /var/www
$ du -sh /etc/apache2
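The same comparison can be scripted. The sketch below defines a hypothetical helper, check_backup_space (not part of any package), that adds up the du sizes of the directories to back up and compares the total with the free space df reports for the destination filesystem. It assumes GNU du and df as shipped with Ubuntu and treats the uncompressed size as the worst case.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the total size of the directories to back up
# with the free space on the filesystem that will hold the archive.
set -euo pipefail

check_backup_space() {
  local dest="$1"; shift          # remaining arguments: directories to back up
  local needed_kb=0 size_kb avail_kb dir
  for dir in "$@"; do
    # du -sk prints "<KiB><TAB><path>"; cut keeps only the number.
    size_kb=$(du -sk "$dir" | cut -f1)
    needed_kb=$((needed_kb + size_kb))
  done
  # df -P prints one line per filesystem; field 4 of line 2 is the free KiB.
  avail_kb=$(df -Pk "$dest" | awk 'NR == 2 {print $4}')
  echo "needed=${needed_kb}K available=${avail_kb}K"
  # Succeed only if the uncompressed total fits (a worst case for tar -z).
  [ "$avail_kb" -ge "$needed_kb" ]
}
```

For example, after sourcing the script in a root shell, check_backup_space /var /var/www /etc/apache2 && echo "enough space" checks the filesystem holding /var, where /var/web_backup will live.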

Once you have confirmed enough space is available to perform the backup, create a directory to contain the local backup.

$ sudo mkdir /var/web_backup

2. Perform a Backup

Create a backup of the web server's document root and Apache configuration directory with tar. Name this file web-server.tar.gz.

$ sudo tar -cvzpf /var/web_backup/web-server.tar.gz -C /var/ www -C /etc/ apache2

The flags in this command (-c, -v, -z, -p, -f, and -C) specify the following:

  • c - Create archive
  • v - Verbose
  • z - Compress the file using gzip
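  • p - Preserve file permissions (tar always records permissions when it creates an archive; this flag takes effect when the archive is extracted)
  • f - Path and filename of the archive (f takes the filename as its argument, so keep it last in the group of flags)
  • C - Change to the given directory before adding the names that follow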

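Note the space between /var/ and www, and between /etc/ and apache2. Each -C changes tar's working directory before the name that follows it, so the archive stores relative paths such as www/... and apache2/... rather than absolute paths. To back up other directories, add another -C directory name pair to the command.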
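To see what -C does to the stored paths, the sketch below runs the same kind of tar command against a throwaway tree in a temporary directory. The $demo paths are hypothetical stand-ins for /var/www and /etc/apache2, and tar -t lists the archive members without extracting them.

```shell
#!/usr/bin/env bash
# Rehearse the guide's tar command on a disposable tree; $demo/var/www and
# $demo/etc/apache2 are hypothetical stand-ins for the real directories.
set -euo pipefail

demo=$(mktemp -d)
mkdir -p "$demo/var/www/html" "$demo/etc/apache2"
echo '<h1>hello</h1>' > "$demo/var/www/html/index.html"
echo 'ServerName example.com' > "$demo/etc/apache2/apache2.conf"

# Each -C switches directory before the name that follows it, so the archive
# holds relative paths (www/..., apache2/...) rather than absolute ones.
tar -czpf "$demo/web-server.tar.gz" -C "$demo/var" www -C "$demo/etc" apache2

# -t lists the members; expect www/html/index.html and apache2/apache2.conf.
tar -tzf "$demo/web-server.tar.gz"
```

No member name starts with /, which is why the restore step can extract the archive into any directory.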

How to exclude files

To exclude a file or directory from your archive, use --exclude=path/to/file. For example, to exclude the directory /var/www/example.com/junk/, use:

$ sudo tar -cvpzf /var/web_backup/web-server.tar.gz --exclude=www/example.com/junk -C /var/ www -C /etc/ apache2

Note: Exclude patterns are matched against the relative paths stored in the archive, so don't use the full path. For instance, use www/example.com/junk rather than /var/www/example.com/junk. Add one --exclude option for each file or directory you want to exclude. If you have many, list the patterns one per line in a file and pass it with --exclude-from=FILE (or -X FILE).
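The sketch below shows both forms on a disposable tree, assuming GNU tar as shipped with Ubuntu. The junk and cache directories and the excludes.txt file are hypothetical examples.

```shell
#!/usr/bin/env bash
# Show --exclude and --exclude-from (-X) on a disposable tree. Everything under
# $demo is a hypothetical stand-in for /var/www and /etc/apache2.
set -euo pipefail

demo=$(mktemp -d)
mkdir -p "$demo/var/www/example.com/junk" "$demo/var/www/example.com/cache" "$demo/etc/apache2"
touch "$demo/var/www/example.com/index.html" \
      "$demo/var/www/example.com/junk/old.zip" \
      "$demo/var/www/example.com/cache/page.tmp" \
      "$demo/etc/apache2/apache2.conf"

# One pattern, written relative to the -C directory as the archive stores it.
tar -czpf "$demo/one.tar.gz" --exclude=www/example.com/junk \
    -C "$demo/var" www -C "$demo/etc" apache2

# Many patterns: one per line in a file, passed with --exclude-from (-X).
printf '%s\n' 'www/example.com/junk' 'www/example.com/cache' > "$demo/excludes.txt"
tar -czpf "$demo/many.tar.gz" --exclude-from="$demo/excludes.txt" \
    -C "$demo/var" www -C "$demo/etc" apache2

tar -tzf "$demo/many.tar.gz"   # no junk/ or cache/ entries expected
```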

3. Test a Restore

Before automating the process, make sure you can restore the files. First, create a test directory for the restore.

$ sudo mkdir /var/web_backup/restored/

Restore the archive to the directory.

$ sudo tar -xvzpf /var/web_backup/web-server.tar.gz -C /var/web_backup/restored/

The extract command uses x in place of c:

  • x - Extract the archive
  • z - Decompress the archive with gzip
  • p - Restore the original file permissions

Verify the restore was successful by listing the contents.

$ ls -lh /var/web_backup/restored/

Once verified, remove the test directory:

$ sudo rm -r /var/web_backup/restored/
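Listing the restored files shows that they exist, but a recursive diff also checks that their contents match the live files. The sketch below wraps the restore test in a hypothetical helper, verify_restore, which extracts the archive into a scratch directory, runs diff -r against each original directory, and then removes the scratch copy. It assumes the archive was created with -C parent name pairs as in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract an archive into a scratch directory and compare
# it with the live directories using diff -r. Arguments after the archive are
# "parent name" pairs matching the archive's -C options, e.g. /var www.
set -euo pipefail

verify_restore() {
  local archive="$1"; shift
  local scratch status=0
  scratch=$(mktemp -d)
  tar -xzpf "$archive" -C "$scratch"
  while [ "$#" -ge 2 ]; do
    # diff -r prints any differences and exits non-zero if the trees differ.
    diff -r "$1/$2" "$scratch/$2" || status=1
    shift 2
  done
  rm -rf "$scratch"      # clean up the scratch copy, like the rm -r step
  return "$status"
}
```

For example, in a root shell: verify_restore /var/web_backup/web-server.tar.gz /var www /etc apache2 && echo "restore matches". Files that changed after the archive was created also show up as differences.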

4. Automate with Cron

The cron utility runs commands at scheduled times. This example schedules a daily backup at 02:00 and keeps one backup for each day of the week, using the date command to add the day of the week to the archive name:

$ sudo tar -cvpzf /var/web_backup/web-server.`date +\%a`.tar.gz -C /var/ www -C /etc/ apache2

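The date format +%a returns the abbreviated day of the week, such as "Mon" or "Tue". The backslash before % is required in a crontab entry, where an unescaped % is treated as a newline; in an interactive shell it is harmless. Because the names repeat every week, tar overwrites the archive from the same day of the previous week.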

For a full list of format characters supported by the date command, see its manual page (man date).
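To preview the names the cron jobs will produce, the sketch below defines a hypothetical helper, archive_name, around GNU date's -d option. Outside crontab the % needs no backslash. LC_ALL=C pins English abbreviations, since %a and %b otherwise follow the system locale.

```shell
#!/usr/bin/env bash
# Preview archive names. archive_name is a hypothetical helper; date -d is
# GNU date (standard on Ubuntu). LC_ALL=C forces English day/month names.
set -euo pipefail

archive_name() {
  local fmt="$1" when="${2:-now}"   # fmt: %a (daily) or %b (monthly)
  printf '/var/web_backup/web-server.%s.tar.gz\n' "$(LC_ALL=C date -d "$when" "+$fmt")"
}

archive_name %a    # today's daily archive name
archive_name %b    # this month's monthly archive name

# One week of daily names: 2023-11-20 was a Monday, so this prints Mon..Sun.
for day in 20 21 22 23 24 25 26; do
  archive_name %a "2023-11-$day"
done
```

The seven day names never collide with the twelve month names, so daily and monthly archives can share /var/web_backup.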

Add a job to crontab

Each user's cron schedule is stored in a table called a crontab. A crontab entry begins with five time fields, followed by the command to run. The fields tell cron when to execute the command, and an asterisk means "every value".

* * * * * command to be executed
- - - - -
| | | | |
| | | | ----- Day of week (0 - 7) (Sunday=0 or 7)
| | | ------- Month (1 - 12)
| | --------- Day of month (1 - 31)
| ----------- Hour (0 - 23)
------------- Minute (0 - 59)
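The matching rule behind the diagram can be sketched as a small function. cron_matches and field_matches below are hypothetical helpers that only understand "*" and single numbers (no lists, ranges, or steps), which covers the schedules in this guide. They do model two real cron details: 7 is an alias for Sunday, and when both the day-of-month and day-of-week fields are restricted, the job runs if either one matches.

```shell
#!/usr/bin/env bash
# Hypothetical helper: does a timestamp match a five-field cron schedule?
# Supports only "*" and single numbers. Needs GNU date for -d.
set -euo pipefail

field_matches() {
  # $1 = cron field, $2 = actual value. 10# forces base 10 so "08" and "09"
  # are not rejected as invalid octal numbers.
  [ "$1" = "*" ] || [ "$((10#$1))" -eq "$((10#$2))" ]
}

cron_matches() {
  local min hour dom mon dow t_min t_hour t_dom t_mon t_dow
  read -r min hour dom mon dow <<< "$1"
  # %w numbers weekdays 0-6 with Sunday as 0.
  read -r t_min t_hour t_dom t_mon t_dow <<< "$(date -d "$2" '+%M %H %d %m %w')"
  if [ "$dow" = "7" ]; then dow=0; fi   # 7 is cron's alias for Sunday
  if ! { field_matches "$min" "$t_min" && field_matches "$hour" "$t_hour" &&
         field_matches "$mon" "$t_mon"; }; then
    return 1
  fi
  if [ "$dom" != "*" ] && [ "$dow" != "*" ]; then
    # Both day fields restricted: cron runs when EITHER one matches.
    field_matches "$dom" "$t_dom" || field_matches "$dow" "$t_dow"
  else
    field_matches "$dom" "$t_dom" && field_matches "$dow" "$t_dow"
  fi
}

cron_matches "00 02 * * *" "2023-11-21 02:00" && echo "daily job runs"
cron_matches "0 0 1 * *" "2023-12-01 00:00" && echo "monthly job runs"
```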

To edit the root user's crontab, run:

$ sudo crontab -e

Add the tar command to the bottom of the file, including the schedule at the beginning:

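00 02 * * * tar -czpf /var/web_backup/web-server.`date +\%a`.tar.gz -C /var/ www -C /etc/ apache2

This runs the command at 02:00 every day. Because the entry is in root's crontab, sudo is not needed, and the v flag is omitted because cron has no terminal to display the list of archived files.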

12 month retention

To keep monthly backups, add a second schedule that names the archive with the abbreviated month name, using the date format +\%b. To preview a date format before using it in a filename, run date with it directly (the backslash is only needed inside crontab):

$ date +%b

Add this to a new line in crontab:

0 0 1 * * tar -czpf /var/web_backup/web-server.`date +\%b`.tar.gz -C /var/ www -C /etc/ apache2

This schedules cron to run the monthly backup at 00:00, on the first day of the month. After a year, older archives are overwritten.
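To see what the two schedules retain, the sketch below simulates one year of runs and prints which backup date each archive name holds at the end. simulate_year is a hypothetical helper; it assumes every run succeeds, ignores the time of day, and needs bash 4 or later (associative arrays) and GNU date.

```shell
#!/usr/bin/env bash
# Simulate a year of the daily (%a) and monthly (%b) cron jobs and show which
# backup date each archive name holds afterwards. Hypothetical helper.
set -euo pipefail

simulate_year() {
  local -A latest            # archive name -> date of the backup it holds
  local base i day wday mon dom name
  base=$(date -u -d 2023-01-01 +%s)
  for i in $(seq 0 364); do
    read -r day wday mon dom <<< "$(LC_ALL=C date -u -d "@$((base + i * 86400))" '+%F %a %b %d')"
    latest["web-server.$wday.tar.gz"]=$day        # daily job overwrites last week's file
    if [ "$dom" = "01" ]; then
      latest["web-server.$mon.tar.gz"]=$day       # monthly job, first of the month
    fi
  done
  for name in "${!latest[@]}"; do
    printf '%s %s\n' "${latest[$name]}" "$name"
  done | sort
}

simulate_year
```

At the end of the simulated year, the twelve monthly archives hold the first day of each month and the seven daily archives hold the last seven days, so 19 archives exist at any time.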

5. Back up Offsite

To complete the 3-2-1 backup rule, the offsite server uses rsync to download the backups from the web server. Ensure both servers have rsync installed. It is installed by default on Ubuntu, and you can install it manually with:

$ sudo apt install rsync

  1. Log in to your offsite server and create a directory to store the backups:

     $ mkdir server_backup
  2. Run rsync to download the backups from the web server. Include a trailing slash after the remote directory to only transfer the contents of the directory.

    $ rsync -azP --delete username@remote_host:/var/web_backup/ /path/to/offsite/server_backup

    rsync takes a number of arguments:

    • a - Archive mode: sync recursively while preserving symbolic links, file permissions, modification times, and other attributes
    • z - Compress file data during transfer
    • P - Short for --partial and --progress, keep partially transferred files if interrupted and display the progress of the transfer
    • --delete - Delete files from the destination directory that no longer exist in the source directory

    Use caution with --delete. Test your command first with the --dry-run (-n) and -v options, which report the planned transfers and deletions without changing any files.

Excluding files

You can also exclude files from the rsync transfer with --exclude=PATTERN. For example, to skip Wednesday's backup:

$ rsync -azP --delete --exclude=web-server.Wed.tar.gz username@remote_host:/var/web_backup/ /path/to/offsite/server_backup

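Repeat --exclude=PATTERN for each file or directory you want to exclude. A pattern that ends with / matches only directories, so leave off the trailing slash when excluding a file.

Note: Don't include the full path in an exclude pattern. Patterns are matched against paths relative to the source directory, and a leading / anchors a pattern to the top of the transfer (here, /var/web_backup/), not to the filesystem root. For instance, to exclude Wednesday's backup stored in /var/web_backup/daily/, use daily/web-server.Wed.tar.gz, not /var/web_backup/daily/web-server.Wed.tar.gz.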

6. Passwordless Rsync Login

To automate the process, configure rsync to connect to the web server from the offsite server without a password.

  1. On the offsite server, create a key pair for rsync:

     $ ssh-keygen
     Generating public/private rsa key pair.
     Enter file in which to save the key (/your_home/.ssh/id_rsa):
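  2. When prompted for the file, enter the full path to the new key, such as /home/username/.ssh/id_rsa_rsync, so that it is saved in ~/.ssh where the later commands expect it. Do not enter a passphrase: press Enter at both passphrase prompts so that cron can use the key unattended.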

  3. Add the new public key to the web server.

     $ ssh-copy-id -i ~/.ssh/id_rsa_rsync username@remote_host
  4. The offsite server should now be able to download files from the web server without a password. On the offsite server, test the command:

     $ rsync -azP --delete -e 'ssh -i ~/.ssh/id_rsa_rsync' username@remote_host:/var/web_backup/ /path/to/offsite/server_backup

    If someone gains access to the offsite server, this key pair also grants them login access to the web server. You can restrict the key so that it can only run the rsync command and cannot open an interactive session.

  5. To find the exact command rsync runs on the web server, add ssh's verbose -v flag and filter the output with grep:

     $ rsync -azP --delete -e 'ssh -vi ~/.ssh/id_rsa_rsync' username@remote_host:/var/web_backup/ /path/to/offsite/server_backup 2>&1 | grep "Sending command"
    
     Output:
     debug1: Sending command: rsync --server --sender -vlogDtprz . /var/web_backup/
  6. SSH to the web server as the user that rsync connects as (username in the commands above), and edit that user's authorized_keys file:

     $ nano ~/.ssh/authorized_keys
  7. Find the public key you added earlier. For example:

     ssh-rsa AAAAB3Nza...[many characters]...LiPk== user@example.com
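  8. Insert a command directive before ssh-rsa, using the command you found in step 5. Copy it exactly from your own output, because the flags vary with the rsync version and options. The line will look like this, for example: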

     command="rsync --server --sender -vlogDtprz . /var/web_backup/",no-pty,no-agent-forwarding,no-port-forwarding ssh-rsa AAAAB3Nza...[many characters]...LiPk== user@example.com

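    This directive limits the key to one command, reducing the risk that a malicious user could use it to gain access to the web server. Keep the rsync options on the offsite server the same as in the command you tested, because the web server always runs the fixed command from the directive.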

  9. Test the rsync command using the key:

     $ rsync -azP --delete -e 'ssh -i ~/.ssh/id_rsa_rsync' username@remote_host:/var/web_backup/ /path/to/offsite/server_backup
  10. Attempt to log in with the key and verify that you do not get an interactive shell. The session should run only the forced rsync command; press Ctrl+C to end it.

    $ ssh -i ~/.ssh/id_rsa_rsync username@remote_host
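Instead of editing the entry by hand in step 8, you can generate the restricted line from the public key file. The sketch below defines a hypothetical helper, restricted_key_line, that uses the same options as this guide. Pass the command from your own step 5 output. For simplicity, the sketch refuses commands that contain double quotes, because an unescaped quote would end the command="..." option early.

```shell
#!/usr/bin/env bash
# Build the restricted authorized_keys entry from step 8. restricted_key_line
# is a hypothetical helper; the options match the ones used in this guide.
set -euo pipefail

restricted_key_line() {
  local cmd="$1" pubkey_file="$2"
  # An unescaped double quote would end command="..." early; refuse it.
  case "$cmd" in
    *'"'*) echo "command must not contain double quotes" >&2; return 1 ;;
  esac
  printf 'command="%s",no-pty,no-agent-forwarding,no-port-forwarding %s\n' \
    "$cmd" "$(cat "$pubkey_file")"
}
```

For example, on the offsite server: restricted_key_line "rsync --server --sender -vlogDtprz . /var/web_backup/" ~/.ssh/id_rsa_rsync.pub. Paste the output into the web server's authorized_keys in place of the unrestricted line that ssh-copy-id added; if both lines remain, the unrestricted entry may still be used.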

7. Automate the Process

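Open the crontab on the offsite server as the same user that created the key pair and ran the rsync test. Don't use sudo here: in root's crontab, ~/.ssh/id_rsa_rsync would point to root's home directory, where the key does not exist.

$ crontab -e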

Append the rsync command to the bottom of the file.

00 03 * * * rsync -azP --delete -e 'ssh -i ~/.ssh/id_rsa_rsync' username@remote_host:/var/web_backup/ /path/to/offsite/server_backup

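This runs the rsync command at 03:00 each day, one hour after the web server creates its daily archive, assuming both servers use the same time zone and the backup finishes within that hour.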
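To confirm that the nightly copy keeps arriving, the offsite server can check the age of the newest archive. The sketch below is a hypothetical check, check_fresh, with an assumed 26-hour threshold that leaves slack for the 02:00 and 03:00 schedule. Because rsync -a preserves modification times, the age reflects when tar wrote the archive on the web server.

```shell
#!/usr/bin/env bash
# Hypothetical freshness check for the offsite backup directory: fail if no
# archive is newer than the threshold (default 26 hours).
set -euo pipefail

check_fresh() {
  local dir="$1" max_hours="${2:-26}"
  local recent
  # -mmin -N matches files modified less than N minutes ago.
  recent=$(find "$dir" -maxdepth 1 -name 'web-server.*.tar.gz' -mmin "-$((max_hours * 60))")
  if [ -n "$recent" ]; then
    echo "OK: recent backup found in $dir"
  else
    echo "WARNING: no backup newer than ${max_hours}h in $dir" >&2
    return 1
  fi
}
```

For example: check_fresh /path/to/offsite/server_backup || echo "offsite backup is stale".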

Conclusion

You have a working 3-2-1 backup strategy! Your website is backed up automatically every day, and the archives are copied to an offsite location, with daily backups retained for a week and monthly backups for a year.