rsync
Ted Ruegsegger
Describes a simple method to build and operate a server that maintains copies of specified client file trees, using the efficient rsync tool to capture changes, secure shell (ssh) so that network connections need not be trusted, and keychain to allow ssh keys to be used in unattended operation.
rsync is a tool to replicate files between two locations, typically on separate hosts connected by a network. It uses a clever algorithm to detect differences in files so that only the differences need be transferred, making regular backups efficient and fast.
The manpage and the wealth of documentation that comes up in a Google™ search can daunt the reader who simply wants backups, because most of it discusses other uses of rsync (for example, running a file server, essentially a more efficient ftp archive, and mirroring websites). Also, much of the guidance on using ssh in scripts proposes using a key with a null passphrase, a Bad Practice.
I waded through a lot of verbiage before I understood how to do what I want. In fact, it's simple and straightforward. I wrote this to save others the time and bother.
rsync over ssh to replicate the file trees.
keychain to manage the ssh keys.
Choose an operating system for the server and obtain rsync, ssh, and keychain for it. If you're using GNU/Linux, I recommend Debian (stable) for any production server. It is almost trivially easy to update without breaking your applications, since Debian stable installs only fixes and security patches and never upgrades to a functionally different version of a package.
Create an ext3 filesystem on the big honkin' disk.
Create the mount point /bkp and mount the big honkin' disk. Don't forget to add an entry to /etc/fstab like:
/dev/hde1 /bkp ext3 defaults 0 2
Create a directory under /bkp for each client machine; in my case I have /bkp/grins/, /bkp/mononoke/, /bkp/pikachu/, and so forth.
Install rsync and ssh on the server.
Install rsync, ssh, and keychain on each client. Then, for each user that will be running rsync:
Generate the ssh keys:
ssh-keygen -t rsa
ssh-keygen -t dsa
Define an alias that loads the keys. For sh-compatible shells (sh, bash, ksh, zsh...):
alias gokeychain="keychain --nogui $HOME/.ssh/id_rsa $HOME/.ssh/id_dsa ; \
. $HOME/.keychain/$(hostname)-sh"
For csh-compatible shells (the quotes keep the semicolon inside the alias):
alias gokeychain 'keychain --nogui ~/.ssh/id_rsa ~/.ssh/id_dsa ; source ~/.keychain/${HOSTNAME}-csh'
Run the alias; keychain will start the agent and prompt for the passphrase(s). After that, the keys will be in memory until you explicitly remove them or reboot the machine.
Copy the public keys to the authorized_keys file on the server. For example, if I intend to copy files using user account ted on server nox:
ssh-copy-id ted@nox
To reload the keys, typically after rebooting the machine, for each user that will be running rsync:
gokeychain
(the alias defined above).
Read the rsync manpage, but ignore all the stuff about running rsync in daemon mode; that's for a public service, essentially a more efficient ftp server, and doesn't encrypt the traffic. In particular, examine the command-line options to rsync and identify the ones you need for your situation.
For example (as user ted):
cd /home
rsync -av --delete --delete-excluded \
--exclude "tmp" \
--exclude "[cC]ache" \
ted ted@nox:/bkp/mononoke
where:
-a is equivalent to -rlptgoD. It's a quick way to say you want recursion and want to preserve the file attributes as they are on the client.
-v makes the output verbose; add --progress to give you more info.
--delete removes files from the server copy that no longer exist on the client.
--delete-excluded also removes files from the server copy that match an --exclude pattern.
--exclude skips files and directories whose names match the given pattern.
ted is the subdirectory of /home that I wish to replicate. See the note on strategy, below.
ted@nox:/bkp/mononoke says to log in to the server (nox) as user ted and put all this stuff under /bkp/mononoke/.
You could also run rsync as the root user on the client, the server, or both, eliminating all permission issues, but raising other issues when you automate the process (I'm uncomfortable having root's ssh keys in memory).
It's tempting to exclude just obvious files (like "cache") and then explicitly include the directories I want to back up. For a single, manual backup, this is OK, but for automated backups, this is a poor strategy; if a user adds a new top-level directory on one of the clients, it won't get backed up unless I explicitly add it to the script. This violates my "no user intervention" objective.
A better approach is to specify all directories with a * (or by naming the parent directory) and then add an --exclude clause for each tree that I don't want. This way, any new directory gets backed up automatically.
Of course, there are exceptions. For example, suppose we're certain that all valuable stuff gets placed only in certain subdirectories and never in the parent and that, furthermore, the parent accumulates lots of files and directories whose names might not be predictable. In such a case, it makes sense to start in the parent directory and specify the directories we want, knowing that any newly-added stuff we care about will always be in one of them. That's easier than running a separate rsync for each subdirectory, or trying to keep up with excluding files in the parent that come and go.
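The exclude-based strategy can be sketched as a small helper that turns a list of patterns into --exclude options. This is only an illustration under stated assumptions: the function name build_excludes is my own invention, the patterns are the ones used earlier, and the command is printed with rsync's -n (dry-run) flag rather than executed.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original how-to): turn a list of
# patterns into rsync --exclude options. Patterns containing spaces are
# not handled by this simple sketch.
build_excludes() {
    out=""
    for pattern in "$@"; do
        out="$out --exclude=$pattern"
    done
    # Strip the leading space before printing
    printf '%s\n' "${out# }"
}

# Print (do not run) a dry-run command that names the parent-relative
# tree "ted" and excludes only what we know we don't want.
echo "rsync -avn --delete $(build_excludes tmp '[cC]ache') ted ted@nox:/bkp/mononoke"
```

Adding a new unwanted tree then means adding one pattern to the list; new directories that nobody thought about are still backed up.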
To save yourself drudgery and error, put it all into a script. Let the script build the command based on the user and the client hostname. Put the script somewhere where each user can execute it, like /usr/local/bin/syncfiles.sh on each client. It should look something like this (by default, any user on any host will back up that user's home directory, but you can add case clauses for particular users and hosts):
#!/bin/sh
######################################################################
# syncfiles.sh
# Replicate file trees to server using rsync
# Usage:
#   sh syncfiles.sh
#   or call from cron (make sure ssh key is loaded beforehand)
# Requirements:
#   Local user name must match remote (on server) user name
######################################################################
Host=$(hostname)
User=$(whoami)

# Pick up the ssh keys that keychain is holding in memory
keychain $HOME/.ssh/id_rsa $HOME/.ssh/id_dsa
. $HOME/.keychain/$(hostname)-sh

# Extra --exclude options for particular users or hosts go here
Excludes=

# Work from /home so that $User names the tree to replicate
cd /home || exit 1
#rsync -e ssh -av --delete --delete-excluded \
rsync -e ssh -a --delete --delete-excluded \
    --exclude "tmp" \
    --exclude "[cC]ache" \
    $Excludes $User $User@nox:/bkp/$Host
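The case clauses mentioned above might look something like the following sketch. The function name excludes_for, the user and host pairings, and the patterns are all invented for illustration; substitute whatever applies to your own machines.

```shell
#!/bin/sh
# Hypothetical sketch: choose extra --exclude options per user and host.
# The pairings and patterns below are made up for illustration only.
excludes_for() {
    user=$1
    host=$2
    case "$user@$host" in
        ted@mononoke) printf '%s\n' '--exclude=scratch' ;;
        *@pikachu)    printf '%s\n' '--exclude=*.iso' ;;
        *)            printf '\n' ;;   # default: no extra excludes
    esac
}

# In a script like syncfiles.sh this could replace the empty assignment:
#   Excludes=$(excludes_for "$User" "$Host")
```

Keeping the per-host quirks in one function leaves the rsync command itself identical on every client.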
Once you've decided what to back up, decide when and how often.
For a laptop used mainly by a single user, connecting to the LAN intermittently, it may be sufficient to execute /usr/local/bin/syncfiles.sh manually from time to time.
For hosts that reside on the LAN, or that have multiple users, it makes sense to schedule the rsync operations with cron, with a separate crontab entry for each user. For example, ted's crontab entry might look like this:
# Back up files to Nox nightly at 03:36 AM:
36 3 * * * sh /usr/local/bin/syncfiles.sh
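Since a cron job can't type a passphrase, it helps to have the job bail out early when the keys aren't in memory (after a reboot, say). A minimal sketch, assuming the standard OpenSSH ssh-add tool is on the PATH; the function name keys_loaded is my own.

```shell
#!/bin/sh
# Hypothetical guard for cron jobs: succeed only if an ssh-agent is
# reachable and holds at least one key. "ssh-add -l" exits non-zero when
# the agent has no keys or cannot be contacted.
keys_loaded() {
    ssh-add -l >/dev/null 2>&1
}

# Intended use in a backup script, after sourcing the keychain file:
#   keys_loaded || { echo "ssh keys not loaded; skipping backup" >&2; exit 1; }
```

With this in place a missed backup shows up as a short message in cron's mail rather than as a hung or failed ssh login.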
Since catastrophes rarely happen, a painless automated backup system that quietly and reliably does its job can lull us into forgetting the whole point of doing backups in the first place: restoring our data. Make a point of running some test restores when you first start making backups, so you can note any surprises. Ideally, you should run a test restore on a regular basis.
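One rough way to check a test restore is to compare the restored tree with the original. The sketch below is an assumption of mine rather than part of the procedure described here: it uses diff -r, which compares file names and contents but not ownership, permissions, or timestamps, and it will report anything you deliberately excluded from the backup as missing.

```shell
#!/bin/sh
# Hypothetical check: compare an original tree with a test restore.
# Returns 0 only when diff -r finds the same files with the same contents.
verify_restore() {
    original=$1
    restored=$2
    diff -r "$original" "$restored" >/dev/null
}

# Example, after restoring into a scratch directory:
#   verify_restore /home/ted /tmp/restore-test/ted && echo "restore looks good"
```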
Note: Don't use scp to restore; you'll have problems with links. As far as I can tell, scp doesn't understand links and simply treats them as regular files or directories. This will make duplicates and, in the worst case, can make endless loops (for example, if a symbolic link points to a parent directory).
To restore using rsync, just reverse the procedure and omit the --exclude and --delete options (unless we don't want to restore everything we backed up). For example, if we backed up the contents of the /home/ted directory with these commands:
cd /home
rsync -av --delete --delete-excluded --exclude "tmp" --exclude "[cC]ache" ted ted@nox:/bkp/mononoke
then to restore them we use these commands:
cd /home
rsync -av ted@nox:/bkp/mononoke/ted .
Now ted@nox:/bkp/mononoke/ted is the source and . (the current directory) is the target. Note that on our client either the directory /home/ted must already exist or we must have permissions to create it.
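Before restoring over live data, it can be reassuring to preview what rsync would do. The sketch below only prints the commands; the function name restore_cmd is hypothetical, and -n is rsync's dry-run flag.

```shell
#!/bin/sh
# Hypothetical helper (not from the original how-to): print, rather than
# run, the rsync command that restores a user's tree from the server
# into the current directory. The optional fourth argument replaces the
# default options, e.g. -avn for a dry run.
restore_cmd() {
    user=$1
    server=$2
    client=$3
    opts=${4:--av}
    printf 'rsync -e ssh %s %s@%s:/bkp/%s/%s .\n' \
        "$opts" "$user" "$server" "$client" "$user"
}

restore_cmd ted nox mononoke -avn   # preview only
restore_cmd ted nox mononoke        # the real thing
```

Run the printed dry-run command from /home first, read the file list, and only then run the version without -n.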
We can also restore individual files, which probably happens more often than a catastrophic loss of entire directories or disks:
cd /home/ted/recipes
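For single files the only fiddly part is working out where the file lives on the server. Given the layout used in this how-to (each client's trees under /bkp/ followed by the client's hostname), the mapping can be sketched as below. The function name remote_path_for and the file name soup.txt are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: map a local path under /home to its location on
# the backup server, assuming backups were made from /home into
# /bkp/<client hostname>/ as described in this how-to.
remote_path_for() {
    local_path=$1   # absolute path, e.g. /home/ted/recipes/soup.txt
    client=$2       # client hostname used as the directory under /bkp
    case "$local_path" in
        /home/*) printf '/bkp/%s/%s\n' "$client" "${local_path#/home/}" ;;
        *)       return 1 ;;   # not under /home: this sketch gives up
    esac
}

# Example (soup.txt is a made-up file name):
#   rsync -e ssh -av "ted@nox:$(remote_path_for /home/ted/recipes/soup.txt mononoke)" .
```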
Yes, it's possible to run rsync from Windows clients, should you have users thus afflicted.
keychain
.rsync
.keychain
rather than use a key with a null passphrase.
Since I'm lazy, I was delighted to find that ITeF!x (see Resources, below) combined rsync and elements of Cygwin to build cwRsync, distributed as a single "Installer" file for Windows. That's the approach I describe here. Its only drawback is the use of a null passphrase (the author says he plans to add support for keychain), but the easy setup makes it the best method I've found so far for Windows. I've tested it on Windows 98 and Windows XP.
Download cwRsync from the ITeF!x site.
Run the installer; you needn't install the rsync server unless you want it for some other reason. Assume cwrsync is installed in C:\Program Files\cwrsync\ in the following examples. In DOS batch scripts we'll write this as C:\progra~1\cwrsync\ since scripts have trouble with embedded spaces.
Add the backup server to the hosts file (windows\hosts in Win9x, windows\system32\drivers\etc\hosts in WinXP), or just use the backup server's IP address in your rsync scripts.
Create a home directory for the user; ssh will use this location to maintain keys and the known_hosts list. Assume our user is "ebenezer" with home directory C:\home\ebenezer in the following examples.
Generate ssh keys with, alas, null passphrases:
c:
cd \home\ebenezer
mkdir .ssh
c:\progra~1\cwrsync\ssh-keygen -t rsa -N "" -f .ssh\id_rsa
c:\progra~1\cwrsync\ssh-keygen -t dsa -N "" -f .ssh\id_dsa
Copy the public keys (id_rsa.pub and id_dsa.pub) to the backup server and append them to the user's $HOME/.ssh/authorized_keys file. A simple way is to use rsync interactively; since the keys aren't yet installed, it will prompt for a password. Assuming you've copied them to the /tmp directory, on the backup server:
cat /tmp/id_rsa.pub /tmp/id_dsa.pub >> /home/ebenezer/.ssh/authorized_keys
Copy C:\Program Files\cwrsync\cwrsync.cmd to the user's home directory, changing the extension to .bat for Win9x (WinXP will run either form). Use Windows Explorer or just type:
copy C:\progra~1\cwrsync\cwrsync.cmd C:\home\ebenezer\cwrsync.bat
Edit the batch file. Written out in full, the rsync command line would be too long. As a workaround, use environment variables; that also makes it all more readable. You should end up with something like this:
@ECHO OFF
REM **********************************************************
REM
REM CWRSYNC.CMD - Batch file to start your rsync command (s).
REM
REM By Tevfik K. (http://itefix.no/itefix-en)
REM
REM **********************************************************
SET CWRSYNCHOME=C:\progra~1\CWRSYNC
SET CYGWIN=nontsec
SET HOME=C:\home\ebenezer
SET CWOLDPATH=%PATH%
SET PATH=%CWRSYNCHOME%;%PATH%

REM ** CUSTOMIZE ** Enter your rsync command(s) here
SET RSYNCCMD=rsync -e %CWRSYNCHOME%\ssh -av --delete --delete-excluded
SET EXCLUDES=--exclude "[Tt]emp" --exclude "RECYCLE[DR]"
SET EXCLUDES=%EXCLUDES% --exclude '*[Cc]ache' --exclude '[Cc]ache*'
SET EXCLUDES=%EXCLUDES% --exclude 'Temporary Internet Files'
SET REMOTE=ebenezer@nox.home:/bkp/winbox

REM C: drive: back up only the named directories
c:
cd \
SET DIRS=dev docs home mp3 "My Documents" ssh
echo Backing up from C: drive: %DIRS%
%RSYNCCMD% %EXCLUDES% %DIRS% %REMOTE%/c

REM D: drive: back up everything
d:
cd \
echo _________________________________________________________
echo Backing up from D: drive: [all]
%RSYNCCMD% %EXCLUDES% * %REMOTE%/d

REM Restore the environment
set HOME=
set CWRSYNCHOME=
set CYGWIN=
set PATH=%CWOLDPATH%
where the rsync options and arguments are the same as before. Environment variables are invoked with %varname%. Note that we're handling each lettered disk drive separately. Note also that, for the C: drive, we're explicitly specifying subdirectories to back up, since we put all the stuff we care about inside them and the root directory tends to accumulate garbage we don't need. This is the exception mentioned in the note on strategy, above. But beware: if you add a subdirectory to C: that needs backing up, you'll have to edit the script.
Run the shortcut (the .pif file) rather than the batch script so that you get the environment space setting.
I understand MacOSX is a FreeBSD derivative, so presumably you can follow the same instructions as for GNU/Unix, above, probably with some adjustments. Previous versions of MacOS may or may not support rsync. This exhausts my knowledge of the Macintosh world. If someone kindly points out any documentation that would help Mac users use rsync, I'll be happy to link to it.
Can't leave well-enough alone? Possibilities abound:
Add a second disk to the server and use rsync to mirror the first.
Use rsync to replicate the big honkin' disk on a counterpart that's geographically removed.
Updated 19 Mar 2008 tbr.