Making mnemosyne redundant

In the last month I finally started doing backups of my life archive. Briefly, my life archive is a Dropbox folder called mnemosyne that contains sub-folders for different parts of my life that I want to record. I use Dropbox "smart sync" for this folder, because it has too much content to fit on my computer. (One of my goals for the life archive is never to worry about file size: I'm keeping RAW copies of DSLR pictures I've taken, for example.)

It's great to have all my historic digital content in one place, organized for later retrieval (yay!) but it seems like kind of a bad idea not to have a second backup of the content. So, I set out to build a backup: an external hard drive copy of the Dropbox folder.

Because I'm using smart sync, building such a backup is a bit tricky.

Getting files to the hard drive

My original plan was to create a virtual machine that would connect separately to Dropbox and do a full sync to the hard drive. I was going to create an Ubuntu virtual machine, boot it up, connect it to my Dropbox, and then tell it to do a full sync (rather than a smart sync) with the Dropbox folder being on the hard drive.

However, I also wanted Arq to back up this external hard drive to my Arq backup destination (currently Amazon Drive). This would give me three copies of the life archive in three different formats, but it also means that my host system (OS X) would need to be able to read the external drive.

At the same time, I want the external drive to be encrypted. It has some sensitive data on it, such as medical records and tax returns. The closest I got to finding an encrypted file system that both Ubuntu and OS X could read was VeraCrypt, but VeraCrypt made me uneasy due to basically everything about it: how TrueCrypt was abandoned, finding SourceForge links, and how VeraCrypt's web site looks, for example.

So I decided to connect the hard drive to my OS X machine, encrypt it using Apple's hard drive encryption, and then use rsync to back up files from my computer to the external hard drive. This made connecting to the hard drive, using the hard drive, and understanding what's happening much simpler, at the expense of having to build my own rsync process.

Using rsync with smart sync

First, on OS X you need to ensure that Dropbox's "Smart sync for OS X" setting is off. When this setting is on, OS X will report online-only files as being 0 bytes. This throws off rsync. I still don't totally understand how rsync compares files, but it definitely considers modified times and file sizes.
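One way to see whether placeholder files are being reported as empty is to list every regular file with a size of 0 bytes. This is only a rough sketch: the function name is made up for illustration, and it assumes that online-only files really do show up as zero-byte regular files, as described above. Genuinely empty files will appear in the list too.

```shell
#!/bin/sh
# Hypothetical helper: print every regular file under a directory
# whose reported size is 0 bytes. Assumes online-only placeholders
# appear as zero-byte files; truly empty files are listed as well.
list_zero_byte_files() {
  find "$1" -type f -size 0 -print
}

# Example (path taken from this post):
#   list_zero_byte_files ~/Dropbox/mnemosyne
```

If this prints a long list of files that should have content, the setting is probably still on.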

Second, as an aside, I learned how painful it is to use smart sync with many small files. I had some directories with tens of thousands of files in them (think maildir-style email) and pulling these down took HOURS. So after I finally downloaded them, I created tarballs of them and got rid of the originals.
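A minimal sketch of that tarball step might look like the following. The function name and the choice of gzip compression are assumptions for illustration, not a record of the exact commands used. It removes the original directory only if the archive was written and can be listed back.

```shell
#!/bin/sh
# Hypothetical helper: pack a directory of many small files into
# <dir>.tar.gz next to it, then delete the original directory only
# if the archive was created and lists without error.
pack_directory() {
  dir=${1%/}                      # strip a trailing slash, if any
  parent=$(dirname "$dir")
  name=$(basename "$dir")
  tar -czf "$dir.tar.gz" -C "$parent" "$name" &&
    tar -tzf "$dir.tar.gz" > /dev/null &&
    rm -rf "$dir"
}

# Example (hypothetical folder name):
#   pack_directory ~/Dropbox/mnemosyne/some-maildir
```

Listing the archive is a light sanity check, not a full verification of its contents.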

Third, I ran rsync with all the directories already on my file system (i.e. the local files).

cd ~/Dropbox/mnemosyne
# Copy each folder that is already fully local to the matching folder on the drive.
for FOLDER in contacts guitar ingest pictures ...; do
  rsync -avz --delete "$FOLDER/" "/Volumes/mnemosyne-backup/mnemosyne/$FOLDER/"
done

Fourth, I told Dropbox to move the large local directories to online only, so that I would have space for the remaining folders. I then told Dropbox to pull down the remaining folders.

Fifth, I did a full rsync of all files. rsync believed all the already-backed-up files were in sync because their file sizes were the same, even though they were now online only. I kept repeating this full rsync until it found no changes.

rsync -avz --delete ~/Dropbox/mnemosyne/ /Volumes/mnemosyne-backup/mnemosyne/

Backing up the external drive to Arq

This part was the easiest: I told Arq to create a new backup and set the encryption password. Maybe the only hard part was making sure the "Skip if volume is not mounted" checkbox was checked.

However, it took Arq a very long time to back up the hard drive. I think during this time Arq wasn't backing up my local computer. Also, I think Arq is pretty good about retrying if there's a network interruption or if the computer is closed, but I didn't risk it: I kept the computer open until the backup completed. I think this took a day or so.