An Rsync Command and Cron Job

Published: 11/16/2020


A famous (infamous?) Hacker News comment replied to the Dropbox launch with this critique:

1. For a Linux user, you can already build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem. From Windows or Mac, this FTP account could be accessed through built-in software.

To be honest, I'd thought the commenter said Dropbox was just an rsync command and a cron job, because this past week I unironically did exactly that to set up a backup solution for my hard drive.

I've been meaning to back up my files for a while. I still have the bad habit of keeping files that are pretty important to me in a single copy on a laptop's disk, even though my last laptop went kaput, ending in boot hell, when I turned it off at a bad time. (Though I'm still a little unclear about what happened. I sent it in to Asus and they said it was a problem with the motherboard, which would need to be replaced. Can you damage the motherboard by turning off a laptop during the boot sequence?) And I had a friend whose laptop was stolen in a hotel, losing a lot of files that were important to them. (This is one of the reasons I prefer Airbnb to hotels. If something like that happened at an Airbnb, you know Airbnb would side with the customer and make it right. When it happens at a hotel, what are you going to do?) A lot of my projects (like this website) are backed up in various GitHub repositories. But that's not ideal, because files adjacent to my website (like my drafts) aren't backed up in any way. And my research project, with all my math writing, I only back up when I remember to.

Luckily for me, I had some hardware lying around from past projects: a 1 TB hard drive I got when I was thinking about becoming a streamer (I streamed agar.io once and MTGO once), and a Raspberry Pi from when my aforementioned laptop broke and I thought it might be practical to replace the laptop with it (not the craziest idea, but there were a few things it couldn't quite handle, and carrying a monitor around turned out to be impractical. I guess there's a reason laptops exist).

The Pi is actually a recurring character on this blog. You can read about its previous job in my light alarm clock tutorial. That turned out to be a bad idea as well; I just get up when I'm awake now. If it's 3, it's 3. For the last 4 months the Pi has been querying PredictIt for market data every 5 minutes. A fun fact I learned doing that: FAT directories can only hold so many files. My hard drive, which is vfat according to df, can store about 65,000 files per directory, so I have a data1 folder, a data2 folder, and so on. But anyway, I thought I'd give my little Pi another job backing up my personal computer.
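
The rollover logic for that is simple enough. Here's a hypothetical sketch of the polling script; the PredictIt URL and the 60,000 cutoff are my guesses for illustration, not the actual script:

#!/bin/sh
# Roll over to a new dataN directory before hitting the FAT32
# per-directory entry limit (~65,000, and fewer in practice since
# long filenames consume extra directory entries).
i=1
while [ "$(ls "data$i" 2>/dev/null | wc -l)" -ge 60000 ]; do
  i=$((i + 1))
done
mkdir -p "data$i"
curl -s "https://www.predictit.org/api/marketdata/all/" > "data$i/$(date +%s).json"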

The backup command is not too bad:

rsync -azh --size-only --filter="merge /home/jaek/Development/sync/filter.txt" /home/jaek/ pi@192.168.1.83:"/media/pi/ja3k\\'s\\ hd/backup/" 

Note the complicated target: for some reason I decided to give my disk a stupid name with spaces and a single quote. Rsync basically syncs my home directory with the backup folder on the Pi's hard drive. Note that I wrote the directories out in full because ~ expansion doesn't work where this command is called.
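
For reference, here's what each piece of that command is doing (the flag meanings come from the rsync man page; the paths are just mine):

# -a            archive mode: recurse and preserve permissions, times,
#               symlinks, etc.
# -z            compress file data during transfer
# -h            human-readable numbers in the output
# --size-only   skip files whose size already matches the destination,
#               without comparing modification times
# --filter="merge FILE"
#               read include/exclude rules from FILE
# The trailing slash on /home/jaek/ means "sync the contents of the
# directory" rather than the directory itself, and the doubled
# backslashes escape the quote and space for the remote shell.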

Unfortunately my home directory is full of crud that I thought would be wasteful to back up. There's the obvious crud, like the screenshots in my photos directory, the one-off files in my Downloads, and the Windows VMs I keep for playing MTGO. But there's also a sea of dotfiles. Unfortunately, some of the dotfiles were created by me (like my vimrc, which you can check out here. It's like 60% little abbreviations for math writing, as if that were the bottleneck in anyone's mathematical productivity) and others are dumps from other programs which I think are probably unimportant but am too scared to delete. I'm not sure I want to know what .ICEauthority is doing. There wasn't an obvious hard and fast rule to distinguish the files I wanted to back up from those I didn't, so I made a little filter file (I'll unpack the syntax after the listing):

+ /Development/***
+ /Documents/***
+ /art/***
+ /deks/***
+ /Gaming/***
+ /metrics/***
+ /personal-site/
+ /personal-site/todo/***
+ /personal-site/drafts/***
+ /.vim/***
+ /.journal/***
+ /.xmonad/***
+ /.gitconfig
+ /.bash_history
+ /.python_history
+ /.zshrc
+ /.zsh_history
+ /.xsessionrc
- /** 
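
Rsync filter rules are evaluated top to bottom and the first match wins. A pattern ending in /*** matches a directory and everything under it, and the final - /** excludes anything the earlier rules didn't claim. The bare + /personal-site/ line matters: rsync won't descend into a directory it hasn't included, so the parent has to be let in before the todo/ and drafts/ rules can match anything. If you're putting together a filter like this yourself, a dry run (the -n flag, with -v for verbosity) is a cheap way to check what would actually be transferred:

rsync -azhn -v --size-only --filter="merge /home/jaek/Development/sync/filter.txt" /home/jaek/ pi@192.168.1.83:"/media/pi/ja3k\\'s\\ hd/backup/"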

Some miscellaneous observations about the folders I backed up:

Backing up to a Raspberry Pi a few meters away from my laptop really only covers a few potential data-loss scenarios. I can worry less about my laptop suddenly crapping out or getting stolen at some cafe (whenever I go to a cafe again), but in the event of a home fire, flood, or burglary I'm likely still fucked. So I might set up an Amazon S3 bucket and sync things there too. I've done that before for my website, which is simply hosted in a public S3 bucket. Amazon's sync doesn't seem to support a filter file, and I'm less concerned about storage space in the cloud, so my command would probably just be:

aws s3 sync ~ s3://backup
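
That said, aws s3 sync does take --exclude and --include flags (applied in order, with later flags taking precedence), so if I ever wanted to replicate the filter in the cloud, something like this sketch would be close; the patterns are just illustrative:

aws s3 sync /home/jaek/ s3://backup --exclude "*" --include "Development/*" --include "Documents/*" --include ".vim/*"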

Of course, if my backup system relies on me pressing any buttons, it's unlikely to be done consistently. So I added the following line to my cron jobs:

10 11 * * * /home/jaek/Development/sync/sync.sh

The numbers and stars out front say to run the job at 11:10 every day: the five fields are minute, hour, day of month, month, and day of week. To add a new cron job, run crontab -e.
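
The sync.sh that cron runs presumably just wraps the rsync command from earlier. A minimal sketch of what such a wrapper might look like (the log path is hypothetical, and this assumes key-based ssh auth to the Pi is already set up, since cron can't type a password):

#!/bin/sh
# Wrap the rsync backup so cron has one thing to call. Cron runs with
# a minimal environment, which is another reason the paths are
# spelled out in full rather than using ~.
rsync -azh --size-only \
  --filter="merge /home/jaek/Development/sync/filter.txt" \
  /home/jaek/ \
  pi@192.168.1.83:"/media/pi/ja3k\\'s\\ hd/backup/" \
  >> /home/jaek/Development/sync/sync.log 2>&1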

Of course, I didn't really replicate all of Dropbox, since I can't do nice things like access my files from other computers or easily share them, and in my current setup two hard drives failing at once would still lose my data, which isn't terribly reliable. But as always, it's satisfying to do something for yourself instead of using others' products.

[1] Bonus tip: you should increase the length of the history in your bashrc by two orders of magnitude; just add some zeros to the ends of HISTSIZE and HISTFILESIZE. Also, is it just me, or is Google Chrome's history starting to get forgetful? I want to know the 8 years of ill-advised meandering that made me who I am today, but often when I search for something like a recipe nothing comes up. Like, if I search for 'bao' I can't find the bao recipe I used in Fall 2018. Big Tech gets a lot of hate for collecting too much data, but to be honest I wish they collected even more. Maybe I should write a blog post about that.
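
Concretely, that means lines like these in ~/.bashrc (on a stock Ubuntu install the defaults are 1000 and 2000):

# keep 100x more shell history, in memory and on disk
HISTSIZE=100000
HISTFILESIZE=200000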

[2] What is it with software and putting an X out front? I also use Xournal, for example.