Personal data backup plan

10 Dec 2020

Like most of you reading this, I have important personal data scattered around on various devices (external hard drives, iOS devices, The Cloud™) and until recently, this had not bothered me too much. I take good care of my physical hardware and the cloud services I use have redundancy and durability built in. That was, until earlier this week, when I got a text message from a friend who runs his own business saying that yet another one of his server drives had failed and he needed a hand replacing it. I’m not going to go into details on why these keep failing but I will say that this is the fifth drive in three months that has experienced a failure. While I can’t really explain why, this triggered me to start thinking about my own personal data backup plans.

The data I’m talking about here is stuff like family and vacation photos, contracts and legal documents. While the last two are also physically available elsewhere, I didn’t have any redundancies in place for my photos. I got to thinking about this and after a short while realised I needed to do something as I would be gutted if I was to lose those since there is no guarantee I would get them back if the devices were to fail.

So down the rabbit hole I went. As is pretty standard practice for important data, I chose to implement a 3-2-1 backup strategy. From the linked article, the strategy is summarised as:

Three total copies of your data, two of which are local but on different mediums (read: devices), and at least one copy off-site.

For me, I decided I would opt for:

  • a local Network Attached Storage (NAS) device
  • a cloud provider
  • a physical offsite replication

This is in addition to the data on the initial devices themselves.
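To make the rule concrete, here is a minimal Python sketch that checks a list of copies against the 3-2-1 definition quoted above. The `Copy` class, the `satisfies_3_2_1` function and the medium labels are all hypothetical names made up for illustration, and treating the friend's NAS as the physical offsite copy is just one way of filling in that slot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str      # human-readable label for the copy
    medium: str    # kind of device or service holding it (labels are illustrative)
    offsite: bool  # True if the copy lives away from home


def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """Check: at least 3 copies, local copies on at least 2 different
    mediums, and at least 1 copy off-site."""
    local_media = {c.medium for c in copies if not c.offsite}
    has_offsite = any(c.offsite for c in copies)
    return len(copies) >= 3 and len(local_media) >= 2 and has_offsite


# Hypothetical plan mirroring the list above, plus the original devices.
plan = [
    Copy("original devices", "device", offsite=False),
    Copy("home NAS", "nas", offsite=False),
    Copy("cloud provider", "cloud", offsite=True),
    Copy("physical offsite replica", "nas", offsite=True),
]

print(satisfies_3_2_1(plan))
```

Dropping the two offsite entries from `plan` makes the check fail, which is roughly the position I was in with my photos before starting.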

Network Attached Storage (NAS)

A NAS can be simplified to a collection of hard drives that are configured to either split or mirror data between them but present it in a unified fashion through a file system GUI. The intention is that you build the hard drive configuration in whatever way suits your data and data redundancy needs.

Despite working in tech, I’m not a big “home tech” sort of bloke. This picture sums up my position pretty nicely.

As this would be the first NAS I’ve owned from new, I decided to scope out the market for my budget and needs and ended up settling on the Synology DS420+. There are plenty of great NAS brands out there; however, Synology provided some features that I couldn’t find in others and has a good reputation to back it all up, so it was a win-win for me. To work out how many drives I needed, I first had to work out what configuration I would be storing the data in. After checking out the Synology RAID calculator (awesome resource even if you don’t use a Synology device!), I landed on RAID 6, which would meet my durability and availability goals with enough redundancy built in that up to two drives could fail and I would still be able to rebuild the array without needing data recovery.

For the drive sizes, I collected all the important data I intended on storing in there and multiplied it by 5-6 to factor in growth over the next 5 or so years and then purchased drives that made up that space. It’s important to decide on your RAID configuration before doing this as that determines how much usable space you will have available to you. If you decide to take this route yourself, be sure to get NAS-specific drives as they are optimised for that particular workload and you’ll get better performance and longevity out of them.
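The sizing arithmetic can be sketched in a few lines of Python. This is only a rough sketch: the function names, the 1 TB starting figure, the four-drive count and the list of drive sizes are all assumptions for illustration, not my actual numbers. The one fixed rule is that RAID 6 gives up two drives' worth of space to parity, and the result ignores file system overhead, so treat it as an upper bound.

```python
def raid6_usable_tb(drive_count: int, drive_size_tb: float) -> float:
    """Approximate usable space of a RAID 6 array of equal-sized drives.

    RAID 6 spends two drives' worth of capacity on parity.
    """
    if drive_count < 4:
        raise ValueError("RAID 6 needs at least four drives")
    return (drive_count - 2) * drive_size_tb


def required_capacity_tb(current_data_tb: float, growth_factor: float) -> float:
    """Current data multiplied by a growth factor (5-6 in the text above)."""
    return current_data_tb * growth_factor


def smallest_drive_size_tb(required_tb, drive_count, available_sizes_tb):
    """Smallest drive size that gives enough usable RAID 6 space, or None."""
    for size in sorted(available_sizes_tb):
        if raid6_usable_tb(drive_count, size) >= required_tb:
            return size
    return None


# Hypothetical inputs: 1 TB of data today, a growth factor of 6, four drives.
needed = required_capacity_tb(current_data_tb=1.0, growth_factor=6)
choice = smallest_drive_size_tb(needed, drive_count=4, available_sizes_tb=[2, 4, 6, 8])
print(f"need {needed} TB usable, so buy 4 x {choice} TB drives")
```

With these made-up inputs, four 2 TB drives would only give 4 TB usable, which is why the RAID level has to be settled before the drives are ordered.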

Once the NAS and drives arrived, I configured it and started copying in the data, feeling instantly better that I had at least another copy.

Cloud provider

While I don’t have a “favourite” cloud provider (read: all of them have pros and cons with great uses for each), I tend to sway towards Amazon Web Services (AWS) as I got on board when Google Cloud Platform (GCP) didn’t have a huge range of features and I was already using AWS pretty heavily in my day-to-day work. GCP also doesn’t have anything that would cause me to up and jump ship at the moment. However, I didn’t choose either of these options for my cloud backup. Why? The balance of cost and data retrievability. In both AWS and GCP, you can store data cheaply and with excellent redundancy out of the box; however, they charge a fee for retrieving “cold storage” items, which can quickly add up, and retrieval has a slow turnaround time.

Looking around the web, I stumbled on Backblaze. They have a few products but the one I ended up using was B2 Cloud Storage as they were the same price as the big cloud providers but didn’t charge any extra for quick recovery. This was a win in my books as it also allows me to validate the backup is working at any point in time without a delay.
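One way to do that validation is to pull a folder back down and compare checksums against the originals. The Python sketch below assumes the restored copy has already been downloaded to a local directory by whatever tool you prefer; `sha256_of` and `find_mismatches` are hypothetical helper names, and nothing here talks to Backblaze itself.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large photos don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(original_dir: Path, restored_dir: Path) -> list[str]:
    """Return relative paths that are missing from, or differ in, the restore."""
    mismatches = []
    for src in sorted(original_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(original_dir)
        restored = restored_dir / rel
        if not restored.is_file() or sha256_of(src) != sha256_of(restored):
            mismatches.append(rel.as_posix())
    return mismatches
```

An empty list from `find_mismatches` means every file on the NAS side has an identical twin in the restored copy.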

With the Backblaze account created, I hooked up my NAS to automatically sync to Backblaze using Synology Cloud Sync. This is an automated process that watches certain directories for changes and then syncs them accordingly based on rules you create. I specifically made it a one-way sync so that even if I accidentally delete something on my NAS, the deletion won’t be replicated to the Backblaze account.
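The behaviour I was after can be illustrated with a small Python sketch working on two local directories. To be clear, this is not how Cloud Sync is implemented; `one_way_sync` is a hypothetical function that only shows the idea: new and changed files flow from source to destination, and nothing is ever deleted on the destination side.

```python
import filecmp
import shutil
from pathlib import Path


def one_way_sync(source: Path, destination: Path) -> list[str]:
    """Copy new or changed files from source to destination.

    Files that exist only in the destination are left alone, so a deletion
    in the source is never propagated.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = destination / rel
        # Skip files whose contents are already identical on both sides.
        if dst.is_file() and filecmp.cmp(src, dst, shallow=False):
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        copied.append(rel.as_posix())
    return copied
```

Running it twice in a row copies nothing the second time, and removing a file from the source directory leaves the destination copy untouched.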

During my evaluation phase, I also found that Backblaze offers to ship you the data on a USB device if you ever need the data back and it’s too large or slow to download. The great thing is, providing you return the device within 30 days, they refund the cost!

Physical offsite replication

This portion of the plan was something I ended up changing later in the process. Initially I was just going to buy another drive and have it constantly sync; however, a good friend of mine ended up also purchasing a Synology NAS around the same time, which opened up the possible routes I had here. We ended up configuring Hyper Backup, which allows syncing of one NAS to a multitude of other services (including another NAS). This meant I’d have another physical copy of the data within a short driving distance and it only cost me another drive. A bonus here is that Hyper Backup allows encryption of the synced folder, which means that while he physically has access to the data, he (or anyone else for that matter) is unable to read the data I’m backing up there.

Other bits

Since I spent the time and money on a NAS, I wanted to take full advantage of the capability. In addition to storing my important data, I also set up:

  • Backups for all the laptops and iOS devices in the house. Every day at 1am, all of our devices sync to the NAS, making a restore reasonably quick when one is needed. It also means we don’t have to think about doing backups as they are automatically handled on a schedule.
  • I’ve added a list of “known backup devices” which is a collection of other storage mediums we have which when plugged into the NAS, automatically sync. Similar to the laptops and iOS devices, this means that backups automatically happen out of band without someone needing to explicitly do it.

In an upcoming article, I’ll go over the improvements to my home lab redundancy which the NAS helped shape and improve since it was a key piece of the infrastructure.