What is the Ideal Backup System?

We've been talking about it all week: the ideal backup system. What exactly does that mean? What makes the ideal backup system... well, ideal? There are a lot of things to consider when you're building a backup system, and the ideal backup system will cover everything on our checklist.

So where's this magical checklist? We've made one just for this series below.

[ ] Is it redundant?
[ ] Does it notify me on failure?
[ ] If it failed to take a backup, will it try again after notifying me?
[ ] Does this system verify that backups exist, and are valid?
[ ] Does the system notify me if backups don't exist or aren't valid?
[ ] Is this as secure as it can be? (eg. file encryption, secure transfers)
[ ] Can I scale the system, or will it break when I add more data?

These are all good questions to ask about your backup system, but your needs will vary, so you may need to consider additional questions or concerns for your company's solution. This is just a starting point.

Let's talk, in detail, about each point on our checklist.


Redundancy

It's critical that your backup system is redundant. If you read yesterday's edition, you'll know that one of the main reasons your current solution sucks is because it's not redundant.

For the purpose of this series, let's assume that you have a WordPress e-commerce website running on two Debian 8.0 (Linux) VPSes. Your website is highly available because you have clients all over the world who need it online to order your products.

You have every box on the high-availability list ticked, but your backups aren't the main focus of your attention. Your concern, of course, is keeping your website online.

Now, let's say you have a server in the US, and a server in the UK. In the event one server crashes, you have immediate DNS failover set up so that your website will see little to no downtime. One night, your UK server goes offline. No problem! You failover to your US server, and you start investigating the issues with your UK server.

A few minutes into the investigation, you realise that it was caused by a DoS attack. Your DNS failed over, but now your US server is being attacked. To make matters worse, one of your team members has just accidentally nuked your database. Your US system is under attack, your team member hit the kill switch on your database, and your UK system no longer has the current database. But you have backups!

Boom, you hit the switch. Except your backups are stored in the US data centre, on the same network that's under attack. Now you face significant latency transferring your backups across to your UK data centre, and your UK node may be offline for hours if your site is big enough to warrant such a transfer.

You're still being DoSed, and fires are starting left and right. Your US server becomes unreachable, but when your automatic DNS fails over to your UK server, boom. No database. All your clients' information is gone. It becomes a customer service nightmare!

The solution? You should've been replicating your backups to your UK data centre when they were taken. If they're locally available, there's no need for transfer, and you can restore from backup immediately when such an event occurs.
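To make that concrete, here's a minimal sketch of the replication step in Python, run right after each backup is written. The paths, hostname, and SSH user are placeholders, and it assumes rsync and key-based SSH are already set up between the two data centres.

    import subprocess

    # Hypothetical locations -- substitute your own.
    LOCAL_BACKUP_DIR = "/var/backups/site/"
    UK_REPLICA = "backup@uk-dc.example.com:/var/backups/site/"

    def replicate_backups():
        """Push the latest local backups to the UK data centre over SSH."""
        subprocess.run(
            [
                "rsync",
                "-az",          # archive mode, compress in transit
                "-e", "ssh",    # transfer over SSH
                LOCAL_BACKUP_DIR,
                UK_REPLICA,
            ],
            check=True,         # raise if the transfer fails, so your script can notify you
        )

    if __name__ == "__main__":
        replicate_backups()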

Redundancy isn't just important for high availability solutions, though. You could have your backups and server hosted in the same data centre, but let's say that data centre goes offline because of network issues, a power outage, or a natural disaster.

Now your data is confined to one central place, and you have no way of accessing it. Hours, days, months, years' worth of work, all inaccessible. If the server with your backups and the server with your live system were both to crash, all of that data would be gone forever.

Have you ever heard the saying, "I took a backup of your backups so you can restore your backed up backup"? You should keep redundant backups. You may think it's silly, and you may think you'll never need it. That's why it's called 'redundant.' You may never need it, but it's better to have it and not need it than to need it and not have it.


Notifications

Almost as important as redundancy, but for a whole new reason. Your system doesn't know what you want until you tell it what you want. That means it won't magically notify you when your system goes down, your backups aren't working, or a wrong command in the code caused your system to delete the very files you wanted backed up (long story).

You need notifications. Even if you're among the many of us who absolutely hate being told a million times per day that the system is running perfectly fine, that one e-mail that says otherwise will save your job, your company, and your sanity.

So, what should you want notifications for? These are some good things to start off with.

  • When a backup starts
  • When a backup completes
  • What, exactly, is backed up in each backup
  • Any errors that may have been caught by the script
  • When a backup's validity/verification fails
  • When a backup's validity/verification succeeds
  • Any errors on transfer
  • All successes on transfer
  • When backups "disappear" or aren't there at the end of the script's run

I know what you're thinking: "It worked, Dave. I don't care why, it worked. Why would I want to log that it succeeded?" It's simple. You want to know when it succeeded because when it does fail, you'll have something to compare against, and a known-good state to revert to.

Besides, it's unhealthy to focus on our failures. The important thing to remember here is triage. You can set up notification levels. For example:

Level 1: Log to file => [backupStart, backupStop, filesBackedUp, verifySuccess, transferSuccess, backupSuccess]

Level 2: Log to file AND notify e-mail => [transferFail]

Level 3: Log to file AND notify e-mail AND notify emergencyContact => [backupFail, verifyFail, backupDisappeared]

Where "emergencyContact" could be via SMS, HipChat, Slack, or whatever you fancy using. By using the defcon or triage technique, you're no longer receiving annoying e-mails about successes, but they're still logged so that you can refer to them later.


Verification

There's a reason we verify that things are actually sane when it comes to systems. The system won't verify it for you, in most cases. It's impossible to know for certain that your backup is valid without actually restoring it, but if you're using a simple backup system based on tar, you can pass the -t argument to list the archive's contents as a quick test that the archive is readable.
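As a rough sketch, that check might look like this in Python. The archive path is a placeholder, and it assumes a gzip-compressed archive; if tar can't list the archive, the backup is almost certainly corrupt and a verifyFail notification should go out.

    import subprocess

    def archive_is_readable(archive_path):
        """Return True if tar can list the archive's contents without error."""
        result = subprocess.run(
            ["tar", "-tzf", archive_path],  # -t lists contents, -z for gzip
            stdout=subprocess.DEVNULL,      # we only care about the exit code
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0

    if not archive_is_readable("/var/backups/site/latest.tar.gz"):
        print("verifyFail: archive is unreadable or corrupt")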

Assuming that checks out, you know the archive itself is intact. Now, to confirm that the contents inside the tar archive are exactly the same as the contents you're backing up, you'd have to run some checksum mojo.

Theoretically, your script could extract the tar file, run a quick find command, and compare checksums for the two resulting directories. That seems a bit convoluted, and in most cases tar can be trusted to give you back the directory you packaged up.

Alternatively, you could just use tar's built-in --diff argument (also spelled --compare). If a file has been modified between the time it was tar'd and the time the comparison is run, you should see something like Mod time differs or Size differs.

An important note about --diff and --compare: they won't pick up new directories or files added to the original directory. They only check files that are in the archive against the same files in the original directory, or in the directory passed through the -C argument.
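A minimal sketch of that comparison, again assuming a gzip-compressed archive whose members were stored relative to the base directory (the paths are placeholders): a non-zero exit status from tar means at least one file differs.

    import subprocess

    def archive_matches_disk(archive_path, base_dir="/"):
        """Compare archive members against the live filesystem with tar --compare."""
        result = subprocess.run(
            ["tar", "--compare", "-zf", archive_path, "-C", base_dir],
            capture_output=True, text=True,
        )
        if result.returncode != 0:
            # Messages such as "Mod time differs" or "Size differs" end up here.
            print(result.stdout + result.stderr)
            return False
        return True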

For more information about tar, please see the tar manpage.

The important thing to remember here is that no backup is useful until it has been verified. If you take blank backups every day but never catch it because you haven't verified them, you'll be in a pickle when it comes time to restore.


Security

We touched on the topic of security quite a bit in yesterday's edition of this series. It is imperative that you take the security of your backups just as seriously as you take the security of your live database and files.

Your clients' data is no longer in one central location. You're now leaving it in the hands of anyone who has your backups, your server hosting company, and any employees who have access to this information. That means credit card numbers, passwords, and other personal information are at the disposal of a very large group of people.

How much do you trust your hosting company? Hopefully enough to stay with them, but not enough to keep your files unencrypted, passwords unsalted/unhashed/unpeppered, and backups open to the public.

The bottom line: your hosting company isn't responsible for your security. They're responsible for the security of their systems. They may not openly tell you that, but if you're not paying somewhere in the range of hundreds of dollars per month, it's not their problem. Now, if you're paying them to make it their problem, great. Don't trust them anyway.

Story time. Your company just overhauled its systems, moved to a virtual private server, secured all of the file systems and databases, and even implemented routine security analysis. Great job!

Your backups are stored conveniently in three places: in-house and in two data centres. One of the machines your backups live on happens to accept password authentication via SSH. You don't think anything of it, so you let it be for now. You'll tell one of the security guys to fix it eventually.

A few days go by, and you're notified that the system you thought could wait has been brute-forced. Your backups have been transferred to another remote system, and then it hits you. They weren't encrypted.

Your database dump is now in the hands of the attacker. Once a database has been dumped to a file, no root password is needed to read it. Your clients' information is sitting there in a cute SQL format, ready for the attacker.

Now, had your data been properly encrypted, your SSH server refused password authentication, and your system been properly scanned for security, this wouldn't be a problem. So they have a file; good luck finding the GnuPG key to open it. You'd reset your clients' passwords (because that's the sane thing to do), notify everyone of what happened, and have a story to tell about how well encrypted your data is. Instead, all of your clients have been compromised because you never encrypted the data.
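As a rough sketch of that last point, encrypting a database dump with GnuPG before it ever leaves the server might look something like this in Python. The dump path and recipient key are placeholders, and it assumes the public key is already in the server's keyring while the private key lives somewhere far safer.

    import subprocess

    DUMP_FILE = "/var/backups/site/db-dump.sql"   # hypothetical dump path
    RECIPIENT = "backups@example.com"             # hypothetical GnuPG key ID

    def encrypt_backup(path, recipient):
        """Encrypt a backup with GnuPG so only the private key holder can read it."""
        subprocess.run(
            ["gpg", "--encrypt", "--recipient", recipient,
             "--output", path + ".gpg", path],
            check=True,
        )
        # Once the encrypted copy exists, securely remove the plaintext dump.

    encrypt_backup(DUMP_FILE, RECIPIENT)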

Security is vital in all systems. Even backups.


Scalability

Your website, software, or whatever you're hosting is going to grow. You want it to grow. That's part of owning a business, developing a website, and maintaining a service. Without growth, you're staying exactly where you are.

So your backup system needs to grow with you. If you develop a backup system that can only handle a 10GB database, then when your compressed database reaches 20GB you're going to have to figure out how to handle the change quickly, and you'll be without database backups until you fix the limitation in your code.
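One small, concrete way to build in headroom is to check, before every run, that the backup volume can actually hold what you're about to write, and raise the alarm instead of failing silently. This is only a sketch; the paths and the safety margin are placeholders.

    import os
    import shutil

    BACKUP_DIR = "/var/backups/site"   # hypothetical backup destination
    SOURCE_DIR = "/var/lib/mysql"      # hypothetical data to be backed up
    SAFETY_MARGIN = 1.5                # assume the backup may need 1.5x the raw size

    def estimated_size(path):
        """Rough total size of everything under path, in bytes."""
        total = 0
        for root, _dirs, files in os.walk(path):
            for name in files:
                total += os.path.getsize(os.path.join(root, name))
        return total

    free = shutil.disk_usage(BACKUP_DIR).free
    needed = estimated_size(SOURCE_DIR) * SAFETY_MARGIN
    if needed > free:
        print("backupFail: not enough space on the backup volume -- grow it first")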

You should always consider scalability in your development process. Your company is going to grow, so make sure your systems grow with you and can handle that growth.


Sneak Peek

You're probably dying to hear about how to actually implement this backup system. The truth is, it's going to depend entirely on your company's requirements. However, that doesn't mean that we can't send you in the right direction.

That's exactly what we plan to do tomorrow. We're halfway through this week, and I hope everyone's Wednesday has started off great. Tomorrow, we'll discuss how much this implementation costs, and what your company can do to budget for it.

Backups don't need to be expensive, but you'll find that not having them is far more expensive.

Follow our Twitter, Google Plus, or Facebook pages for notices when we release our next edition of this series tomorrow. Alternatively, you can subscribe to our RSS feed.