Today, we'll cover why most backup solutions suck, and why yours is most likely in that growing list of suckage.
When I set up a backup system for one of my clients, or even myself, I ask the following questions (usually to the wall, I look insane when I'm in the office):
- Is it redundant?
  - If it breaks, will my client still have backups? Will the client be notified?
- What kind of verification is there that these backups really exist?
  - Is it going to silently fail because the permissions are off, the backups aren't really saved, or the archived backup is corrupted?
- Is it secure?
  - Security is a big deal, even when your system backs up to an in-house infrastructure. How secure is this puppy?
These are all very important questions, and unfortunately a majority of the backup solutions we have seen don't address them. Let's dive in!
Backups from Your Hosting Provider
For the sake of this series, we will leave out dedicated infrastructure: some of the diagrams would be misleading for it, and most dedicated server providers and data centres don't offer a built-in backup solution.
Most web or server hosting providers offer some sort of backup solution. Relying on it as your sole backup solution introduces a few problems. Let's touch on each question we asked earlier.
Is it redundant?
Truthfully? Probably not. Most hosting providers give you a backup solution for a few bucks per month, depending on the size of your website or server. For example, DigitalOcean's backup system costs only 20% of your Droplet's price. At that price you're getting a great deal to store your data, but you're not paying for redundancy, security, or verification.
Your data is most likely stored in the same data centre. Why? Because it's faster and more cost effective for providers to transfer data within the internal (or "private") network.
In most cases this means that if the data centre has a network outage, your backups are affected along with your system. So let's say the data centre does have an outage. You wouldn't be able to restore that data to another data centre, because there is no cross-DC redundancy.
Figure 1.1 demonstrates how most virtual server providers and shared web hosting providers back up their clients' systems.
Figure 1.1 - How Most Providers' Backup Systems Work
The API endpoint is usually called, regardless of whether you're using the provider's API directly or not. Most websites that have APIs also use their own API internally, so when you click the "backup" button on your provider's page, you're really launching an API request (we'll call it do.backupServer()).
The host server, in most cases, receives both of the requests. The reason? Your system is on the host, and it's way easier and faster for a provider to send one archived file across the internal network than it is to send multiple uncompressed files.
The host server will do both receiving and sending of the archive, but the decompression most likely happens on the host server. If this is a snapshot, the snapshot may even be stored on the host server. That brings redundancy completely out of the equation. What if the host goes down, and along with it your snapshot? Now you can't restore from backup, because the storage server is out of the picture.
So, what if this system breaks? Will you be notified? In some cases, yes. However, sometimes the validity of a backup cannot be determined before it is sent away to the storage server. Let's say your daily, weekly, or monthly backup is successfully stored, but it is missing content, the snapshot is corrupt, or it has a broken file system when you try to restore it. If the provider's backup system doesn't check the validity of the backup, you're out of luck when you lose all your data in that unexpected crisis we spoke about in the first edition of this series.
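If you can get a copy of the backup into your own hands, one way to catch an archive that was "stored successfully" but can't be restored is to read the whole thing back. Here is a minimal sketch in Python, assuming the backup is a tar archive (optionally compressed); the post doesn't say what format any given provider uses, and `archive_is_readable` is a hypothetical helper name.

```python
import tarfile
import zlib


def archive_is_readable(path):
    """Return True if every regular file in the tar archive can be read in full."""
    try:
        with tarfile.open(path, "r:*") as archive:
            for member in archive:
                if member.isfile():
                    extracted = archive.extractfile(member)
                    # Read in chunks so the whole member is actually decompressed.
                    while extracted.read(1024 * 1024):
                        pass
        return True
    except (tarfile.TarError, OSError, EOFError, zlib.error):
        # Truncated, corrupt, or missing archives all end up here.
        return False
```

This only proves the archive can be read end to end. It says nothing about whether the right files are inside it.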
What kind of verification is there that these backups exist?
The fact is, you have no idea. Most providers will provide a canned response when you ask this question. Sure, they don't want to expose their trade secrets, but this is your content, right? You should have the right to know where your backups are going, and exactly how they're being handled.
There may be verification, there may not be. This will vary from one hosting provider to the next, but in most cases there are hundreds, if not thousands, of clients using this same server. Imagine you're in a college classroom with one professor and fifty students. You will likely receive some attention. Now if you're in the same classroom with three hundred students, you will likely receive very little attention. The same is true when it comes to system maintenance.
There are thousands of people using the same system. There's no way that the entire team at your hosting provider is focusing on each individual person. The fact is, they address issues as they arise. When you tell them about the issue, or when you report your backup is corrupted, you will receive attention. By then, it's too late.
The solution? You need a better backup solution.
How secure is this puppy?
Ah, the real kicker. Security. Your database is likely filled with people's (hopefully hashed, salted, and peppered) passwords, credit card numbers, and other personal data. You wouldn't want anyone getting that data, so you've done everything you can to secure your system.
What about the system your files are living on? How secure is this provider's backup system? Again, you have no idea. They could be encrypting your data, sending it over TLS-encrypted connections, storing it on an encrypted file system, and doing everything they can to make sure that your data isn't compromised.
Or, maybe they're not. Maybe they're sending plain tar files of your file system over unencrypted connections. You'd never know...
Ah, so you're one step ahead. You have an in-house solution. That's great for the following reasons:
- You know where your data is going
- You know how it's getting there
- You know if it's encrypted or not
Chances are, though, your in-house solution still sucks. Why? Because it's in-house.
You've built what you believe is the best backup solution for your company. Perhaps it looks similar to Figure 1.1, or perhaps you have a more complicated structure. Let's test your backup solution for our questions, though. As I don't know your specific backup solution, it's likely that these won't fit everyone's backup solution. You could have the best backup solution, but let's give it a go anyways.
Is it redundant?
Probably not. Let's say your storage server goes down. Do you have that storage server replicated to another server? Even if you do, it's probably not replicating to another network, is it?
The thing is, replicating file systems, snapshots, and other large backups to another location via a public network is deemed "insecure" and it takes forever to send those files via the public network. That's why most system engineers refuse to do it. But, let's say your internal network goes down, and your file system becomes corrupt. It's unlikely that both will inconveniently happen at the same time, but if they did, you'd be pooched until you got the internal network back up.
Now, if you had another storage server on a different network, you'd have no problem transferring that backup from Network B to Network A, and starting up your systems again after restoring.
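A minimal sketch of that idea in Python: copy the archive to more than one storage location and confirm each copy with a checksum. The destination directories here are stand-ins for storage servers on Network A and Network B (for example, mounted remote shares), which is an assumption made for illustration, and `replicate` is a hypothetical helper name.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(archive, destinations):
    """Copy archive into each destination directory and return the verified copies."""
    expected = sha256_of(archive)
    copies = []
    for directory in destinations:
        target = Path(directory) / Path(archive).name
        shutil.copy2(archive, target)
        # Fail loudly if a copy does not match the source.
        if sha256_of(target) != expected:
            raise RuntimeError(f"copy at {target} does not match the source")
        copies.append(target)
    return copies
```

Something like `replicate("site.tar.gz", ["/mnt/network-a", "/mnt/network-b"])` would then leave you with two checked copies, where both mount points are made-up paths.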
Security-wise, you can encrypt files. GnuPG is a great thing, and if you're encrypting files with GnuPG before they leave your network (with a recipient's public key, or even symmetric encryption with a passphrase), you're not going to have to worry about transferring those files across the public network anymore.
Speed-wise, it's going to take a while, and it'll probably use up data transfer. But if you're using an in-house system, data transfer is unlikely to be a problem, and if your system lives in a data centre, the link will be fast enough and your transfer allowance will likely cover it. Trust me, server providers give you an amazingly high amount of transfer.
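As a rough sketch of encrypting an archive before it crosses the public network, the Python below shells out to the `gpg` command-line tool using public-key encryption. It assumes `gpg` is installed and that the recipient's public key is already imported and trusted. The function names and the `backup@example.com` recipient are placeholders, not anything from a real setup.

```python
import subprocess


def build_encrypt_command(archive, recipient):
    """Build the gpg argument list that encrypts archive for recipient."""
    return [
        "gpg",
        "--batch",                      # never prompt; suitable for scheduled jobs
        "--yes",                        # overwrite an existing output file
        "--output", f"{archive}.gpg",   # encrypted file written next to the original
        "--recipient", recipient,       # public key to encrypt to
        "--encrypt",
        archive,
    ]


def encrypt_backup(archive, recipient):
    """Run gpg and raise CalledProcessError if encryption fails."""
    subprocess.run(build_encrypt_command(archive, recipient), check=True)
    return f"{archive}.gpg"
```

Calling `encrypt_backup("site.tar.gz", "backup@example.com")` would produce `site.tar.gz.gpg`, and because of `check=True` a failed encryption raises instead of passing quietly.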
What kind of verification is there that these backups really exist?
Hopefully you've planned for this when you were building your backup infrastructure. Does your system verify the validity of these backups? How?
Are you using a checksum system? Are you 200% sure that your backup isn't missing that txt file from three weeks ago that you're going to desperately need someday?
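One simple way to answer that second question is to compare the archive's contents against a list of files you expect to be in it. A minimal sketch in Python, assuming the backup is a tar archive and that you keep such a list somewhere; `missing_from_archive` is a hypothetical helper name.

```python
import tarfile


def missing_from_archive(archive_path, expected_paths):
    """Return the expected paths that are not present in the tar archive."""
    with tarfile.open(archive_path, "r:*") as archive:
        present = set(archive.getnames())
    return sorted(set(expected_paths) - present)
```

An empty list means every expected file made it into the archive. Anything else is the list of files you would have been missing on restore day.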
Verifying the status of a backup, incremental or full, is fairly difficult and we'll touch base on that in a later edition. But let's say you've got that down, how does your backup system handle a corrupt backup?
If your verification system fails to validate the backup, does it take a new backup? Does it notify someone? Or does it just silently fail?
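Here is one possible shape for the "take a new backup and tell someone" behaviour, sketched in Python. The three callables (`take_backup`, `verify`, `notify`) are hypothetical hooks you would supply yourself. Nothing here is tied to a particular backup tool.

```python
def backup_with_verification(take_backup, verify, notify, max_attempts=3):
    """Take a backup, retry when verification fails, and never fail silently."""
    for attempt in range(1, max_attempts + 1):
        archive = take_backup()
        if verify(archive):
            return archive
        # Every failed verification is reported, not just the last one.
        notify(f"backup attempt {attempt} of {max_attempts} failed verification: {archive}")
    raise RuntimeError(f"no verifiable backup after {max_attempts} attempts")
```

The point of the design is that every path out of the function is loud: it either returns a verified archive or it has notified someone and raised.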
Is it secure?
This is one of the variables that you have full control over. Your backup system is as secure as you make it, unlike with the hosting provider's backup system.
Do you use file system or snapshot encryption to make sure that your backups are as secure as your running system? If you're just obscuring backup files, or making it look secure, you could be falling into a pit of misery.
If your backup isn't as secure as your main system, your main system's security is moot. If your backups offer an attack vector, an attacker who gets their hands on them can compromise your whole system.
Have you found an issue with your existing backup system? If not, leave a comment in the comments section below and let us know what you think you're doing right. Or perhaps what you think you may be doing wrong.
We're going to provide more information about the solution, the bigger picture, in tomorrow's edition. We're going to explain what we believe to be the "ideal" solution, and throughout the week we will provide more information on how to implement the ideal solution, and how you can do so at a cost that's effective and suitable for your company or website.
The bottom line here is that your current backup solution sucks, but we'll teach you how to fix that.