IT 101: RTO, RPO and the Economics of Beer

It’s no secret that backup and recovery are critically important tasks in the broader IT portfolio. Sometimes, though, there is a lack of deep understanding of the critical metrics that drive backup and recovery. This article explains two of those metrics, Recovery Time Objective (RTO) and Recovery Point Objective (RPO), using the economics of beer.

The Scenario

Consider, if you will, all of the work that is performed throughout your company. People process sales orders all day long. People create new documents. In short, there is data constantly being added to your company’s systems throughout the work day and often beyond. In the world of traditional IT, after the end of the workday – generally in the wee hours of the morning – all of this work was then backed up in order to protect it from hardware failure, natural disasters, or even just human error.

Recovery Point Objective

Now, think of all of that data that changes throughout the day as beer. Let’s pretend that, every day, your organization creates 240 ounces (about 7 liters, for our non-backward, non-US friends) of “beer data” that needs to be protected. During the backup process, you essentially pour this beer into a long-term storage repository in order to protect it against failure. Immediately after a beer backup, the pitcher is empty, and it slowly fills up throughout the day as data is changed and created by users in the organization. By the end of the day, right before the backup begins, the 240-ounce pitcher is full.

Figure 1: By the end of the day, the beer data pitcher is full

So, what happens if your data systems suffer a catastrophic failure just before the backup begins? You lose all 240 ounces of that beer. In short, it all gets poured down the drain. The same is true for your data. If you protect your data using just nightly backups and there is a failure right before the backup window, you will lose 24 hours’ worth of data changes.

If you’re using this kind of backup system – where you back up once a night – you’re implicitly adhering to a 24-hour Recovery Point Objective. You’re basically saying that losing up to 24 hours’ worth of data – or 240 ounces of beer – is acceptable to the business. RPO is the metric that defines how much data, measured as a window of time, your organization is willing to lose in the event of a failure. Goodbye to 24 hours of beer!

Improved RPOs Are Needed

Obviously, few organizations want to risk an entire day’s worth of data! The cost of that kind of loss is incredible. As such, it’s necessary to find ways to reduce this data loss window. In an ideal world, organizations could achieve “zero RPO” where there is no data loss. However, the closer an organization tries to get to that zero RPO goal, the more expensive and potentially complex the solution becomes.

This fact leads organizations to start making decisions about the cost/benefit of a better RPO. One common solution is to move away from nightly backups to smaller backups that take place throughout the day. This kind of capability is sometimes referred to as Continuous Data Protection (strictly speaking, backups taken at fixed intervals are “near-continuous,” since true continuous protection captures every change as it happens), and the process involves backing up any data that has changed since the last interval.

Here’s what this might look like. Instead of performing a backup every single night, let’s say that the sample organization decides that it wants an RPO of 1 hour. With 240 ounces spread over a 24-hour day, this would mean that only 10 ounces of beer – or one hour’s worth of data – would be lost if there was a failure of some kind. The diagram below shows that even a failure that takes place in the middle of the night would result in losing at most 10 ounces of beer rather than the whole 240 ounces that would be lost with a 24-hour RPO.

Figure 2: Smaller RPOs result in smaller amounts of lost data
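
If you want to play with the numbers, the arithmetic above can be sketched in a few lines of Python. This is only an illustration: the 240 ounces per day comes from the example, while the function name `worst_case_loss_oz` and the assumption that beer data accumulates at a steady rate all day are simplifications made for the sketch.

```python
# Illustrative sketch: 240 oz/day is the article's figure; the steady
# fill rate and the function name are assumptions, not a real tool.
OUNCES_PER_DAY = 240
HOURS_PER_DAY = 24


def worst_case_loss_oz(backup_interval_hours: float) -> float:
    """Beer (data) lost if a failure hits just before the next backup."""
    rate_oz_per_hour = OUNCES_PER_DAY / HOURS_PER_DAY  # 10 oz per hour
    return rate_oz_per_hour * backup_interval_hours


for interval in (24, 1, 0.5):
    loss = worst_case_loss_oz(interval)
    print(f"Backup every {interval} h: lose up to {loss:g} oz")
```

Real workloads rarely change at a constant rate, so treat the result as a rough upper bound for a typical day rather than a guarantee.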

And now, time for a story. In a previous life, I was the CIO for a small college, and we had a 30-minute RPO for the database system that supported the college. One day, at three minutes past two in the afternoon, something happened to the database that corrupted it, rendering it unusable. We were able to restore the backup from 2:00 PM, meaning that we lost only 3 minutes’ worth of work for the college. At worst, we would have lost around 30 minutes’ worth of work had the failure occurred at 2:29 PM. That’s the power of a shorter RPO.
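
The same reasoning can be written down as a small Python sketch. It assumes backups run on a fixed schedule aligned to midnight and complete instantly, which real backups do not; the date used and the function name `last_restore_point` are made up for illustration.

```python
from datetime import datetime, timedelta


def last_restore_point(failure: datetime, interval: timedelta) -> datetime:
    """Most recent scheduled backup at or before the failure.

    Assumes backups are aligned to midnight and finish instantly.
    """
    midnight = failure.replace(hour=0, minute=0, second=0, microsecond=0)
    completed_intervals = (failure - midnight) // interval  # whole intervals elapsed
    return midnight + completed_intervals * interval


interval = timedelta(minutes=30)  # the 30-minute RPO from the story
# Hypothetical date; only the times of day come from the story.
for failure in (datetime(2024, 1, 15, 14, 3), datetime(2024, 1, 15, 14, 29)):
    restore = last_restore_point(failure, interval)
    lost = failure - restore
    print(f"Failure at {failure:%H:%M}: restore from {restore:%H:%M}, lost {lost}")
```

A failure at 14:03 maps back to the 14:00 backup, and a failure at 14:29 still maps back to 14:00, which is why the worst case approaches the full 30 minutes.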

Recovery Time Objective

RPO is critically important, as it defines just how much beer you’re willing to pour down the drain. Once you’ve poured that beer out, though – that is, once you’ve suffered a situation that results in data loss – the critical metric shifts. Now, you’re more interested in how long it takes you to refill the beer mug from your long-term beer archive. In technical terms, you need to determine how long the organization is willing to be without data while you work to recover it from backup systems. This metric is often used to support such statements as “For every minute we’re down, the company loses $X.”

The Recovery Time Objective (RTO) is the formal name for this metric, and it is one that companies will go to great lengths to minimize. As is the case with RPO, the closer to zero you try to push RTO – that is, the less time you’re willing to be down – the more it costs to support. To achieve very low RTO values, companies will often implement multi-pronged solutions, such as disaster recovery sites, fault-tolerant virtual machines, clustered systems, and more.

In terms of simple backup and recovery, RTO sets the target for how long it should take you to recover, from the backup repository, data that might have been lost due to a system failure. With old tape-based systems, recovery could be very, very slow, much like trying to fill our 240-ounce beer pitcher from a tap with a needle-thin spout. Disk-based backup and recovery systems can read data much more quickly. In this scenario, we might be filling that 240-ouncer from a regular beer tap spout, so the pitcher fills up a whole lot faster.
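
To make the spout comparison concrete, here is a rough Python sketch of checking a refill time against an RTO. Only the 240-ounce pitcher comes from the example; the pour rates, the 30-minute RTO, and the function names are hypothetical values picked purely for illustration, not measurements of any real tape or disk system.

```python
PITCHER_OZ = 240  # from the article's example


def refill_minutes(pour_rate_oz_per_min: float) -> float:
    """Time to refill the whole pitcher at a constant pour rate."""
    return PITCHER_OZ / pour_rate_oz_per_min


def meets_rto(pour_rate_oz_per_min: float, rto_minutes: float) -> bool:
    """True if the pitcher is full again within the RTO."""
    return refill_minutes(pour_rate_oz_per_min) <= rto_minutes


RTO_MINUTES = 30  # hypothetical business target
# Hypothetical pour rates standing in for tape and disk restore speeds.
spouts = {
    "needle-thin spout (tape)": 1.0,
    "regular tap (disk)": 60.0,
}

for name, rate in spouts.items():
    verdict = "meets" if meets_rto(rate, RTO_MINUTES) else "misses"
    print(f"{name}: {refill_minutes(rate):g} min, {verdict} the {RTO_MINUTES}-minute RTO")
```

In a real recovery, the total time also includes detecting the failure, locating the right backup, and verifying the restored data, so the raw transfer time is only part of the picture.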

From a business perspective, defining RTO is all about making a clear business decision for how long you’re willing to wait for the pitcher to get refilled.

Summary

RPO and RTO are critically important data protection metrics. In the right frame of mind, it’s easy to see how beer relates to RTO and RPO and just how important it is to make sure that your beer containers remain full and well-protected.
