Three-letter acronyms usually oversimplify complex and nuanced systems or ideas, and RTO and RPO are no different. Here’s what they mean and why they’re so important to your business.
RPO & RTO – What are they?
RPO and RTO encompass two similar ideas: ensuring business continuity and recovery of service from system outages. While the names and descriptions are quite simple, businesses invest a great deal of time and energy into achieving their desired recovery time targets.
Recovery Point Objective (RPO) is the point in time before the outage that a company needs to be able to recover to. We determine RPO by looking at the time between data backups and, more specifically, how much data would be lost between those backups.
The direct translation: How much data can you afford to lose due to an outage?
Recovery Time Objective (RTO) is the amount of time a business can accept a system being offline or unavailable. Specifically, it is how long the system may be inaccessible, measured from the initial outage until full restoration of service.
The direct translation: How long can you afford for system XYZ to be offline?
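The two definitions can be illustrated with a minimal sketch that compares a single outage against both objectives. The function name `assess_outage`, its parameters, and the timestamps are hypothetical and chosen only for illustration. The sketch also assumes that everything written since the last backup is lost, which will not hold for every system.

```python
from datetime import datetime, timedelta


def assess_outage(last_backup, outage_start, service_restored, rpo, rto):
    """Compare one outage against RPO and RTO targets.

    Assumes all data written after the last backup is lost (worst case).
    """
    data_loss_window = outage_start - last_backup  # what the RPO limits
    downtime = service_restored - outage_start     # what the RTO limits
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= rpo,
        "rto_met": downtime <= rto,
    }


# Illustrative timestamps only, not taken from a real incident.
example = assess_outage(
    last_backup=datetime(2024, 1, 1, 9, 50),
    outage_start=datetime(2024, 1, 1, 10, 0),
    service_restored=datetime(2024, 1, 1, 10, 25),
    rpo=timedelta(minutes=5),
    rto=timedelta(minutes=30),
)

if __name__ == "__main__":
    print(example)
```

In this made-up case the service is back within the 30-minute RTO, but ten minutes of data sit between the last backup and the outage, so the 5-minute RPO is missed.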
When we start a conversation about RPO/RTO in the context of a database, clients usually reply, “We cannot afford any outage/downtime and we cannot afford any data loss!”
Aligning Recovery Goals with Reality
“No downtime and no data lost” is an ideal scenario and something that any company would love to strive towards, but a key word in the question and answer is “afford”.
The holy grail for system availability is “five nines”, or 99.999% uptime. That translates to less than five and a half minutes of unplanned downtime per year. This sounds like a fantastic goal to work towards. However, striving for this type of uptime is incredibly expensive and, in most cases, the cost of building and operating such systems far outweighs the business benefit.
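The arithmetic behind the "nines" is easy to check. The following is a small sketch, assuming a 365-day year and treating all downtime as counting against the budget. The function name `allowed_downtime_minutes` is made up for this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a 365-day year (525,600 minutes)


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an uptime target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    unavailable_fraction = (100 - availability_percent) / 100
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows about {budget:.2f} minutes of downtime per year")
```

For 99.999% this works out to roughly 5.26 minutes per year, while "three nines" (99.9%) already allows almost nine hours. Each additional nine cuts the budget by a factor of ten, which is one way to see why the cost rises so steeply.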
The associated cost of high uptime systems rarely makes sense unless your business is part of critical national infrastructure (e.g. telephone networks) or has incredibly high revenue per minute (e.g. Amazon). Service recovery is also generally not down to any single system (or it should not be if you have high uptime requirements) and single-point-of-failure architectures are unacceptable in such cases.
To avoid these weaknesses, separate portions of an entire system into self-contained “silos”. These silos increase the overall system resilience against outages. One of the oldest ideas in this area of IT is using RAID storage systems. In a mirrored configuration (RAID 1), this means duplicating disk storage so that a single disk failure does not destroy the data stored on it.
Although “five nines” uptime is not necessary for many businesses, there are many more businesses that can accept a short service outage but cannot accept any data loss. There are still more businesses that can accept both a service outage and a degree of data loss. This is the central topic we discuss with companies when looking at designing highly available systems.
To recommend the right strategy for the data platform environment, we need to understand which system needs what data availability. It’s not uncommon to find a mismatch between technical implementation and a business’s needs. This can happen when there is a misunderstanding between the business and IT or because business strategy has changed over an extended period. A correct and full understanding of what a business needs from a given system can save huge amounts of time and money, both during implementation and in the ongoing operating costs of the system.
No RPO/RTO Goal is Equal
There are no right or wrong answers to what RPO & RTO should be. It all depends on your business needs, and those needs can (and do) change over time. Consequently, you must review these goals periodically to ensure they are still aligned with your business needs.
We have a fantastic example of how a huge portion of businesses have recently had to change these goals. The RPO/RTO of remote working infrastructure has drastically changed in reaction to the wide-reaching restrictions of the COVID-19 outbreak.
Many companies had rudimentary VPN/teleworking infrastructure already available but had to rapidly review and update it to enable their workforce to work remotely. The RPO/RTO goals on these systems will have a drastically different “value” to the business than they did 6-9 months prior.
Aligning Tech to Business Needs
Only start the search for the technical solutions after you document business needs. It is of utmost importance that you follow the “business, then technology” investigation order.
We have seen many failed projects that started the other way around. Someone thought, “We want to implement ‘cool tech ABC’, what project can we think of to use it on?” Unless your business is inventing “cool tech ABC”, this approach usually results in failure.
The following statement allows you to focus the search for a technical solution:
“Our call-center (available from 06:00 to 22:00, Monday to Saturday) needs an MS SQL database backend for the new CRM/ERP application. The system has 50 end users and 5 million customers, placing 2,500 orders per day. RTO for business hours is 30 minutes, RPO is 5 minutes.”
The statement isn’t perfect, nor does it need to be. But it does provide us with a prerequisite (MS SQL Server) and allows the IT team to investigate technical options to fulfill the business requirements.
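One way to keep such a statement usable during the technical investigation is to write it down in a structured form and screen candidate designs against it. The sketch below is only an illustration. The class names, the two candidate options, and their recovery figures are invented for this example and do not describe any real product or measured result.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass(frozen=True)
class RecoveryRequirement:
    system: str
    opens: time
    closes: time
    business_days: frozenset  # Monday=0 ... Sunday=6
    rto: timedelta
    rpo: timedelta

    def in_business_hours(self, moment: datetime) -> bool:
        """True if the business-hours RTO applies at this moment."""
        return (moment.weekday() in self.business_days
                and self.opens <= moment.time() < self.closes)


@dataclass(frozen=True)
class CandidateDesign:
    name: str
    worst_case_data_loss: timedelta
    expected_recovery_time: timedelta


def meets(requirement: RecoveryRequirement, design: CandidateDesign) -> bool:
    """A design qualifies only if it satisfies both RPO and RTO."""
    return (design.worst_case_data_loss <= requirement.rpo
            and design.expected_recovery_time <= requirement.rto)


# Values taken from the call-center statement above.
CALL_CENTER = RecoveryRequirement(
    system="CRM/ERP database",
    opens=time(6, 0),
    closes=time(22, 0),
    business_days=frozenset(range(0, 6)),  # Monday to Saturday
    rto=timedelta(minutes=30),
    rpo=timedelta(minutes=5),
)

# Hypothetical candidates with made-up figures, for illustration only.
OPTION_A = CandidateDesign("Option A", timedelta(hours=24), timedelta(hours=2))
OPTION_B = CandidateDesign("Option B", timedelta(minutes=5), timedelta(minutes=15))

if __name__ == "__main__":
    for option in (OPTION_A, OPTION_B):
        print(option.name, "meets requirement:", meets(CALL_CENTER, option))
```

Recording the business hours alongside the objectives matters here because the stated RTO of 30 minutes applies to business hours only. An outage on a Sunday may well be judged by a different, more relaxed target.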
Need help identifying your business RPO/RTO goals and aligning them with technical solutions? Get in touch with us for a call.