Guide to High Availability Hosting

How many 9s do you need?

What is High Availability?

High availability describes the procedures, infrastructure, and system design used to ensure a specified level of accessibility to your server. Accessibility requires power, network connectivity, and a functional server. If any one of these requirements is compromised, the server is said to be unavailable. The level of availability is most often specified in a Service Level Agreement (SLA). Usually a set amount of credits is issued if the provider fails to meet the agreement. The amount of credits and the level of availability vary from provider to provider. Availability is typically expressed as the percentage of time that the service is accessible.

The table below shows the downtime allowed at typical availability percentages.

Availability %	Downtime per year	Downtime per month*	Downtime per week
98%	7.30 days	14.4 hours	3.36 hours
99%	3.65 days	7.20 hours	1.68 hours
99.5%	1.83 days	3.60 hours	50.4 minutes
99.8%	17.5 hours	86.4 minutes	20.2 minutes
99.9% (“three nines”)	8.76 hours	43.2 minutes	10.1 minutes
99.99% (“four nines”)	52.6 minutes	4.32 minutes	1.01 minutes
99.999% (“five nines”)	5.26 minutes	25.9 seconds	6.05 seconds
99.9999% (“six nines”)	31.5 seconds	2.59 seconds	0.605 seconds

* Month calculation is based on a standard 30 day calendar month.

As you can see, the difference between 99.9% (three nines) and 99.99% (four nines) is significant. Most businesses can live with the chance of 1 minute of downtime per week, but when you gamble with 10 minutes of downtime every week, which may come during peak hours, you could be putting your business at significant financial risk. Each extra 9 cuts your allowable downtime to one tenth of the previous amount.
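The figures in the table can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the function name allowed_downtime_seconds and the PERIOD_SECONDS mapping are hypothetical names chosen for this example, and the period lengths assume a 365 day year, the 30 day month noted under the table, and a 7 day week.

```python
# Allowable downtime for a given availability percentage.
# Assumed period lengths: 365-day year, 30-day month, 7-day week.
PERIOD_SECONDS = {
    "year": 365 * 24 * 3600,
    "month": 30 * 24 * 3600,
    "week": 7 * 24 * 3600,
}


def allowed_downtime_seconds(availability_percent: float, period: str) -> float:
    """Return the seconds of downtime permitted in one period."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    # The fraction of the period during which the service may be down.
    unavailable_fraction = 1 - availability_percent / 100
    return PERIOD_SECONDS[period] * unavailable_fraction


if __name__ == "__main__":
    # Compare three nines with four nines on a weekly basis.
    for pct in (99.9, 99.99):
        minutes = allowed_downtime_seconds(pct, "week") / 60
        print(f"{pct}% -> {minutes:.2f} minutes per week")
```

Running the same function with "year" or "month" gives the other columns, which makes it easy to check a percentage that a provider quotes but the table does not list.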

What does this mean for your server?

It can mean that not all “high availability” services are equal. The term is used widely for various levels of availability, so it is crucial that you ask your provider exactly what percentage you’re paying for, as well as what steps are in place to ensure that level of availability is met. Depending on the SLA, you can obtain service credits when an availability agreement is not fulfilled, but most companies would much rather have their servers up than receive service credits. It is therefore important to ask several questions about a provider’s high availability environment before entering into a contract.

What is a High Availability Environment?

A high availability environment is the infrastructure and procedures put in place to ensure a high level of availability. This is usually accomplished by building an environment with no single points of failure. What does this mean? It means that if one part of the architecture fails, a redundant component or connection takes over, so there is no disruption to the accessibility of the server. It also means that multiple things must go wrong at once for a server to lose availability, which greatly decreases the chance of downtime. Redundant power supplies and redundant network connections are a must for certification as a top tier data center and for a high availability configuration. They ensure that power and network connectivity are provided with a very low chance of interruption.
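The claim that multiple things must go wrong can be put into numbers. The sketch below is a simplified model, not a description of any provider's actual figures: it assumes component failures are independent, and the function names and the 99% example value are hypothetical. Components that must all work (power, network, and server) multiply their availabilities, while redundant copies of one component fail only if every copy fails.

```python
from math import prod


def series_availability(availabilities):
    """Availability when every component must work at the same time."""
    return prod(availabilities)


def redundant_availability(availability, copies):
    """Availability when at least one of several identical copies must work.

    Assumes the copies fail independently of each other.
    """
    return 1 - (1 - availability) ** copies


if __name__ == "__main__":
    single_feed = 0.99  # hypothetical availability of one power feed
    dual_feed = redundant_availability(single_feed, 2)
    print(f"one feed:  {single_feed:.4%}")
    print(f"two feeds: {dual_feed:.4%}")
    # Power, network, and server in series, each at the dual-feed level.
    print(f"three such layers: {series_availability([dual_feed] * 3):.4%}")
```

The independence assumption is the weak point of this model. Two power runs that share a conduit, or two ISPs on the same fiber, can fail together, which is why physical separation matters as much as the number of redundant parts.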

How does this work?

One such configuration is the one that Online Tech uses, and there are a lot of things going on in it. On the power side, two separate, independent power runs connect the server to the utility power source, and backup generators are in place, so that power is delivered to two separate power supplies on the server. On the network side, two core routers are fed from multiple Internet Service Providers and are cross-meshed between both routers and the network access switches. It is also important that your network connections have multiple entry points to your data center and that each ISP is on a separate fiber, to further mitigate the risk of downtime. This is just one “high availability” configuration, but a very good one for ensuring reliable access to both power and internet connectivity.

In addition to power and network redundancy, a high availability environment can be further protected by guarding against server-side failures. This is often done with a high availability cluster, which can also be paired with load balancing for a higher performing configuration. It works much like highly available power and network configurations, but with a redundant server connected as well. The cluster can recognize a hardware or software fault in the server and fail over to the redundant server without an interruption in service. Load balancing, taking advantage of the high availability cluster, can distribute your application’s workload evenly or asymmetrically (if configured that way) between two or more servers to help increase performance.

An example of a simple cluster configuration (a two-node cluster) can be seen below.

A data center technician can work with you to configure a server environment that suits your particular needs. This type of configuration is ideal for servers that can’t afford downtime, even for scheduled maintenance. Preventative maintenance is critical to limiting server-side downtime, but unfortunately downtime is normally required for the maintenance to be performed. A high availability cluster enables your service to remain available even during maintenance.
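The failover and load balancing behavior of a cluster can be sketched in a few lines. This is a toy model for illustration only, not the cluster software any provider runs: the Node class, the pick_active and weighted_schedule functions, and the node names are all hypothetical, and the healthy flag stands in for whatever heartbeat or health check a real cluster would use.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    weight: int = 1       # relative share of traffic when load balancing
    healthy: bool = True  # stand-in for a real heartbeat or health check


def pick_active(nodes):
    """Active/passive failover: use the first healthy node in priority order."""
    for node in nodes:
        if node.healthy:
            return node
    raise RuntimeError("no healthy node available")


def weighted_schedule(nodes):
    """One cycle of weighted assignments across the healthy nodes only."""
    schedule = []
    for node in nodes:
        if node.healthy:
            schedule.extend([node.name] * node.weight)
    if not schedule:
        raise RuntimeError("no healthy node available")
    return schedule


if __name__ == "__main__":
    cluster = [Node("primary", weight=2), Node("standby", weight=1)]
    print(pick_active(cluster).name)   # primary serves while it is healthy
    print(weighted_schedule(cluster))  # asymmetric split between both nodes
    cluster[0].healthy = False         # take the primary down for maintenance
    print(pick_active(cluster).name)   # the standby takes over
    print(weighted_schedule(cluster))  # all traffic goes to the standby
```

Marking a node unhealthy on purpose is also how this model represents scheduled maintenance: the remaining node carries the service while the other is worked on.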

The last issue to tackle when talking about high availability hosting is what happens when a catastrophic disaster strikes, whether natural (a fire, flood, earthquake, or tornado) or man-made (human error, burglary, or even a war-related attack).

Managing the risk of a disaster in a high availability configuration

If a disaster such as a massive earthquake were to strike and damage your data center and server(s), a high availability configuration would matter little, because a major disaster usually causes multiple points of failure at once. That is why it is important, as part of your disaster recovery plan, to at least have your data backed up and, in some cases, to consider replication services so that your data is continually replicated to an off-site server and can be accessed in the event of a disaster. Connectivity across multiple data centers adds a further level of availability if a disaster strikes. On-site as well as off-site replication are options that you should consider when selecting a high availability host. It is important to note that disk mirroring and fully synchronized replication services are not the same thing as backup. These services protect against disk failure or server failure. They don’t protect against accidental deletion or other human-error data loss, because the change is copied to the replica as well. Setting up regular online or tape backup procedures, in addition to data replication, is an important part of protecting against disasters.
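The difference between replication and backup is easy to demonstrate with a toy model. Everything in the sketch below is hypothetical (the MirroredStore class, the take_backup function, and the file name), and in-memory dictionaries stand in for real disks. It only illustrates the point made above: a synchronized replica faithfully repeats a mistaken delete, while a point-in-time backup still holds the data.

```python
import copy


class MirroredStore:
    """A primary with a synchronous replica: every change is applied to both."""

    def __init__(self):
        self.primary = {}
        self.replica = {}

    def write(self, key, value):
        self.primary[key] = value
        self.replica[key] = value

    def delete(self, key):
        # The replica mirrors deletes too, including accidental ones.
        self.primary.pop(key, None)
        self.replica.pop(key, None)


def take_backup(store):
    """Return a point-in-time copy that later changes cannot touch."""
    return copy.deepcopy(store.primary)


if __name__ == "__main__":
    store = MirroredStore()
    store.write("invoice.txt", "version 1")
    backup = take_backup(store)

    store.delete("invoice.txt")  # simulated human error

    print("in replica:", "invoice.txt" in store.replica)
    print("in backup: ", "invoice.txt" in backup)
```

The replica would still save you if the primary disk died, since it held an identical copy up to that moment. Only the backup helps when the loss comes from a change that was replicated as intended.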

Conclusion

As you can see, there are many things to consider when choosing a high availability host, and depending on your application and your budget, there are various levels of protection against downtime to choose from. This article is meant to give a brief overview of high availability hosting and of the importance of knowing which types of redundancy are in place for your server. If you are interested in more detailed descriptions of the options available for your high availability configuration, see the links below.

See Also

Carrier-Grade: Five Nines, the Myth and the Reality
An examination of the meaning and measurability of availability.
20 tips in 20 minutes: Clustering explained
An explanation of what clustering is and where the future of clustering is headed.
Clustering and High Availability
Microsoft’s Failover and Network Load Balancing Clustering Team Blog.
High Availability Cluster Checklist
The importance of knowing how your solution protects you from the range of possible failure scenarios.