Our modern society depends heavily on information provided by
computers over the network. Mobile devices amplified that dependency
because people can access the network anytime from anywhere. If you
provide such services, they must be available most of the time.
We can mathematically define the availability as the ratio of (A), the
total time a service is capable of being used during a given interval
to (B), the length of the interval. It is usually expressed as a
percentage of uptime in a given year.

Table 1. Availability – Downtime per Year

Availability % | Downtime per year |
---|---|
99 | 3.65 days |
99.9 | 8.76 hours |
99.99 | 52.56 minutes |
99.999 | 5.26 minutes |
99.9999 | 31.5 seconds |
99.99999 | 3.15 seconds |

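
The downtime figures in the table follow directly from the definition of availability. The sketch below recomputes them, assuming a 365-day year (which is what the table values correspond to); the function names are illustrative only.

```python
# Downtime per year implied by an availability percentage.
# Assumption: a year has 365 days, matching the values in the table.
from decimal import Decimal

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds


def downtime_seconds_per_year(availability_percent: str) -> Decimal:
    """Return the yearly downtime in seconds for an availability given as a percentage string."""
    unavailable_fraction = (Decimal(100) - Decimal(availability_percent)) / Decimal(100)
    return unavailable_fraction * SECONDS_PER_YEAR


def human_readable(seconds: Decimal) -> str:
    """Format a duration in the largest unit for which the value is at least 1."""
    for unit, size in (("days", 86400), ("hours", 3600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.2f} {unit}"
    return f"{seconds:.2f} seconds"


for pct in ("99", "99.9", "99.99", "99.999", "99.9999", "99.99999"):
    print(f"{pct:>9} % -> {human_readable(downtime_seconds_per_year(pct))}")
```

Percentages are passed as strings and handled with `Decimal` so that values such as 99.99999 are not distorted by binary floating-point rounding. The output uses two decimal places throughout, so the last digits can differ slightly from the rounding used in the table.
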
There are several ways to increase availability. The most elegant
solution is to rewrite your software so that you can run it on
several hosts at the same time. The software itself needs a way
to detect errors and do failover. If you only want to serve read-only
web pages, this is relatively simple. In general, however, it is complex,
and sometimes impossible because you cannot modify the software yourself. The
following solutions work without modifying the software:
Virtualization environments like Proxmox VE make it much easier to reach
high availability because they remove the “hardware” dependency. They
also support the setup and use of redundant storage and network
devices, so if one host fails, you can simply start those services on
another host within your cluster.
Better still, Proxmox VE provides a software stack called ha-manager,
which can do that for you: it detects errors and performs failover
automatically.
Proxmox VE ha-manager works like an “automated” administrator. First, you
configure which resources (VMs, containers, …) it should
manage. ha-manager then monitors that they function correctly and fails
a service over to another node in case of errors. ha-manager can
also handle regular user requests, which may start, stop, relocate, and
migrate a service.
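
The "configure, monitor, fail over" cycle can be pictured with a toy model. The sketch below is purely illustrative: the `Cluster` class, its fields, and the placement policy (recover on the healthy node running the fewest services) are assumptions made for this example, not ha-manager's code, API, or scheduling behavior.

```python
# Toy model of "monitor, then fail over". All names are hypothetical;
# this is not ha-manager's implementation or its placement policy.
from dataclasses import dataclass, field


@dataclass
class Cluster:
    online: dict[str, bool]                                  # node name -> is the node up?
    placement: dict[str, str] = field(default_factory=dict)  # service ID -> node name

    def check_and_failover(self) -> list[str]:
        """Move managed services off failed nodes; return the IDs that were moved."""
        healthy = [node for node, up in self.online.items() if up]
        moved = []
        for service, node in self.placement.items():
            if self.online.get(node, False):
                continue  # the service's node is up, nothing to do
            if not healthy:
                break  # no node left to recover on
            # Assumed policy: pick the healthy node currently running the fewest services.
            target = min(
                healthy,
                key=lambda n: sum(1 for p in self.placement.values() if p == n),
            )
            self.placement[service] = target
            moved.append(service)
        return moved


cluster = Cluster(
    online={"node1": True, "node2": True, "node3": True},
    placement={"vm:100": "node1", "ct:200": "node1", "vm:101": "node2"},
)
cluster.online["node1"] = False        # simulate a host failure
print(cluster.check_and_failover())    # IDs of the services recovered elsewhere
print(cluster.placement)               # every service now sits on a node that is up
```

A real HA stack also has to agree across the cluster on which node has actually failed before restarting anything; the model skips that step and only shows the bookkeeping.
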
But high availability comes at a price. High-quality components are
more expensive, and making them redundant at least doubles the
cost. Additional spare parts increase costs further. So you should
weigh the benefits carefully against those additional
costs.
Increasing availability from 99% to 99.9% is relatively simple. But increasing availability from 99.9999% to 99.99999% is very hard and costly. ha-manager has typical error detection and failover times of about 2 minutes, so you can get no more than 99.999% availability.
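
The 99.999% ceiling can be checked with a short calculation. The sketch below assumes a 365-day year, takes the failover time as exactly 2 minutes, and counts failovers as the only source of downtime; the function name is illustrative.

```python
# How many 2-minute failovers per year fit into a given availability target?
# Assumptions: 365-day year, each failover costs exactly 2 minutes,
# and failovers are the only downtime counted.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
FAILOVER_SECONDS = 2 * 60  # "about 2 minutes" from the text


def availability_percent(failovers_per_year: int) -> float:
    """Availability in percent if each failover causes FAILOVER_SECONDS of downtime."""
    downtime = failovers_per_year * FAILOVER_SECONDS
    return 100 * (1 - downtime / SECONDS_PER_YEAR)


for count in (1, 2, 3):
    print(f"{count} failover(s) per year -> {availability_percent(count):.5f} %")
```

Under these assumptions, a single failover already uses 120 seconds, more than the 31.5 seconds allowed at 99.9999%, while two failovers (240 seconds) still fit within the 5.26 minutes allowed at 99.999%. A third failover in the same year drops availability below 99.999%.
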