What is Reliability in Cloud Computing?

Published June 7, 2023

Reliability in cloud computing can mean different things to different people. For some, it is measured by the frequency of component failures or cloud service downtimes; for others, it is measured by cost-efficiency, performance, and security.

When it comes to cloud computing, what does reliability mean? And what is the difference, if any, between reliability and availability?

While the terms reliability and availability are often used interchangeably to describe the accessibility of services in a cloud service provider’s data center, there are important distinctions between the two:

  • Reliability in cloud computing is measured by the frequency of component failures.
  • Availability in cloud computing is measured by overall cloud service downtimes.
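
To make the distinction concrete, the sketch below derives both figures from a hypothetical outage log. Failure frequency is expressed as mean time between failures (MTBF), and availability is the share of the observation window during which the service was up. The `Outage` record, the function names, and the sample data are assumptions for illustration, not any provider's reporting format.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """Hypothetical outage record: start and end, in hours from the start of the window."""
    start_h: float
    end_h: float


def mtbf_hours(outages: list[Outage], window_h: float) -> float:
    """Mean time between failures: total uptime divided by the number of failures."""
    if not outages:
        return float("inf")  # no failures observed in the window
    downtime = sum(o.end_h - o.start_h for o in outages)
    return (window_h - downtime) / len(outages)


def availability(outages: list[Outage], window_h: float) -> float:
    """Fraction of the observation window during which the service was up."""
    downtime = sum(o.end_h - o.start_h for o in outages)
    return (window_h - downtime) / window_h


# Hypothetical 30-day window (720 hours) with two outages totalling 1.5 hours.
log = [Outage(100.0, 101.0), Outage(500.0, 500.5)]
print(f"MTBF: {mtbf_hours(log, 720):.2f} h")
print(f"Availability: {availability(log, 720):.4%}")
```

Two services can share the same availability yet differ in reliability. One might fail once for an hour, while another might fail sixty times for a minute each.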

Component failures and cloud service downtimes occur often, but because they may affect only one service in a particular region (and there are thousands of services across dozens of regions worldwide), they rarely cause long-lasting business disruption. Most cloud service providers publish status pages where you can review recent component failures and service downtimes.

While cloud service providers make every effort to maximize the reliability and availability of cloud computing, many organizations spend more than necessary on cloud services, fail to optimize performance, or neglect security best practices.

Is Cloud Computing Reliably Cost-Efficient?

In most cases, migrating workloads to the cloud can be financially beneficial. This is not only because costs shift from CAPEX to OPEX, but also because of the opportunities to increase productivity, accelerate time to market, lower utility bills, and reduce staff costs. However, migrating workloads to the cloud doesn’t always guarantee cost-efficiency.

Cloud computing can be cost-efficient, but only if you understand that the concept of “you only pay for what you use” isn’t always true. Many cloud services charge according to what you provision. For example, if you provision a VM with 8 vCPUs, 32 GiB of RAM, and 200 GiB of storage, that’s what you will be charged for, regardless of how much of the VM’s capacity is used.
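
A quick calculation shows why utilization matters when billing is based on provisioned capacity. The $0.40 hourly rate and the 25 percent average utilization below are hypothetical placeholders, not real prices from any provider. The 730-hour month is a common billing approximation (8,760 hours / 12).

```python
HOURS_PER_MONTH = 730  # common billing approximation: 8,760 hours per year / 12


def monthly_vm_cost(hourly_rate: float) -> float:
    """A provisioned VM is billed for every running hour, regardless of load."""
    return hourly_rate * HOURS_PER_MONTH


def cost_of_idle_capacity(hourly_rate: float, avg_utilization: float) -> float:
    """Portion of the monthly bill that pays for capacity that sat unused."""
    if not 0.0 <= avg_utilization <= 1.0:
        raise ValueError("avg_utilization must be between 0 and 1")
    return monthly_vm_cost(hourly_rate) * (1.0 - avg_utilization)


# Hypothetical: the 8 vCPU / 32 GiB VM above at $0.40/hour, averaging 25% utilization.
rate = 0.40
print(f"Monthly bill:    ${monthly_vm_cost(rate):,.2f}")
print(f"Paying for idle: ${cost_of_idle_capacity(rate, 0.25):,.2f}")
```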

Users may also fail to switch off resources when they are not being used, fail to decommission resources when they are no longer required, or fail to take advantage of committed use/spend discount programs that increase the cost-efficiency of cloud computing. These lapses all contribute to spend waste. Without putting measures in place to optimize costs, cloud computing is unlikely to be reliably cost-efficient.
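
As a minimal sketch of one such measure, the code below flags resources whose daily average CPU utilization stayed under a threshold for a sustained period. It then totals their monthly cost as candidate waste. The `Resource` fields, the 5 percent threshold, the 14-day window, and the sample fleet are all assumptions. In practice, the metrics would come from your provider's monitoring service.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    monthly_cost: float
    daily_avg_cpu: list[float]  # one average CPU % per day, most recent last


def idle_resources(resources: list[Resource], cpu_threshold: float = 5.0,
                   days: int = 14) -> list[Resource]:
    """Return resources whose daily average CPU stayed below the threshold for the last `days` days."""
    idle = []
    for r in resources:
        recent = r.daily_avg_cpu[-days:]
        # Require a full window of data so new resources are not flagged prematurely.
        if len(recent) == days and all(cpu < cpu_threshold for cpu in recent):
            idle.append(r)
    return idle


fleet = [
    Resource("build-agent-old", 120.0, [1.2] * 30),
    Resource("web-frontend", 300.0, [45.0] * 30),
    Resource("test-db-new", 80.0, [0.5] * 3),  # too little history to judge
]
flagged = idle_resources(fleet)
print([r.name for r in flagged], sum(r.monthly_cost for r in flagged))
```

Low CPU alone does not prove a resource is unused. For example, a standby replica or a storage-bound job can sit near zero CPU. For that reason, flagged resources are usually better routed to their owner for review than deleted outright.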

Is the Performance of Cloud Computing Reliable?

Apart from periodic component failures and cloud service downtimes, the performance of cloud computing as delivered by cloud service providers is generally very reliable. However, just as the cost-efficiency of cloud computing depends on how organizations use the cloud, so does its performance.

Earlier, we explained how organizations can overspend in the cloud by provisioning more capacity than is required. User actions can also degrade performance when a resource is under-provisioned or misconfigured. Additionally, performance efficiency can suffer when organizations fail to take advantage of automation, such as scaling capacity to match demand.
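
One common form of this automation is target-tracking autoscaling, which addresses both under- and over-provisioning. The sketch below applies the replica formula documented for the Kubernetes Horizontal Pod Autoscaler: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). It also uses a tolerance band so that small fluctuations do not trigger scaling; the HPA's default tolerance is 0.1. The min/max bounds and the sample metric values are assumptions, and this is an illustration of the formula, not the HPA controller itself.

```python
import math


def desired_replicas(current: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Scale the replica count so the per-replica metric moves toward the target."""
    ratio = current_metric / target_metric
    # Skip scaling when the metric is already within the tolerance band around the target.
    if abs(ratio - 1.0) <= tolerance:
        return current
    desired = math.ceil(current * ratio)
    # Clamp to the configured bounds so a metric spike cannot scale without limit.
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical CPU utilization (%) per replica against a 60% target.
print(desired_replicas(4, current_metric=90.0, target_metric=60.0))  # under-provisioned
print(desired_replicas(4, current_metric=15.0, target_metric=60.0))  # over-provisioned
```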

There is another scenario in which the performance of cloud computing can be unreliable: when a “noisy neighbor” in a multi-tenant environment uses more than its fair share of shared resources, degrading the performance (for example, the network performance) of other VMs and apps. This issue can usually be resolved by moving your VM or app to a different Availability Zone or to a bare metal environment.
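
On a Linux VM, one common symptom of a noisy neighbor is CPU steal time. This is time during which your vCPU was ready to run but the hypervisor was serving other guests. The sketch below computes the steal percentage between two samples of the aggregate `cpu` line in `/proc/stat`. The sample lines are invented, and any alert threshold you apply is an assumption to tune for your workload. Steal time reflects CPU contention only, so network contention would need separate metrics.

```python
def parse_cpu_line(line: str) -> list[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into integer tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(v) for v in fields[1:]]


def steal_percent(before: str, after: str) -> float:
    """Share of CPU time stolen by the hypervisor between two /proc/stat samples."""
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    # Fields: user nice system idle iowait irq softirq steal [guest guest_nice].
    # guest and guest_nice are already included in user and nice, so only the
    # first eight fields are summed to avoid double counting.
    total = sum(a[:8]) - sum(b[:8])
    steal = a[7] - b[7]
    return 100.0 * steal / total if total else 0.0


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Return the first line of /proc/stat, which holds the aggregate counters (Linux only)."""
    with open(path) as f:
        return f.readline()


# Invented samples taken a few seconds apart.
before = "cpu  1000 0 500 8000 100 0 0 400 0 0"
after = "cpu  1600 0 800 8600 100 0 0 900 0 0"
print(f"steal: {steal_percent(before, after):.1f}%")
```

On a live VM, you would call `read_cpu_line()` twice, a few seconds apart, and pass both lines to `steal_percent()`. A sustained high value points to contention on the host, which is the situation in which moving the workload can help.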

The Reliability of Security in Cloud Computing

Because they cannot control the systems on which their data is stored, many organizations perceive cloud computing as less secure and dismiss adoption. However, cloud data centers are actually more secure than most on-premises systems.

Additionally, most IT security experts agree that the location of data (whether on-premises or in the cloud) is immaterial; what matters more is how access to data is controlled. The majority of cloud security incidents are attributable to misconfigured resources, phishing, unsecured personal devices, and authentication credentials stored in open repositories. None of these is the fault of the cloud service provider.
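
Credentials committed to repositories are one of these user-level causes, and they can be caught automatically. The sketch below scans text for strings shaped like long-term AWS access key IDs, which start with `AKIA` followed by 16 uppercase letters or digits. It is a minimal illustration, not a substitute for a dedicated secret scanner. The function name and sample file content are hypothetical, and the key shown is the placeholder used in AWS documentation.

```python
import re

# Long-term AWS access key IDs start with "AKIA" followed by 16 uppercase letters or digits.
AWS_ACCESS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_aws_key_ids(text: str) -> list[tuple[int, str]]:
    """Return (line_number, match) pairs for strings shaped like AWS access key IDs."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in AWS_ACCESS_KEY_ID.finditer(line):
            hits.append((lineno, match.group()))
    return hits


# Invented config file content.
sample = """\
region = us-east-1
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
timeout = 30
"""
for lineno, key in find_aws_key_ids(sample):
    # Print only a masked form so the scan output does not leak the key itself.
    print(f"line {lineno}: possible AWS access key ID {key[:4]}...{key[-4:]}")
```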

The evidence suggests that the weak links in cloud security occur at the user level. As such, it is incumbent on the service user, rather than the service provider, to secure data. To that end, vulnerability testing and constant monitoring are essential to protect data in the cloud. Ultimately, the cloud is only as secure as users make it.

What Happens When the Cloud Is Not Reliable or Available?

Cloud computing offers many benefits, but these are illusory if your cloud service provider’s data center is hit by an outage and becomes unavailable. Most cloud service providers offer service level agreements (SLAs) in which they commit to making “commercially reasonable efforts” to deliver 99.9 percent or greater uptime. If they fail to meet that commitment, you receive a “service credit.”
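
To put 99.9 percent in perspective, the sketch below converts an uptime commitment into the downtime it permits over a 30-day month. It then maps a measured uptime to a service credit. The credit tiers are hypothetical; each provider defines its own tiers and measurement rules in its SLA.

```python
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200


def allowed_downtime_minutes(uptime_pct: float,
                             period_minutes: int = MINUTES_PER_30_DAY_MONTH) -> float:
    """Downtime permitted by an uptime commitment over the period."""
    return period_minutes * (1 - uptime_pct / 100)


# Hypothetical credit tiers: (minimum measured uptime %, credit as % of the monthly bill).
CREDIT_TIERS = [(99.9, 0), (99.0, 10), (95.0, 25), (0.0, 100)]


def service_credit_pct(measured_uptime_pct: float) -> int:
    """Return the credit for the first tier whose floor the measured uptime meets."""
    for floor, credit in CREDIT_TIERS:
        if measured_uptime_pct >= floor:
            return credit
    return 100  # fallback for out-of-range input


for target in (99.9, 99.99, 99.999):
    print(f"{target}% uptime allows {allowed_downtime_minutes(target):.1f} min/month")
```

In this sketch, the credit is a percentage of the bill for the affected service. It therefore bears no relation to the revenue lost while that service is down.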

Receiving a service credit for a service you have not received is standard business practice, but when the service you don’t receive takes your business offline, the loss can be far more significant than the value of the credit. Fortunately, service outages are infrequent due to redundancy safeguards implemented by cloud service providers, so availability is mostly reliable, but not guaranteed.

The closest you can get to guaranteed 100 percent uptime is to deploy resources across multiple clouds and/or your on-premises IT infrastructure. That way, a disruption at one cloud service provider will not take your business offline. This approach can complicate resource management, but it is the best way to ensure reliable cloud computing and maximum availability.
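
The gain from redundancy can be estimated under one strong assumption: each deployment fails independently of the others. With that assumption, a setup that stays up while any one copy is up has availability 1 - prod(1 - a_i). The sketch below computes this; the 99.9 percent figures are illustrative.

```python
from math import prod


def parallel_availability(availabilities: list[float]) -> float:
    """Availability of redundant deployments that stay up while any one copy is up,
    assuming the copies fail independently."""
    return 1 - prod(1 - a for a in availabilities)


single = 0.999
print(f"One cloud:  {single:.6%}")
print(f"Two clouds: {parallel_availability([single, single]):.6%}")
```

Treat the result as an upper bound. It ignores the failover mechanism itself and correlated failures, such as a shared DNS or identity dependency. It also assumes each copy can carry the full load on its own.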

How to Improve Reliability in Cloud Computing

Organizations can improve reliability in cloud computing by putting measures in place to optimize costs, monitoring the performance of resources, and implementing security best practices to control access to data and prevent misconfigured resources. While this may sound like a tedious to-do list, most of the ways to improve reliability in cloud computing can be achieved through automation.

Automation enables you to enforce cost, performance, and security policies within predefined guardrails. If an action occurs that breaches the guardrails, the automation solution triggers a function to prevent, remediate, or reverse the action, or alert the resource owner of the policy violation. For example, organizations can apply policies that:

  • Alert business departments to projected budget overspends.
  • Automatically delete unused resources after a predetermined period.
  • Prevent users from deploying unsanctioned or incompatible resources.
  • Continuously verify resource configurations.
  • Enforce global tagging policies for easier cost allocation and cloud governance.
  • Enforce conditional access controls and multi-factor authentication for privileged accounts.
  • Revoke user access if suspicious activity is detected (e.g., logging in from an unrecognized IP address).
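
A few of the policies listed above can be expressed as simple checks over a resource inventory. The sketch below evaluates three of them: tag enforcement, age-based cleanup of unused resources, and a budget projection. The field names, the required tags, the 30-day cleanup window, and the naive linear forecast are hypothetical choices. They do not reflect the behavior of Torque or any other product. A real system would alert or remediate rather than return strings.

```python
from dataclasses import dataclass
from datetime import date

REQUIRED_TAGS = {"owner", "cost-center"}  # hypothetical tagging policy
MAX_UNUSED_DAYS = 30                      # hypothetical cleanup window


@dataclass
class Resource:
    name: str
    tags: dict[str, str]
    last_used: date


def missing_tags(r: Resource) -> set[str]:
    """Tags required by policy that the resource lacks."""
    return REQUIRED_TAGS - set(r.tags)


def is_stale(r: Resource, today: date) -> bool:
    """True if the resource has not been used within the cleanup window."""
    return (today - r.last_used).days > MAX_UNUSED_DAYS


def projected_month_spend(spend_to_date: float, day_of_month: int, days_in_month: int) -> float:
    """Naive linear forecast: assume the rest of the month matches the average daily spend so far."""
    return spend_to_date / day_of_month * days_in_month


def evaluate(resources: list[Resource], today: date, spend_to_date: float,
             budget: float, days_in_month: int = 30) -> list[str]:
    """Return human-readable policy violations."""
    findings = []
    for r in resources:
        if gaps := missing_tags(r):
            findings.append(f"{r.name}: missing tags {sorted(gaps)}")
        if is_stale(r, today):
            findings.append(f"{r.name}: unused for more than {MAX_UNUSED_DAYS} days, candidate for deletion")
    forecast = projected_month_spend(spend_to_date, today.day, days_in_month)
    if forecast > budget:
        findings.append(f"budget: projected ${forecast:,.0f} exceeds ${budget:,.0f}")
    return findings


today = date(2023, 6, 15)
inventory = [
    Resource("vm-analytics", {"owner": "data-team", "cost-center": "cc-42"}, date(2023, 6, 14)),
    Resource("vm-temp-test", {"owner": "qa"}, date(2023, 4, 1)),
]
for finding in evaluate(inventory, today, spend_to_date=6_000.0, budget=10_000.0):
    print(finding)
```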

Discover for yourself the power of leveraging automation for cloud computing by starting a free trial of Torque, Quali’s Environments as a Service platform for public cloud infrastructure. And to learn more about infrastructure automation, visit www.quali.com/torque.