Practical and effective cloud governance policies are something of a holy grail for modern technology and engineering teams.
While everyone would agree that standardizing cloud governance is a priority, the reality is that policies can be difficult to enforce across large teams and disparate repositories of cloud configurations.
Our users approach this challenge through a self-service model. By deploying all AWS and Azure cloud infrastructure from a self-service catalog, you can enforce cloud governance policies regarding how the services deployed from that catalog are configured and how long they operate.
To accomplish this, our users define the cloud governance policies for their Terraform deployments as .rego files and import them into Quali Torque directly from their Git repositories. This allows admins to define custom policies based on their infrastructure team’s specific guidelines. We also provide native cloud governance policies based on Open Policy Agent standards that users can choose to enforce.
When anyone attempts to deploy an environment that violates these policies, Torque will deny the action and notify an admin—thereby integrating those standards into day-to-day operations.
Recognizing that different teams need different infrastructure, we allow admins to apply policies to a specific workspace so that they don’t disrupt operations for other teams.
Here we’re going to walk through a few examples that you can start enforcing today.
1. Prohibit oversized AWS and Azure cloud instances
Rightsizing cloud instances—or the practice of ensuring that the services deployed are sized proportionately for the workloads they support—is one of the most effective ways to optimize costs.
Since DevOps and development teams often prioritize performance, they may deploy cloud instances that are much larger and more expensive than they need. At scale, rightsizing these cloud instances can have a significant impact on cloud costs without disrupting performance or operations.
However, even teams with an effective rightsizing strategy can see these measures drift. Whether it’s a result of employee turnover, shifting priorities, or simple misconfigurations, oversized instances can creep back into your deployments and slowly inflate your cloud bill.
To make your rightsizing measures permanent, set a policy to prohibit the VM sizes that your teams do not need to deploy.
In this example, the team will not be able to deploy this size of Azure VM.
With this policy in place, the development team can still launch infrastructure via self-service, provided it is sized correctly.
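The logic behind a size restriction like this can be sketched in a few lines. This is a Python illustration of the check, not the Rego policy that Torque evaluates. The deny-list, the resource dictionaries, and the `check_vm_sizes` function are all hypothetical names chosen for the example.

```python
# Hypothetical deny-list of Azure VM sizes; the names are examples only.
PROHIBITED_VM_SIZES = {"Standard_D64s_v3", "Standard_E64s_v3"}


def check_vm_sizes(resources):
    """Return one violation message per VM that uses a prohibited size."""
    violations = []
    for res in resources:
        size = res.get("vm_size")
        if size in PROHIBITED_VM_SIZES:
            violations.append(f"{res['name']}: VM size {size} is prohibited")
    return violations


# Assumed shape of a planned deployment: one dict per VM.
planned = [
    {"name": "build-agent", "vm_size": "Standard_B2s"},
    {"name": "load-test", "vm_size": "Standard_D64s_v3"},
]

for message in check_vm_sizes(planned):
    print(message)
```

An empty result means the deployment can proceed. Any entry in the list would be grounds to deny it and notify an admin.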
2. Prohibit unapproved cloud platforms and services
Let’s say you manage a team that only needs to run specific cloud services or platforms, such as AWS or Azure.
If someone were to deploy a service or platform that is not approved, you likely wouldn’t know until you receive the cloud bill. At that point, it’s too late: the deployment has already run, and you have to pay for it.
In Quali Torque, you can set policies to allow specific cloud platforms, preventing shadow IT in the form of unapproved platforms.
To get more granular, you can also set rules to allow or prohibit resource types, ensuring the teams deploying infrastructure are not launching any cloud service that your admins don’t know about.
This can be especially valuable when applied to specific workspaces for teams that only need certain cloud infrastructure.
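As a rough sketch of how an allow-list for platforms can combine with a deny-list for resource types, the Python below checks Terraform-style resource type names, where the prefix before the first underscore identifies the provider. The two sets and the `check_resource` function are assumptions made for illustration and do not reflect Torque's built-in policies.

```python
# Hypothetical policy data for one workspace.
ALLOWED_PROVIDERS = {"aws", "azurerm"}
PROHIBITED_RESOURCE_TYPES = {"aws_redshift_cluster"}


def check_resource(resource_type):
    """Return 'allow' or a 'deny: ...' message for a Terraform-style type name."""
    # By convention the provider is the prefix before the first underscore,
    # e.g. 'aws_instance' -> 'aws', 'google_compute_instance' -> 'google'.
    provider = resource_type.split("_", 1)[0]
    if provider not in ALLOWED_PROVIDERS:
        return f"deny: provider '{provider}' is not approved"
    if resource_type in PROHIBITED_RESOURCE_TYPES:
        return f"deny: resource type '{resource_type}' is prohibited"
    return "allow"


for rtype in ("aws_instance", "google_compute_instance", "aws_redshift_cluster"):
    print(rtype, "->", check_resource(rtype))
```

Checking the platform first keeps the resource-type list short, since it only needs to cover services within the approved platforms.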
3. Require approval based on expected cloud cost thresholds
A major concern over cloud cost control is the impact on productivity. If the IT, DevOps, or FinOps organization needs to approve every environment that the development teams deploy, they’ll quickly end up with a backlog and cause delays for the projects that rely on them.
This can be especially frustrating for the simple day-to-day cloud deployments that pose little financial risk, but which need to be reviewed as a precaution.
What if you could require approval only for cloud infrastructure that is configured to have a significant impact on budget?
Quali Torque forecasts cloud costs based on the configuration of the services within the environment and the duration of the deployment.
Based on this information, admins can set policies to require approval based on forecasted cloud costs.
To accomplish this, define your cost threshold in a .rego file and import it into the Policies tab of Quali Torque’s Administration space. For example:
- If the environment cost is < $1/hour, approve automatically
- If the environment cost is > $5/hour, deny automatically
- If the cost is in between, require manual approval from one or more approvers
With this policy, you can free your team to move faster and deploy cloud instances that are within budget, deny any activity that is vastly over budget, and trigger approval notifications for other requests that are slightly above the expected cost threshold.
The ability to apply unique policies to different teams provides added customization, empowering your organization to move quickly within guardrails that mitigate budgetary risk.
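The three-tier rule from the example thresholds can be sketched as follows. This is a Python illustration of the decision logic only. The constant names and the `decide` function are hypothetical, and a real policy would be written in Rego against the forecast that Torque supplies.

```python
# Thresholds from the example above, in dollars per hour.
AUTO_APPROVE_BELOW = 1.0
AUTO_DENY_ABOVE = 5.0


def decide(cost_per_hour):
    """Map a forecasted hourly cost to 'approve', 'deny', or 'manual_approval'."""
    if cost_per_hour < AUTO_APPROVE_BELOW:
        return "approve"
    if cost_per_hour > AUTO_DENY_ABOVE:
        return "deny"
    # Everything from $1 to $5 inclusive goes to a human approver.
    return "manual_approval"


for cost in (0.40, 3.25, 7.80):
    print(f"${cost:.2f}/hour -> {decide(cost)}")
```

Because the comparisons are strict, environments that cost exactly $1 or $5 per hour fall into the manual-approval tier.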
4. Terminate all cloud VMs at the end of the workday
Zombie infrastructure, meaning cloud services that continue to operate long after the workload they were deployed to support has ended, is one of the most common drivers of wasted cloud budget.
And while many organizations attempt to create standards for when cloud services should operate, enforcing those standards consistently is difficult. Quite plainly, efforts to prevent zombie infrastructure will only be as effective as the people responsible for terminating cloud instances.
To automate shutdown for your cloud VMs, Quali customers have a number of options, including:
- Setting a daily schedule for deployment and teardown of AWS and Azure VMs: Especially for teams that only need cloud instances to run during business hours, these schedules can eliminate manual work and prevent zombie infrastructure at scale. In the Administration space of Quali Torque, set a schedule for the specific days of the week and the times of each day when cloud VMs should deploy and shut down, and the platform will perform those actions on that schedule. This gives staff the cloud instances they need at the beginning of the day, shuts them down when the day is complete, and provides the option to request an extension via self-service.
- Requiring a duration for every cloud instance deployed: As part of the self-service deployment method for application environments, Quali Torque requires the user to set a “duration” for that environment from a picklist. After that duration, Torque will shut down all infrastructure in that environment automatically. Admins can set the maximum runtime that appears in that picklist and manually terminate any environment on demand.
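The duration-based option can be illustrated with a small sketch. The Python below computes when an environment should be torn down and rejects requests above an admin-defined cap. The `MAX_RUNTIME` value and both function names are assumptions for the example, since Torque handles this internally.

```python
from datetime import datetime, timedelta

# Hypothetical admin-defined cap on how long any environment may run.
MAX_RUNTIME = timedelta(hours=8)


def teardown_time(started_at, requested):
    """Return the time at which the environment should be shut down."""
    if requested > MAX_RUNTIME:
        raise ValueError(f"requested duration {requested} exceeds cap {MAX_RUNTIME}")
    return started_at + requested


def is_expired(started_at, requested, now):
    """True once the requested duration has fully elapsed."""
    return now >= teardown_time(started_at, requested)


start = datetime(2024, 1, 8, 9, 0)
print(teardown_time(start, timedelta(hours=4)))
print(is_expired(start, timedelta(hours=4), datetime(2024, 1, 8, 14, 0)))
```

Enforcing the cap at request time means no environment can be launched without a known end time.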
5. Only allow private cloud storage
Misconfigured cloud storage is among the most common security vulnerabilities, to the point that malicious actors have developed tools to actively seek out and exploit it.
In many cases, this vulnerability stems from a simple misstep in the cloud provisioning process that leaves cloud storage public and easily accessible for a security breach.
To prevent this vulnerability, Quali Torque supports policies to allow only private cloud storage. With support for both AWS S3 and Azure Blob Storage, these policies can deny any deployment containing public storage that can be exploited.
This empowers your teams to identify and prevent a common cloud security vulnerability before hackers can.
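A simplified version of such a check might look like the Python sketch below. It assumes resources are described with Terraform-style type names and attributes: `acl` on `aws_s3_bucket_acl`, and `container_access_type` on `azurerm_storage_container`. The input shape and the function names are hypothetical, and a production policy would also need to consider bucket policies and public access block settings.

```python
# Canned S3 ACLs that grant access to anonymous users.
PUBLIC_S3_ACLS = {"public-read", "public-read-write"}


def is_public(resource):
    """Return True if the storage resource is configured for public access."""
    rtype = resource["type"]
    values = resource.get("values", {})
    if rtype == "aws_s3_bucket_acl":
        return values.get("acl") in PUBLIC_S3_ACLS
    if rtype == "azurerm_storage_container":
        # Anything other than 'private' ('blob' or 'container') allows anonymous reads.
        return values.get("container_access_type", "private") != "private"
    return False


def public_storage(resources):
    """Return the names of all publicly accessible storage resources."""
    return [r["name"] for r in resources if is_public(r)]


planned = [
    {"type": "aws_s3_bucket_acl", "name": "logs", "values": {"acl": "private"}},
    {"type": "aws_s3_bucket_acl", "name": "assets", "values": {"acl": "public-read"}},
    {"type": "azurerm_storage_container", "name": "exports",
     "values": {"container_access_type": "blob"}},
]
print(public_storage(planned))
```

A non-empty result would be enough to deny the deployment before any storage is created.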
6. Only allow certain cloud regions
Any cloud infrastructure deployed in a region that your IT team does not monitor is a financial or security risk. And the longer this unmonitored infrastructure runs, the more it will cost.
To align all cloud deployments to the regions your organization supports, set a rule for Allowed AWS or Azure regions in Quali Torque to deny any deployments outside of your IT organization’s purview.
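An allowed-regions rule reduces to a membership test, as in the Python sketch below. The region list and the `check_regions` function are placeholders, and your own list would reflect the regions your IT organization monitors.

```python
# Hypothetical allow-list mixing AWS and Azure region names.
ALLOWED_REGIONS = {"us-east-1", "eu-west-1", "westeurope"}


def check_regions(resources):
    """Return a violation message for each resource outside the allowed regions."""
    return [
        f"{res['name']}: region {res['region']} is not allowed"
        for res in resources
        if res["region"] not in ALLOWED_REGIONS
    ]


planned = [
    {"name": "api", "region": "us-east-1"},
    {"name": "batch", "region": "ap-southeast-2"},
]
for message in check_regions(planned):
    print(message)
```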