Automating detection of inactive cloud resources that inflate costs

Published December 8, 2023

We often tell our users that any cloud resource your team deploys will show up on the bill.

Respondents to Flexera’s 2021 State of the Cloud Report estimate that as much as 30% of cloud spend is waste. Inactive cloud resources are one of the most common sources of this waste.

This challenge has long plagued engineering and FinOps teams looking to optimize cloud costs. Developers, engineers, and other staff face a long list of priorities. Naturally, terminating cloud resources after they’re no longer needed may not always make the top of the list.

To help our users address this challenge, we recently released new functionality that detects when deployed cloud resources are inactive, calculates the potential cost savings from terminating those resources, and notifies the teams responsible for that infrastructure.

Step 1. Leverage your IaC to automate orchestration and deployment of cloud resources

To establish real-time visibility into cloud activity, you first need to deploy your resources via Quali Torque.

This starts by connecting your Git repositories to a Quali Torque account. The platform automatically discovers the Infrastructure as Code (IaC) and other configuration resources managed in those repositories, then generates new YAML files from those configurations.

In Torque, that YAML is referred to as a blueprint, and it can be used to define the deployment plan for a complete environment. For example, a staging environment for a web app may require multiple cloud resources configured together. A Torque blueprint leverages the configurations in Git to define all the resources needed for that environment, including the dependencies between them and the outputs of the deployment (in this case, the staging environment itself).
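To make the idea of dependencies between resources concrete, here is a minimal Python sketch of how a deployment order could be derived from them. It is not Torque's blueprint format or its orchestration logic; the resource names and the `deployment_order` helper are hypothetical and exist only for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical staging environment: each resource maps to the set of
# resources it depends on. These names are illustrative only and are
# not Torque blueprint syntax.
staging_dependencies = {
    "network": set(),
    "database": {"network"},
    "web_app": {"network", "database"},
    "load_balancer": {"web_app"},
}


def deployment_order(dependencies):
    """Return resource names so that every dependency comes first."""
    return list(TopologicalSorter(dependencies).static_order())


order = deployment_order(staging_dependencies)
print(order)
```

In this sketch the network is deployed before the database, and the load balancer comes last because it depends on the web app.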

From there, you can release one-click access for your teams to deploy these environments. In the example of the staging environment, you can integrate with your team’s CI/CD tool to provide easy access to run that staging environment directly as needed.

This helps to eliminate redundant ticket-submission processes for the developers and other staff who need cloud environments, while also reducing provisioning times and manual work on the DevOps and IT teams responsible for delivering those environments.

Step 2. Track cloud costs based on deployment by team, user, or function

Using the Torque platform to initiate the creation of your cloud resources allows you to monitor costs based on those deployments, including the context showing who is responsible for them.

Since Torque leverages the cloud resource configurations and can be used to control the duration of those deployments, the platform can calculate how much those resources will cost when the cloud bill arrives.
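As a rough illustration of that calculation, the sketch below multiplies an hourly rate by a planned duration. The `Deployment` fields, the example rate, and the `estimated_cost` helper are assumptions made for this example; Torque's actual pricing inputs and cost model are not described here.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """Hypothetical record of one environment deployment."""
    name: str
    owner: str
    team: str
    hourly_rate_usd: float  # assumed price implied by the resource configuration
    duration_hours: float   # planned duration of the environment


def estimated_cost(deployment: Deployment) -> float:
    """Estimate cost as hourly rate times planned duration."""
    return round(deployment.hourly_rate_usd * deployment.duration_hours, 2)


staging = Deployment("staging-web", "alice", "payments", 0.50, 72)
print(estimated_cost(staging))
```

A three-day staging environment at an assumed $0.50 per hour comes out to $36 before the bill ever arrives.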

The platform also knows which user was responsible for each deployment, providing a developer-first view that allows engineering teams to look at the teams and projects responsible for using budget before working back to the individual resource configurations they’ve deployed.
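A simplified way to picture this team-first view is to total costs by team and then drill into a single team's users. The records and function names below are hypothetical, not data or APIs from Torque.

```python
from collections import defaultdict

# Hypothetical records: (team, user, estimated cost in USD).
deployments = [
    ("payments", "alice", 36.0),
    ("payments", "bob", 12.5),
    ("search", "carol", 20.0),
]


def cost_by_team(records):
    """Total estimated cost per team."""
    totals = defaultdict(float)
    for team, _user, cost in records:
        totals[team] += cost
    return dict(totals)


def cost_by_user(records, team):
    """Drill down: total estimated cost per user within one team."""
    totals = defaultdict(float)
    for record_team, user, cost in records:
        if record_team == team:
            totals[user] += cost
    return dict(totals)


print(cost_by_team(deployments))
print(cost_by_user(deployments, "payments"))
```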

This view into cloud costs by day-to-day operations allows engineering teams to eradicate waste as it occurs, before the cloud bill arrives.

Step 3. Review inactive cloud resources based on potential cost

Even with this approach, some of our customers have found other opportunities for cost savings.

Put simply, how do you know that all the cloud resources deployed—and the costs of running them—are justified? What if you could find those that can be terminated without disrupting any of your teams’ work?

To answer this question, we implemented a machine-learning engine in Torque that automatically reviews all actively deployed cloud resources for signs of utilization.

Low utilization indicates that a cloud resource was deployed but is not being used—and is therefore wasting budget.

The resulting report shows how long each cloud resource was being used appropriately (active), terminated (powered-off), or wasteful (idle).
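The three states above can be sketched with a simple rule. Note that this is a fixed-threshold stand-in, not the machine-learning engine Torque uses; the 5% CPU threshold, the hourly sample format, and both function names are assumptions chosen for illustration.

```python
from collections import Counter


def classify_hour(powered_on: bool, cpu_percent: float,
                  idle_threshold: float = 5.0) -> str:
    """Label one hour of a resource's life as active, idle, or powered-off."""
    if not powered_on:
        return "powered-off"
    return "idle" if cpu_percent < idle_threshold else "active"


def hours_by_state(hourly_samples, idle_threshold: float = 5.0) -> Counter:
    """Count hours per state from (powered_on, cpu_percent) samples."""
    return Counter(
        classify_hour(powered_on, cpu, idle_threshold)
        for powered_on, cpu in hourly_samples
    )


# Hypothetical hourly samples for one resource.
samples = [(True, 60.0), (True, 2.0), (True, 1.0), (False, 0.0)]
print(hours_by_state(samples))
```

A real detector would likely weigh more signals than CPU (network traffic, logins, disk activity), which is where a learned model can outperform a single threshold.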

Based on the resource configuration and the expected duration of the cloud environment defined in Torque, the platform calculates the cost savings from terminating each inactive resource. All recommendations are then listed individually in another report in the Inactivity dashboard, allowing engineering leaders to identify their biggest cost drivers and explore more context before taking action.
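One plausible way to think about that savings figure is the hourly rate multiplied by the hours the resource would otherwise keep running. The sketch below ranks idle resources on that basis; the field names, rates, and helper functions are hypothetical and not taken from Torque.

```python
def potential_savings(hourly_rate_usd: float, remaining_hours: float) -> float:
    """Savings from terminating now instead of running the remaining hours."""
    return round(hourly_rate_usd * max(remaining_hours, 0), 2)


# Hypothetical idle resources with assumed rates and remaining durations.
idle_resources = [
    {"name": "staging-db", "hourly_rate_usd": 0.80, "remaining_hours": 48},
    {"name": "test-vm", "hourly_rate_usd": 0.10, "remaining_hours": 120},
    {"name": "demo-cluster", "hourly_rate_usd": 2.00, "remaining_hours": 10},
]


def rank_recommendations(resources):
    """Attach a savings estimate to each resource and sort largest first."""
    ranked = [
        {**r, "savings_usd": potential_savings(r["hourly_rate_usd"],
                                               r["remaining_hours"])}
        for r in resources
    ]
    return sorted(ranked, key=lambda r: r["savings_usd"], reverse=True)


for rec in rank_recommendations(idle_resources):
    print(rec["name"], rec["savings_usd"])
```

Sorting by estimated savings puts the biggest cost drivers at the top of the list, which mirrors how a reviewer would want to work through recommendations.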

Step 4. Set rules to prevent waste going forward

Once you’ve identified cloud environments that are running unnecessarily, Torque allows anyone with administrator permissions to terminate those resources with a single click.

But how do you make sure that source of waste doesn’t come back?

Admins can set policies and workflows in Torque to ensure that all deployed cloud resources stay within your cost guardrails. If, for example, your biggest sources of waste occur outside of usual business hours, you can use Torque automation to restrict activity during those hours.

First, a workflow—which instructs Torque to automate actions on cloud resources—can automate the deployment of cloud resources for a specific team at the beginning of the workday, then automate the termination of those cloud resources at the end of the workday. This eliminates the risk that anyone on that team leaves a cloud environment running overnight or through a weekend, when they’re not using it for work.
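The scheduling logic behind such a workflow can be sketched in a few lines. This is an illustration of the idea only, not Torque's workflow syntax; the 9:00 to 18:00 weekday window and the function names are assumptions.

```python
from datetime import datetime, time

# Assumed working hours; adjust to your team's schedule.
WORKDAY_START = time(9, 0)
WORKDAY_END = time(18, 0)


def should_be_running(now: datetime) -> bool:
    """True on weekdays between the start and end of the workday."""
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and WORKDAY_START <= now.time() < WORKDAY_END


def scheduled_action(now: datetime, currently_running: bool) -> str:
    """Decide whether to deploy, terminate, or leave the environment alone."""
    wanted = should_be_running(now)
    if wanted and not currently_running:
        return "deploy"
    if not wanted and currently_running:
        return "terminate"
    return "none"


print(scheduled_action(datetime(2023, 12, 8, 9, 0), currently_running=False))
print(scheduled_action(datetime(2023, 12, 8, 18, 0), currently_running=True))
```

Running a check like this on a timer means an environment left up on Friday evening is shut down rather than billed through the weekend.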

Additionally, policies—which instruct Torque to allow or prohibit specific activity—can require approval from an administrator if anyone attempts to deploy a new cloud environment outside of working hours.
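A policy of this kind boils down to a decision on each deployment request. The sketch below is a hypothetical stand-in, not Torque's policy language; the working-hours window and the `evaluate_deploy_request` name are assumptions.

```python
from datetime import datetime, time


def evaluate_deploy_request(requested_at: datetime,
                            work_start: time = time(9, 0),
                            work_end: time = time(18, 0)) -> str:
    """Allow requests during working hours; otherwise require admin approval."""
    in_working_hours = (
        requested_at.weekday() < 5  # Monday through Friday
        and work_start <= requested_at.time() < work_end
    )
    return "allow" if in_working_hours else "require_approval"


print(evaluate_deploy_request(datetime(2023, 12, 8, 14, 30)))  # Friday afternoon
print(evaluate_deploy_request(datetime(2023, 12, 9, 14, 30)))  # Saturday afternoon
```

Requiring approval instead of denying outright keeps legitimate off-hours work possible while still putting a checkpoint in front of the spend.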

Through this approach, you can allow your teams to access the cloud resources they need, but only as long as they’re in line with your cost guardrails.

Step 5. Track cost savings based on your measures

To understand the impact of your efforts to reduce waste, review the Realized Savings report in Torque.

This view includes total cloud cost savings across all users, while also providing visibility into day-to-day progress and individual measures based on impact.

While eliminating inactive resources is not the only way to optimize cloud costs, inactivity is one of the most common sources of waste.

Establishing a control layer for your cloud infrastructure allows engineering teams to identify and act on opportunities to eliminate this waste without disrupting operations for their teams.

Learn more about cloud cost optimization with Quali Torque.
