Prepare for summer vacation with auto-shutdown for zombie infrastructure 

March 26, 2024
When your developers and testers take off for vacation this summer, will you know if they left cloud instances running?

As we head further into the summer season, this risk becomes more common. Zombie infrastructure, or cloud infrastructure that keeps running long after it’s needed, is one of the most common causes of wasted budget. If your developers and testers leave a staging or testing environment running while they spend a week out of office, you’ll pay for every hour of it.

So we collected a few recommendations to help our customers avoid paying for infrastructure while the teams that use it are on vacation.

Review active environments and terminate anything that’s unnecessary

If your developers are deploying staging and testing environments via Quali Torque, you can view the status of all deployments in real time.

This view includes not only the name of the environment, but the owner of the environment and the date it was last accessed.

Here you can see we have an active AWS EC2 instance.

Clicking into that environment shows more detail, including its inputs, outputs, and documentation, so you can explore and evaluate its purpose.

You can also see any collaborators on the environment, who may be able to answer your questions.

If the environment isn’t needed (say, because the owner is away on vacation for the week), you can terminate it with a click of a button.
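
The manual review above comes down to a simple triage rule. As a minimal sketch, assuming you have copied the environment list into records with the hypothetical fields `name`, `owner`, and `last_accessed` (illustrative only, not Torque's API or export format), the following Python flags environments that look like termination candidates:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Environment:
    # Hypothetical fields mirroring the columns described above; not Torque's schema.
    name: str
    owner: str
    last_accessed: datetime


def find_idle_environments(envs, now, max_idle=timedelta(days=3), owners_away=frozenset()):
    """Return environments idle longer than max_idle or owned by someone who is away."""
    return [
        env
        for env in envs
        if now - env.last_accessed > max_idle or env.owner in owners_away
    ]
```

The three-day cutoff is an arbitrary assumption. Anything the function returns is only a candidate, so it is still worth checking with the owner or a collaborator before terminating.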

Set a maximum environment duration to automate teardown

One way to prevent zombie infrastructure at scale is to set a custom policy for maximum environment duration. Once an environment reaches that duration, Quali Torque will automatically terminate all of its actively deployed cloud services.

To set a policy, create a Rego file in your Git repository that defines your rules for maximum duration. In our example, our team set a maximum of 600 minutes, with an additional rule that requires approval for any environment that runs longer than 300 minutes.
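
To make the two thresholds concrete, here is a minimal Python sketch of the decision logic only. It is not Rego and does not reflect the input schema Torque passes to a policy; the function name and the decision labels are hypothetical, and treating a missing duration as a denial is our own assumption.

```python
MAX_DURATION_MINUTES = 600        # hard ceiling from the example above
APPROVAL_THRESHOLD_MINUTES = 300  # longer than this requires approval


def evaluate_duration(requested_minutes):
    """Classify a requested environment duration, given in minutes.

    Returns one of the hypothetical labels "denied", "approval_required",
    or "approved".
    """
    # Assumption: an environment with no duration would run indefinitely,
    # so it is rejected along with anything over the maximum.
    if requested_minutes is None or requested_minutes > MAX_DURATION_MINUTES:
        return "denied"
    if requested_minutes > APPROVAL_THRESHOLD_MINUTES:
        return "approval_required"
    return "approved"
```

With these thresholds, a request for exactly 300 minutes is approved automatically, while a request for exactly 600 minutes is allowed only after approval.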

Once that is defined, Quali Torque will discover it in your Git repository and allow you to import it into the platform.

You can then choose to apply that policy to all environments deployed via Quali Torque, or limit its scope to specific workspaces so it only applies to individual teams or projects. This means you can limit runtimes for workspaces that manage testing and staging environments, while allowing production environments to continue running.
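
The effect of scoping can be sketched in a few lines of Python. The workspace names and limits below are hypothetical, and the lookup table stands in for whatever scope you configure in the platform; the point is only that a workspace without a limit is never torn down on duration.

```python
# Hypothetical workspace names and limits; production is deliberately left unscoped.
WORKSPACE_MAX_MINUTES = {
    "testing": 600,
    "staging": 600,
}


def teardown_due(workspace, running_minutes):
    """Return True when the workspace has a limit and the environment has exceeded it."""
    limit = WORKSPACE_MAX_MINUTES.get(workspace)
    return limit is not None and running_minutes > limit
```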

Zombie infrastructure is one of many concerns we see among customers who operate cloud infrastructure without a control layer. Since infrastructure is deployed and managed in Git, the operations and engineering teams have no visibility into what is actively running at any given moment. That leaves them unable to address cloud cost anomalies before those anomalies drive up the bill.

Acting as the control layer, Quali Torque gives your teams the tools to prevent zombie infrastructure proactively.

To learn more, watch a demo of Quali Torque.