While cloud cost optimization is often delegated to FinOps teams, DevOps and engineering organizations are ultimately responsible for the cost of cloud infrastructure.
Engineering organizations can take any number of different approaches to cloud cost optimization—from improving observability and setting up alerts for cost anomalies to running workloads on less-expensive instances and optimizing applications.
In this article, we’ll focus on an operational strategy that has proven to optimize cloud costs: automating the operation of ephemeral environments.
What is an ephemeral environment?
Ephemeral environments are short-lived workloads that can be used for non-production use cases. As opposed to production environments and others that support the application, ephemeral environments do not need to run 24/7.
These workloads typically fall into one of two categories:
- Scheduled environments: These workloads typically run only during standard business hours. This approach is optimal for workloads that are needed by many users at any time during the workday, but which are not needed outside of those hours. Commonly used for ad hoc tasks like software development, training, and demos, this approach maintains availability during the workday, eliminates the need to provision environments for every individual use case, and prevents wasted costs overnight, on weekends, and during holidays.
- Just-in-time environments: These workloads are created as needed and disposed of once the task they support is accomplished. This approach is optimal for workloads like software testing stages, and helps to prevent wasted costs during the hours when the workload is not needed.
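As a rough illustration of the scheduled category, the check below decides whether an environment should be up at a given moment. It is a minimal sketch, not any product's API: the 08:00 to 18:00 weekday window, the holiday set, and the function name `should_be_running` are all assumptions made for the example.

```python
from datetime import date, datetime, time

# Hypothetical schedule: weekdays only, 08:00 to 18:00 local time.
WORKDAY_START = time(8, 0)
WORKDAY_END = time(18, 0)


def should_be_running(now: datetime, holidays: frozenset[date] = frozenset()) -> bool:
    """Return True if a scheduled environment should be up at `now`."""
    if now.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    if now.date() in holidays:  # skip company holidays
        return False
    return WORKDAY_START <= now.time() < WORKDAY_END
```

A scheduler could call a check like this periodically and provision or terminate the environment whenever the answer changes. A just-in-time environment would skip the schedule entirely and tie its lifetime to the task that requested it.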
Read the Whitepaper: Unlocking FinOps Value with Automated Ephemeral Environments
Example use cases for ephemeral environments include:
- Software development and testing: Deploying and testing new builds before committing to production
- Software demo environments: Showing functionality to customers
- Training environments: Educating internal or external users on product functionality
Cloud cost optimization benefits of ephemeral environments
Running multiple environments for staging, development, QA, and performance testing adds a lot of value in SDLC pipelines, but keeping them all running 24/7 is not always necessary. Implementing ephemeral environments can have a substantial impact on cloud costs.
For example, one Quali customer identified more than $1.2 million in annual cloud cost savings just by implementing scheduled ephemeral environments.
As a software vendor, this organization deployed a unique production environment for each of its customers, as well as several non-production environments supporting each production instance.
Quali Torque’s native inactivity detection functionality found hundreds of environments that were running 24/7 even though they were only used during standard business hours.
After evaluating the conditions of all non-production environments—for example, some were unable to shift away from 24/7 operations due to availability requirements for their customers—the organization identified more than 300 environments that could be converted to ephemeral environments.
The cost savings from this approach would reduce the annual cloud bill by more than 50% and add $1.2 million to the profit margins for the delivery of their software.
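To see why scheduling alone can cut so deeply, it helps to compare weekly runtime hours. The calculation below is a sketch under stated assumptions: the 10-hour, 5-day schedule is hypothetical (the customer's actual schedule is not given here), and it measures runtime only, so charges that persist while an environment is down are not modeled.

```python
HOURS_PER_WEEK_ALWAYS_ON = 24 * 7  # 168 hours for an environment that never stops


def runtime_reduction(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of weekly runtime removed by moving from 24/7 to a schedule."""
    scheduled_hours = hours_per_day * days_per_week
    return 1 - scheduled_hours / HOURS_PER_WEEK_ALWAYS_ON


# Hypothetical schedule: 10 hours a day, Monday through Friday.
print(f"{runtime_reduction(10, 5):.0%}")  # prints 70%
```

Under these assumptions a single environment runs about 70% fewer hours each week. The reduction in the total bill is smaller, because production and other always-on environments keep running around the clock.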
However, identifying these cost savings opportunities was only half the battle. Implementing this approach would require creating code for these environments and managing their provisioning and termination on a daily basis. For engineering teams already tasked with supporting the software delivery lifecycle, this could add significant workload.
Using Quali Torque, this organization implemented ephemeral environments by:
- Automating the creation of Environment as Code blueprints: By connecting their cloud accounts and repositories to their Torque account, the DevOps team used Torque’s AI agents to create blueprints defining each environment as code that could be deployed quickly and easily. This accelerated the creation of code and eliminated the need to provision and configure individual infrastructure components into environments.
- Setting automated operational schedules for each environment: Using Torque workflows to define when environments would be provisioned and terminated, the organization was able to automate the full lifecycle of each daily workload. This eliminated the need to provision and terminate environments manually, as well as the risk that cloud resources are left running accidentally.
- Providing end users self-service access to live environments: Through integrations with CI/CD platforms and developer tools, as well as Torque’s native self-service catalog, the organization allowed software developers and other engineers to access live environments without submitting tickets or relying on infrastructure experts to provide access. Role-based permissions ensure end users can’t create or modify any environments, while cloud governance policies prevent the deployment of any resource that violates the organization’s standards.
- Continuous visibility and monitoring: Through Torque’s Operation Hub and reporting dashboards, the DevOps team could identify any resource that operated outside of intended hours and intervene rapidly, ensuring any cloud waste was terminated before the costs accrued grew too large.
Through this approach, the organization could convert their non-production environments to ephemeral runtimes without adding overhead to their infrastructure engineering teams.
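The scheduling and monitoring steps above boil down to comparing each environment's actual state with its intended schedule. The sketch below shows that comparison in generic terms. It does not use or represent Torque's API: the `Environment` record, its fields, and the action names are hypothetical, and weekdays are assumed to be the only working days.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class Environment:
    name: str
    running: bool  # actual state reported by the cloud account
    start: time    # scheduled daily start
    end: time      # scheduled daily end


def planned_action(env: Environment, now: datetime) -> str:
    """Return 'provision', 'terminate', or 'none' for one environment."""
    in_window = now.weekday() < 5 and env.start <= now.time() < env.end
    if in_window and not env.running:
        return "provision"
    if not in_window and env.running:
        return "terminate"  # running outside intended hours
    return "none"


def plan_actions(envs: list[Environment], now: datetime) -> dict[str, str]:
    """Map each environment name to the action needed at `now`."""
    return {env.name: planned_action(env, now) for env in envs}
```

Any environment mapped to "terminate" is running outside its intended hours, which is the same signal a monitoring dashboard would surface for manual intervention.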
Conclusion
Costs are involved in every design decision you make. Leaving non-production environments running in the cloud while they sit idle is just one area where this comes into play.