Our consulting services resulted in a 40% reduction in our client's monthly bill for AWS Elastic Kubernetes Service
About the Client
This case study describes how we helped a client in the travel and hospitality industry optimize their AWS EKS
(Elastic Kubernetes Service) infrastructure and reduce their monthly bill by 40%. By using spot instances,
selecting suitable instance types, implementing failover mechanisms, configuring Pod Disruption Budgets, enabling
alerting for spot instance terminations, retaining a share of on-demand instances, and adding cluster
overprovisioning, we achieved these savings without sacrificing performance or system reliability.
Our client, a leading company in the travel and hospitality industry, relied heavily on their AWS EKS cluster to
run critical applications and services. Facing rising costs and a need for high availability, they turned to us
for a solution.
Problem Statement
The client's main challenge was the rising cost of operating their AWS EKS cluster. They also needed robust
failover mechanisms and a higher level of availability to minimize service disruptions. Compounding these
concerns, they had no effective alerting for spot instance terminations.
Our Solutions
To address these challenges, we implemented a comprehensive solution built on the following key elements:
- Spot Instances and Instance Types: We adopted AWS spot instances, which offer substantial savings compared with on-demand instances. By carefully selecting instance types suited to each workload's requirements, we balanced cost efficiency with performance.
- Failover Design: To strengthen the cluster's resilience and minimize downtime, we implemented a failover strategy that distributes workloads across multiple nodes and ensures that at least three pods are always running on distinct nodes. This provides redundancy and improves fault tolerance by removing single points of failure.
- Pod Disruption Budget (PDB): To further improve availability and stability, we configured Pod Disruption Budgets (PDBs) for all mission-critical deployments. A PDB limits how many pods of a deployment can be voluntarily disrupted at the same time, for example during node maintenance or when a node is drained ahead of a spot instance termination. Enforcing PDBs minimized service disruptions and improved the overall reliability of the cluster.
- Alerting Mechanisms: To address spot instance terminations, we implemented alerting that delivered real-time notifications to designated Slack channels. These alerts let the operations team track how often spot instances were being terminated and act quickly, for example by switching to instance types with better historical availability, to manage and mitigate sudden terminations.
- On-Demand Instances: Because spot instances can be terminated at short notice, we adopted a hybrid approach to keep critical services running without interruption: at least 40% of the cluster's instances are on-demand. This acts as a safety net, guaranteeing baseline capacity to absorb workload spikes and interruptions caused by spot instance terminations.
- Cluster Over-provisioner: To further reduce the impact of spot instance terminations, we introduced a Cluster Over-provisioner. Configured with a low Kubernetes priority class, it runs a dummy deployment with configurable resource requests, reserving a pool of CPU and memory. When mission-critical pods need to be rescheduled, the scheduler preempts the lower-priority dummy pods, so the critical pods immediately receive the reserved CPU and memory and essential services keep running.
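The failover design above does not say which Kubernetes feature kept the three pods on separate nodes. One common option is a required pod anti-affinity rule on the `kubernetes.io/hostname` node label, sketched below as a Python function that builds a Deployment manifest and prints it as JSON (which `kubectl apply -f` accepts). The workload name `booking-api` and its image are hypothetical placeholders.

```python
import json
from typing import Dict


def deployment_with_node_spread(name: str, image: str, replicas: int = 3) -> Dict:
    """Build a Deployment whose replicas must each run on a different node.

    Assumption: a required podAntiAffinity rule keyed on the standard
    kubernetes.io/hostname label; the case study does not name the mechanism.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "affinity": {
                        "podAntiAffinity": {
                            # "Required" means a replica stays Pending rather than
                            # sharing a node with another replica of the same app.
                            "requiredDuringSchedulingIgnoredDuringExecution": [
                                {
                                    "labelSelector": {"matchLabels": labels},
                                    "topologyKey": "kubernetes.io/hostname",
                                }
                            ]
                        }
                    },
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical workload name and image, for illustration only.
    manifest = deployment_with_node_spread("booking-api", "example.com/booking-api:1.0")
    print(json.dumps(manifest, indent=2))
```

With a required rule the cluster needs at least as many schedulable nodes as replicas. `topologySpreadConstraints` with `maxSkew: 1` is a softer alternative when replicas may outnumber nodes.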
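A minimal PDB sketch for a hypothetical `booking-api` workload, assuming `minAvailable: 2` to pair with three replicas (the case study does not give the actual values). PDBs are enforced through the Eviction API, so they guard voluntary disruptions such as `kubectl drain`; a spot reclaim that removes a node without a prior drain is not blocked. The helper `allowed_disruptions` mirrors the budget arithmetic.

```python
import json
from typing import Dict


def pod_disruption_budget(app: str, min_available: int = 2) -> Dict:
    """Build a policy/v1 PodDisruptionBudget for pods labelled app=<app>."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": f"{app}-pdb"},
        "spec": {
            # Evictions are refused if they would drop healthy pods below this.
            "minAvailable": min_available,
            "selector": {"matchLabels": {"app": app}},
        },
    }


def allowed_disruptions(healthy_pods: int, min_available: int) -> int:
    """How many more pods may be voluntarily evicted right now (never negative)."""
    return max(0, healthy_pods - min_available)


if __name__ == "__main__":
    print(json.dumps(pod_disruption_budget("booking-api"), indent=2))
    # Three healthy replicas with minAvailable=2: one eviction at a time.
    print(allowed_disruptions(healthy_pods=3, min_available=2))
```

With three healthy replicas, a node drain can evict only one pod at a time; the next eviction waits until a replacement pod is healthy again.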
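AWS issues a two-minute warning before reclaiming a spot instance, published as an EventBridge event with the detail type `EC2 Spot Instance Interruption Warning`. The sketch below assumes an EventBridge rule that invokes a Lambda-style `handler`, which formats the event as a Slack incoming-webhook message. The `SLACK_WEBHOOK_URL` environment variable and the instance-type lookup passed to `terminations_by_type` are hypothetical; the case study does not describe the client's actual alerting pipeline.

```python
import json
import os
import urllib.request
from collections import Counter
from typing import Dict, List, Optional

SPOT_WARNING = "EC2 Spot Instance Interruption Warning"


def slack_payload(event: Dict) -> Optional[Dict]:
    """Format a spot interruption warning as a Slack message; ignore other events."""
    if event.get("detail-type") != SPOT_WARNING:
        return None
    detail = event.get("detail", {})
    text = (
        f":warning: Spot interruption warning for {detail.get('instance-id', 'unknown')} "
        f"(action: {detail.get('instance-action', 'unknown')}) "
        f"in {event.get('region', 'unknown region')} at {event.get('time', 'unknown time')}"
    )
    return {"text": text}


def post_to_slack(payload: Dict, webhook_url: str) -> int:
    """POST the payload as JSON to a Slack incoming webhook; return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


def handler(event: Dict, context=None) -> None:
    """Hypothetical Lambda entry point invoked by an EventBridge rule."""
    payload = slack_payload(event)
    if payload is not None:
        post_to_slack(payload, os.environ["SLACK_WEBHOOK_URL"])


def terminations_by_type(events: List[Dict], instance_types: Dict[str, str]) -> Counter:
    """Count spot warnings per instance type.

    The event does not carry the instance type, so `instance_types` (instance ID
    to type) is assumed to be built elsewhere, e.g. from EC2 DescribeInstances.
    """
    return Counter(
        instance_types.get(e.get("detail", {}).get("instance-id", ""), "unknown")
        for e in events
        if e.get("detail-type") == SPOT_WARNING
    )
```

Counting warnings per instance type is one way to find types that are reclaimed often and should be swapped out, as the alerting bullet describes.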
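The 40% on-demand floor reduces to a small calculation. This sketch rounds the on-demand count up so the floor always holds; it is an illustration only, not the exact rounding used by EC2 Auto Scaling's `OnDemandPercentageAboveBaseCapacity` setting or by whatever node-group configuration the client used.

```python
from typing import Tuple


def split_capacity(total_nodes: int, min_on_demand_percent: int = 40) -> Tuple[int, int]:
    """Return (on_demand, spot) node counts with at least the given on-demand share."""
    if total_nodes < 0 or not 0 <= min_on_demand_percent <= 100:
        raise ValueError("total_nodes must be >= 0 and the percent within [0, 100]")
    # Integer ceiling division avoids floating-point rounding surprises.
    on_demand = (total_nodes * min_on_demand_percent + 99) // 100
    return on_demand, total_nodes - on_demand


def meets_floor(on_demand: int, total_nodes: int, min_on_demand_percent: int = 40) -> bool:
    """Check an existing fleet against the on-demand floor."""
    return on_demand * 100 >= total_nodes * min_on_demand_percent
```

For example, `split_capacity(10)` returns `(4, 6)` and `split_capacity(7)` returns `(3, 4)`. On EKS, one way to hold such a ratio is separate managed node groups with capacity types `ON_DEMAND` and `SPOT`, though the case study does not say how the split was enforced.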
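The over-provisioner pattern can be sketched as two manifests: a `PriorityClass` with a negative value, below the default priority 0 of ordinary pods, and a Deployment of `pause` containers that only hold resource requests. The names, the value `-10`, the image tag, and the request sizes are assumptions; the case study does not say which tool or settings were used. When a higher-priority pod cannot be scheduled, the scheduler preempts these placeholders to make room; if a cluster autoscaler is running, it then adds nodes for the evicted placeholders, restoring the buffer.

```python
import json
from typing import Dict, List, Tuple


def overprovisioning_manifests(replicas: int, cpu: str, memory: str) -> List[Dict]:
    """Return a low-priority PriorityClass and a placeholder Deployment."""
    priority_class = {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": "overprovisioning"},
        # Negative value: lower than ordinary pods (default 0), so preempted first.
        "value": -10,
        "globalDefault": False,
        "description": "Placeholder pods that reserve spare CPU and memory.",
    }
    labels = {"app": "overprovisioning"}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "overprovisioning"},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "priorityClassName": "overprovisioning",
                    # Release the reserved resources immediately when preempted.
                    "terminationGracePeriodSeconds": 0,
                    "containers": [{
                        "name": "reserve",
                        "image": "registry.k8s.io/pause:3.9",
                        "resources": {"requests": {"cpu": cpu, "memory": memory}},
                    }],
                },
            },
        },
    }
    return [priority_class, deployment]


def reserved_totals(replicas: int, cpu_millicores: int, memory_mib: int) -> Tuple[int, int]:
    """Total (millicores, MiB) held in reserve by the placeholder pods."""
    return replicas * cpu_millicores, replicas * memory_mib


if __name__ == "__main__":
    items = overprovisioning_manifests(replicas=2, cpu="1", memory="2Gi")
    # A v1 List lets `kubectl apply -f` read both objects from one JSON file.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

`reserved_totals(2, 1000, 2048)` returns `(2000, 4096)`: two placeholders of 1 CPU and 2 GiB each keep 2 CPUs and 4 GiB free for critical pods.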
Results
The solution delivered several significant benefits for our client:
- Improved Availability and Resilience: The failover design, combined with Pod Disruption Budgets, significantly improved availability and resilience. Distributing three pods across different nodes minimized the risk of service disruptions and ensured a seamless experience for our client's customers.
- Proactive Spot Instance Termination Handling: The alerting mechanism integrated with Slack empowered the operations team to respond promptly to spot instance terminations. This proactive approach played a crucial role in minimizing downtime and maintaining uninterrupted service availability.
- Resource Optimization: Running on-demand instances alongside spot instances ensured the capacity needed to handle workload spikes and reduced the risks of spot instance interruptions. The cluster over-provisioner improved resource utilization, reducing unnecessary costs and increasing efficiency.
Conclusion
Through spot instances, careful instance type selection, failover design, Pod Disruption Budgets, alerting,
a hybrid mix of on-demand and spot instances, and a cluster over-provisioner, we helped our travel and hospitality
client reduce their monthly AWS EKS bill by 40%. The improved availability, reliability, and resource utilization
strengthened their infrastructure and let them focus on their core business while benefiting from substantial
cost savings.
Get In Touch
Thank you for your interest in Ant Tech Company. We welcome inquiries and feedback. Please feel free to reach out to us using the contact details below:
Address
Dallas, TX 75032, USA
Bishkek, Kyrgyzstan
Phone
+1 214 256 3310
+996 995 000 360
Email
info@ant-tech.io