Greening your K8s Workloads
Energy efficiency is at its best at around and above 80% of server utilisation, as described by Brendan here1. So how can this be achieved with Kubernetes core functionality while still handling fluctuating load?
Horizontal Pod Autoscaler (HPA) and Cluster Autoscaler are your tools of choice for optimising the resource utilisation of your Kubernetes workloads, but be mindful that the ideal configuration depends on your specific application workloads, traffic patterns, and infrastructure constraints. Here's a breakdown of how to strike a good balance:
Understanding the tools at hand and their roles
- HPA (Horizontal Pod Autoscaler): Excels at scaling pods within existing nodes to handle fluctuations in demand. It reacts quickly to workload changes, ensuring your application remains responsive.
- Cluster Autoscaler: Works to adjust the size of the cluster itself, adding nodes when there isn’t enough capacity for pods to be scheduled and removing nodes when underutilised. It aims to provide the necessary capacity for HPA to function effectively.
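Both tools reason about the resource requests you declare on your pods: the HPA's utilisation target is computed as a percentage of the pods' requests, and the Cluster Autoscaler decides schedulability from them. A minimal Deployment fragment (names and values are illustrative, not from a real setup):

```yaml
# Hypothetical fragment: both autoscalers key off these requests.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
spec:
  template:
    spec:
      containers:
      - name: app
        image: my-registry/my-app:1.0   # hypothetical image
        resources:
          requests:
            cpu: 250m       # HPA utilisation % is measured against this value
            memory: 256Mi   # Cluster Autoscaler uses requests to decide fit
          limits:
            memory: 512Mi
```

If requests are set far above what pods actually consume, nodes will report low utilisation even when "full", so realistic requests are a prerequisite for hitting the ~80% sweet spot.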
How to Find the Balance:
- Prioritise HPA: Configure HPA based on your application’s performance metrics (CPU, memory, custom metrics) with appropriate targets (around 60-70%) and replica ranges. A pod-level target of 60-70% leaves headroom for load spikes while keeping node utilisation close to the 80% sweet spot.
- Consider Cluster Autoscaler as a Safety Net: If HPA consistently reaches its maximum replica count and pods remain pending, this indicates insufficient cluster capacity. The Cluster Autoscaler should then add nodes to accommodate the additional load that HPA cannot place on existing capacity.
- Node Provisioning Time: Be mindful of the time your infrastructure provider takes to provision new nodes. If it’s significant, the Cluster Autoscaler might need to be more aggressive to prevent pods from being stuck in pending states for too long.
- Predictable Workloads: For predictable traffic patterns, configure the Cluster Autoscaler to preemptively scale up nodes in anticipation of known peaks such as daytime traffic or other recurring events.
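One common pattern for pre-scaling ahead of known peaks is a CronJob that raises the HPA's minReplicas shortly before the peak (and a mirror job that lowers it again afterwards); the HPA then forces capacity up, which in turn makes the Cluster Autoscaler provision nodes early. A sketch with hypothetical names, assuming an "hpa-patcher" ServiceAccount that has RBAC permission to patch HPAs:

```yaml
# Hypothetical CronJob: raise minReplicas at 07:30, before the daytime peak.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: prescale-up
spec:
  schedule: "30 7 * * *"          # 07:30 every day, in the cluster's time zone
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: hpa-patcher   # hypothetical, needs patch rights on HPAs
          restartPolicy: OnFailure
          containers:
          - name: patch
            image: bitnami/kubectl:latest
            command:
            - kubectl
            - patch
            - hpa
            - my-hpa
            - --patch
            - '{"spec": {"minReplicas": 4}}'
```

A matching evening CronJob would patch minReplicas back down (e.g. to 2) so the cluster can shrink again overnight.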
Example Configuration:
HPA Configuration (YAML):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  minReplicas: 2   # Minimum number of pods for your application
  maxReplicas: 5   # Maximum number of pods for your application
  metrics:
  - type: Resource
    resource:
      name: cpu    # Metric to scale on (CPU in this case)
      target:
        type: Utilization
        averageUtilization: 70   # Target CPU utilisation (below 80%)
Explanation:
- This YAML defines an HPA named “my-hpa” that scales the deployment named “my-deployment”.
- The minimum replica count is set to 2, ensuring at least two pods are always running.
- The maximum replica count is set to 5, allowing the HPA to scale up to 5 pods if needed.
- The HPA scales based on CPU utilisation with a target of 70%, meaning it adds pods when average CPU utilisation across the pods, measured relative to their CPU requests, rises above 70%.
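The autoscaling/v2 API also exposes a behavior field for tuning how fast the HPA reacts; slowing scale-down in particular avoids flapping that would repeatedly drain and re-add capacity. The values below are illustrative, to be merged into the spec of the HPA above:

```yaml
# Illustrative behavior stanza for an autoscaling/v2 HPA.
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 min of low load before shrinking
      policies:
      - type: Pods
        value: 1                        # remove at most one pod per minute
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0     # react immediately to load spikes
```

Conservative scale-down keeps pods packed on fewer nodes long enough for the Cluster Autoscaler to remove the freed-up nodes cleanly.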
Cluster Autoscaler Configuration (YAML, Example for AWS EKS):
Unlike the HPA, the Cluster Autoscaler is not configured through a dedicated Kubernetes API object; it runs as a Deployment (typically in kube-system) and is tuned via command-line flags. The sketch below is trimmed for brevity and omits the required ServiceAccount, RBAC rules, and IAM permissions:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
      - name: cluster-autoscaler
        image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.29.0
        command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        # Discover EKS node groups via their Auto Scaling Group tags
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
        # Consider a node for removal when its requested resources stay below 50%
        - --scale-down-utilization-threshold=0.5
        - --scale-down-unneeded-time=10m
        - --balance-similar-node-groups
        env:
        - name: AWS_REGION
          value: eu-central-2
Explanation:
- The Cluster Autoscaler does not scale on a node CPU metric: it adds nodes when pods are unschedulable (Pending), and removes a node when the sum of resource requests on it stays below the --scale-down-utilization-threshold (here 50%) for the --scale-down-unneeded-time window.
- --node-group-auto-discovery locates the node groups’ Auto Scaling Groups via tags; replace “my-cluster” with your cluster’s name.
- AWS_REGION and the image tag specify the region and autoscaler release. (Replace these values based on your actual setup.)
Continuous Monitoring and Adjustment:
Finding the right balance between HPA and Cluster Autoscaler is an ongoing process. Regularly monitor performance metrics, scaling events, and resource utilisation to fine-tune your configurations, ensuring both tools work together efficiently.
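A few kubectl commands surface the signals worth watching (resource names here are the hypothetical ones from the examples above; kubectl top requires metrics-server):

```shell
# Current vs. target utilisation and the replica count of the HPA
kubectl get hpa my-hpa

# Recent scaling decisions and warnings (e.g. missing metrics)
kubectl describe hpa my-hpa

# Node-level utilisation: are nodes near the ~80% sweet spot?
kubectl top nodes

# Pods stuck in Pending indicate the Cluster Autoscaler is lagging or capped
kubectl get pods --all-namespaces --field-selector=status.phase=Pending
```

Watching these together shows whether the HPA is doing the day-to-day work and the Cluster Autoscaler is only stepping in when capacity genuinely runs out.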
Key Points:
- Start with a well-configured HPA as the primary scaling mechanism for immediate response to workload changes.
- Use Cluster Autoscaler as a capacity provider, ensuring there are always enough resources for HPA to function effectively.
- Consider your infrastructure’s node provisioning time when configuring Cluster Autoscaler aggressiveness.
- Balance responsiveness to traffic surges with the goal of cost optimisation and efficient resource utilisation.
If you fine-tune your HPA and Cluster Autoscaler, you will always have the resources required to run your workloads without over-provisioning, which lowers your environmental impact and optimises your Kubernetes spend.