Cloud workloads rarely operate at a fixed scale. A payment gateway might process 100 transactions per second during normal hours, but on the last day of the month, when payrolls are processed, this can spike to 10,000 transactions per second. If the infrastructure isn’t scaled accordingly, payment requests could face timeouts or failures.
Similarly, a CI/CD pipeline running on Kubernetes might see a few builds per hour during regular development cycles, but when multiple teams merge code before a release, the number of concurrent builds can increase tenfold. If resources aren’t scaled dynamically, build queues pile up, deployments slow down, and developers experience delays.
Manually provisioning resources for such fluctuations isn’t scalable. Over-provisioning leaves expensive compute nodes idle while under-provisioning causes performance bottlenecks. Kubernetes autoscaling solves this by automatically adjusting pods and nodes based on real-time CPU, memory, and custom workload metrics. It ensures applications have the right resources when needed while keeping costs under control.
This blog covers how Kubernetes autoscaling works, the differences between HPA, VPA, and Cluster Autoscaler, and best practices for configuring node pools in EKS, GKE, and AKS.
What is Kubernetes Autoscaling?
Autoscaling in Kubernetes is a mechanism that dynamically adjusts resources, whether pods, nodes, or entire clusters, according to real-time demand. As discussed above, autoscaling optimizes resource utilization by scaling the cluster up and down with demand, avoiding both under-provisioning and over-provisioning.
When demand spikes, Kubernetes can automatically add more pods or scale out the cluster to absorb the extra workload. When demand drops, it scales back down, reclaiming resources that would otherwise sit idle. This dynamic scaling, whether horizontal scaling of pods, vertical scaling of pod resources, or scaling the cluster’s nodes, prevents bottlenecks, maintains high availability, and cuts the cost of unused capacity, with Kubernetes continuously adjusting capacity to match the applied load and keep application performance steady.
Key Scenarios Where Autoscaling Helps
- Handling Traffic Spikes in Web Applications
Autoscaling helps web apps handle changing traffic. It keeps sites responsive during peak times, like Black Friday, and lowers costs during slower periods. By matching resources to demand, it avoids overload and preserves a good user experience.
- Managing Unpredictable Workloads in Microservices Architectures
In microservices, workloads differ across services, each with unique scaling needs. Autoscaling allows each service to expand as demand grows, so if requests surge, Kubernetes can scale that service without overloading others.
- Optimizing Resource Consumption in Large-Scale Kubernetes Clusters
Resource utilization varies constantly, and large-scale clusters rely on autoscaling to ensure companies provision only the resources they need, reducing costs without compromising efficiency.
HPA vs. VPA vs. Cluster Autoscaler
Kubernetes offers three main types of autoscalers:
- Horizontal Pod Autoscaler (HPA)
- Vertical Pod Autoscaler (VPA)
- Cluster Autoscaler
Each autoscaler addresses a different aspect of scaling: HPA adjusts the number of pod replicas, VPA tunes the CPU and memory allocated to each pod, and the Cluster Autoscaler scales the underlying node infrastructure. They are complementary tools for optimizing Kubernetes environments, each suited to specific use cases. Let’s look at how they operate and their core functionality.
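For instance, here is a minimal HPA manifest sketch, assuming a Deployment named web-api and a running metrics server (both names and thresholds are illustrative). It keeps average CPU utilization near 70% by running between 2 and 10 replicas:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api            # assumed Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add replicas when average CPU exceeds 70%
VPA and the Cluster Autoscaler are configured separately, but they follow the same idea: declare the bounds and targets, and let Kubernetes keep the workload within them.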

Kubernetes Autoscaling Tools and Techniques
In addition to Kubernetes’ native autoscaling mechanisms, various tools and techniques extend its autoscaling capabilities. Let’s review some Kubernetes autoscaling tools that can help optimize cost, performance, and resource utilization.
GPUs can also be autoscaled for workloads that require heavy computation, such as AI inference, video processing, or scientific simulations. Kubernetes can dynamically allocate GPU-based nodes based on demand, ensuring efficient resource usage without running expensive GPU instances when not needed.
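As a rough sketch (the workload name and image are placeholders), a Deployment can request GPUs so that GPU nodes are only provisioned when such pods are pending:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-inference                  # hypothetical GPU workload
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-inference
  template:
    metadata:
      labels:
        app: gpu-inference
    spec:
      containers:
        - name: inference
          image: <YOUR_GPU_IMAGE>      # any CUDA/GPU-enabled image
          resources:
            limits:
              nvidia.com/gpu: 1        # requires the NVIDIA device plugin on GPU nodes
When a pod like this is pending, the Cluster Autoscaler or Karpenter can bring up a GPU-capable node and remove it again once the pod finishes.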
Karpenter
Karpenter is an open-source node provisioning and cluster autoscaling tool designed primarily for AWS environments. It integrates deeply with AWS services such as EC2 and IAM, launching instances directly rather than managing predefined Auto Scaling Groups, and provisions compute resources only as they are needed to keep your Kubernetes workloads optimized.
Karpenter interacts directly with cloud provider APIs to launch new nodes for unschedulable pods, sizing each node to the pods’ specific resource requirements. This helps reduce over-provisioning and improves resource utilization. For example, Karpenter can scale your cluster by choosing the best instance type for the workload, whether a spot or an on-demand instance. This flexibility allows users to maximize resource utilization while minimizing costs.
Karpenter is particularly useful for managing spot instances, automatically replacing interrupted nodes with new ones to maintain availability. This helps optimize costs while ensuring workloads are not disrupted by spot instance terminations.
Note: Karpenter does not currently support other cloud providers, such as Google Kubernetes Engine (GKE), Azure Kubernetes Service (AKS), or even on-premises Kubernetes environments.
For more information, you can check out Karpenter’s official documentation.
To get started with Karpenter on your AWS EKS cluster, add the Karpenter Helm chart repository:
helm repo add karpenter https://charts.karpenter.sh
helm repo update
Next, install Karpenter using Helm. Make sure to replace <IAM_ROLE_ARN>, <CLUSTER_NAME>, <CLUSTER_ENDPOINT>, and <INSTANCE_PROFILE> with your cluster-specific details.
helm install karpenter karpenter/karpenter --namespace karpenter --create-namespace \
--set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=<IAM_ROLE_ARN> \
--set clusterName=<CLUSTER_NAME> \
--set clusterEndpoint=<CLUSTER_ENDPOINT> \
--set aws.defaultInstanceProfile=<INSTANCE_PROFILE>
Ensure you’ve created an IAM role for Karpenter with the necessary permissions described in Karpenter’s getting started documentation.
Create a Provisioner manifest to define how Karpenter should provision nodes for the cluster. The Provisioner below allows both AMD64 and ARM64 architectures, letting the cluster use ARM-based instances like AWS Graviton for better price-performance where applicable.
provisioner.yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  limits:
    resources:
      cpu: "1000"
  provider:
    subnetSelector:
      karpenter.sh/discovery: <CLUSTER_NAME>
    securityGroupSelector:
      karpenter.sh/discovery: <CLUSTER_NAME>
  requirements:
    # In the v1alpha5 API, instance types are constrained via requirements
    - key: "node.kubernetes.io/instance-type"
      operator: In
      values: ["m5.large", "m5.xlarge"]
    - key: "karpenter.sh/capacity-type"
      operator: In
      values: ["spot", "on-demand"]
    - key: "kubernetes.io/arch"
      operator: In
      values: ["amd64", "arm64"]
  ttlSecondsAfterEmpty: 30
Make sure to replace <CLUSTER_NAME> with the name of your cluster, then apply the manifest. If your workloads can run on ARM, Karpenter will automatically provision ARM64 nodes when needed, optimizing cost and performance.
kubectl apply -f provisioner.yaml
To verify that Karpenter is working as intended, deploy a test workload:
karpenter-test.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: karpenter-test
spec:
  replicas: 0
  selector:
    matchLabels:
      app: karpenter-test
  template:
    metadata:
      labels:
        app: karpenter-test
    spec:
      containers:
        - name: pause
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
          resources:
            requests:
              cpu: "1"   # request enough CPU so scaling up forces Karpenter to provision nodes
Apply the deployment manifest.
kubectl apply -f karpenter-test.yaml
Scale up replicas to test Karpenter’s functionality.
kubectl scale deployment karpenter-test --replicas=5
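Within a minute or so, Karpenter should launch one or more new nodes to fit the five pods. You can watch this happen and inspect the controller logs (the namespace and container name below assume the Helm install shown earlier):
kubectl get nodes -w
kubectl logs -f -n karpenter deployment/karpenter -c controller
Scaling the deployment back to zero with kubectl scale deployment karpenter-test --replicas=0 should cause Karpenter to terminate the now-empty nodes roughly 30 seconds later, thanks to ttlSecondsAfterEmpty: 30 in the Provisioner.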
KEDA
KEDA, which stands for Kubernetes Event-driven Autoscaling, is a lightweight, open-source solution for scaling Kubernetes applications based on external events or metrics. Unlike traditional autoscalers, which rely on internal metrics like CPU or memory usage to trigger scaling, KEDA scales based on external events from queues, databases, and messaging platforms.
One common use case for Kubernetes autoscaling with KEDA is scaling applications that consume events from message queues. For example, when messages accumulate in Kafka, RabbitMQ, or a similar queue, KEDA can scale out a consumer service to handle them. This event-driven approach suits workloads whose demand fluctuates with external events.
To use KEDA on any Kubernetes cluster, local or managed, install it with kubectl:
$ kubectl apply -f https://github.com/kedacore/keda/releases/download/v2.7.1/keda-2.7.1.yaml
Now, create a deployment manifest for the workload you want to scale, which might look like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: keda-queue-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: keda-queue-test
  template:
    metadata:
      labels:
        app: keda-queue-test
    spec:
      containers:
        - name: keda-queue-test
          image: keda-queue-test:latest
          ports:
            - containerPort: 8080
Next, create a ScaledObject manifest that tells KEDA how to scale this deployment. Here is an example that scales based on a RabbitMQ queue:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: keda-queue-test-scaledobject
  namespace: default
spec:
  scaleTargetRef:
    name: keda-queue-test
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: rabbitmq
      metadata:
        queueName: test-queue
        queueLength: '5'
      authenticationRef:
        name: rabbitmq-auth
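The authenticationRef above points to a TriggerAuthentication named rabbitmq-auth that supplies the RabbitMQ connection string; it is not part of the original snippet, but a minimal sketch, assuming the AMQP URI is stored in a Secret, could look like this:
apiVersion: v1
kind: Secret
metadata:
  name: rabbitmq-secret
type: Opaque
stringData:
  host: amqp://<USER>:<PASSWORD>@<RABBITMQ_HOST>:5672/   # AMQP connection string
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: rabbitmq-auth
spec:
  secretTargetRef:
    - parameter: host          # maps to the "host" setting of the rabbitmq trigger
      name: rabbitmq-secret
      key: host
Apply it together with the manifests below.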
Then, apply these manifest files to your Kubernetes cluster:
$ kubectl apply -f keda-test-deployment.yaml
$ kubectl apply -f keda-test-scaledobject.yaml
This setup scales the deployment up and down based on the length of the RabbitMQ queue.
Now, for Amazon SQS, you can define a ScaledObject to scale workloads based on the number of messages in the queue:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: keda-sqs-test-scaledobject
  namespace: default
spec:
  scaleTargetRef:
    name: keda-sqs-test
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/12377839739012/m-queue
        queueLength: '5'
        awsRegion: us-east-1
      authenticationRef:
        name: aws-credentials
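Here, aws-credentials is assumed to be a TriggerAuthentication that grants KEDA access to SQS. A sketch using static credentials from a Secret is shown below; on EKS, IAM roles for service accounts (pod identity) are preferable in production:
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: aws-credentials
  namespace: default
spec:
  secretTargetRef:
    - parameter: awsAccessKeyID
      name: aws-sqs-secret           # hypothetical Secret holding the credentials
      key: AWS_ACCESS_KEY_ID
    - parameter: awsSecretAccessKey
      name: aws-sqs-secret
      key: AWS_SECRET_ACCESS_KEY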
Apply the SQS autoscaling manifest:
$ kubectl apply -f keda-sqs-scaledobject.yaml
This setup will scale your Kubernetes workloads based on the number of pending messages in Amazon SQS.
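For the Kafka scenario mentioned earlier, a ScaledObject sketch might look like the following; the deployment, broker address, topic, and consumer group names are all placeholders:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor-scaledobject
  namespace: default
spec:
  scaleTargetRef:
    name: order-processor               # hypothetical Kafka consumer deployment
  minReplicaCount: 0                    # KEDA can scale event-driven consumers to zero
  maxReplicaCount: 20
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.default.svc:9092   # placeholder broker address
        consumerGroup: order-processor
        topic: orders
        lagThreshold: "50"              # scale out when consumer lag exceeds 50 messages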
Native Autoscaling in GKE
Google Kubernetes Engine (GKE) provides autoscaling at both the pod and cluster levels to match workload demand.
- Pod Autoscaling: GKE supports Horizontal Pod Autoscaler (HPA), which scales pods based on CPU, memory, or custom metrics. For more precise resource allocation, Vertical Pod Autoscaler (VPA) can adjust CPU/memory requests dynamically.
- Cluster Autoscaling: The Cluster Autoscaler adds or removes nodes based on pending pods, ensuring there’s enough capacity while preventing over-provisioning.
- Node Pool Autoscaling: GKE allows autoscaling at the node pool level, enabling different node pools to scale based on workload needs. This helps optimize costs by mixing preemptible and standard VMs for different workloads.
GKE also integrates natively with Kubernetes monitoring tools like Prometheus and with Google Cloud Monitoring, allowing better visibility into autoscaling decisions.
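For example, cluster autoscaling can be enabled on an existing node pool with a single gcloud command; the cluster, zone, pool, and size values below are placeholders:
gcloud container clusters update <CLUSTER_NAME> \
  --zone <ZONE> \
  --node-pool <NODE_POOL_NAME> \
  --enable-autoscaling \
  --min-nodes 1 \
  --max-nodes 5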
Node Pool Configuration Practices Across Public Clouds
Cloud providers offer different ways to configure node pools in Kubernetes. The right setup helps scale workloads efficiently while keeping costs low. Below is how AWS EKS, GKE, and AKS handle node pools.
Elastic Kubernetes Service
AWS EKS allows different types of node groups for better resource management.
- On-Demand Instances – Reliable but costly. Best for important workloads.
- Spot Instances – Cheaper but can be shut down anytime. Good for non-critical workloads.
- Mixed Instance Groups – Uses both spot and on-demand nodes to balance cost and uptime.
- Managed Node Groups – AWS automatically handles updates and scaling.
- Self-Managed Node Groups – More control but requires manual setup.
A good strategy is to use spot instances to save costs while keeping a few on-demand instances for stability. For better efficiency, use Managed Node Groups to automate updates and scaling.
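A sketch of that strategy with eksctl (cluster name, instance types, and sizes are illustrative) combines a small on-demand managed node group with a larger autoscaling spot group:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: <CLUSTER_NAME>
  region: us-east-1
managedNodeGroups:
  - name: on-demand-base            # small, stable baseline capacity
    instanceType: m5.large
    minSize: 2
    maxSize: 4
  - name: spot-workers              # cheaper, interruptible capacity for burst load
    instanceTypes: ["m5.large", "m5a.large", "m4.large"]
    spot: true
    minSize: 0
    maxSize: 10
Note that minSize and maxSize only set the scaling bounds; the Cluster Autoscaler or Karpenter still decides when to add or remove nodes within them.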
Google Kubernetes Engine
GKE provides different node pool types to manage costs and performance.
- Standard VMs – Reliable for most workloads.
- Preemptible VMs – Low-cost but can be stopped anytime. Good for batch jobs.
- Node Auto-Provisioning – GKE adds nodes automatically when needed.
- Autoscaling Node Pools – Grows and shrinks based on traffic.
- GPU Node Pools – Needed for AI, machine learning, and video processing.

Use preemptible VMs where possible to reduce costs while keeping standard VMs for important workloads. Enabling autoscaling makes sure that the cluster adjusts to demand automatically.
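For example, an autoscaling pool of Spot VMs (use --preemptible instead of --spot for preemptible VMs) can be added alongside the standard pool; all names and limits below are placeholders:
gcloud container node-pools create spot-pool \
  --cluster <CLUSTER_NAME> \
  --zone <ZONE> \
  --spot \
  --enable-autoscaling \
  --min-nodes 0 \
  --max-nodes 10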
Azure Kubernetes Service
AKS offers flexible node pool options to handle different workloads.
- Standard VMs – General-purpose and stable.
- Spot Instances – Low-cost but interruptible. Best for temporary workloads.
- Mixed Node Pools – Runs different VM types together for cost control.
- Custom Metrics Scaling – Adjusts node count based on workload needs.

A mix of spot and standard VMs helps balance cost and availability. Using custom metrics-based scaling ensures resources adjust dynamically based on actual workload requirements.
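As a sketch (resource group, cluster name, and sizing are placeholders), a spot node pool with the cluster autoscaler enabled can be added like this:
az aks nodepool add \
  --resource-group <RESOURCE_GROUP> \
  --cluster-name <CLUSTER_NAME> \
  --name spotpool \
  --priority Spot \
  --eviction-policy Delete \
  --spot-max-price -1 \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 10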
Now, managing Kubernetes autoscaling can often feel a bit complicated because of the need to define detailed scaling policies and configurations. Traditionally, you have to create YAML files, set resource limits through configuration files, define scaling rules based on CPU, memory usage, or custom metrics, and keep checking and updating these configurations as workloads change. This process takes a lot of time and can lead to mistakes, making it hard to keep things running smoothly and reliably.
Simplifying Autoscaling with Kapstan
Kapstan solves this problem by removing the need for these complex configurations. It is a modern DevSecOps tool designed to make it easier for developers to manage infrastructure. With Kapstan, you can turn on autoscaling with just a few clicks, skipping the complicated setup required by traditional methods. It works smoothly with Kubernetes, giving you a simple interface to manage autoscaling without any hassle. You don’t need to write YAML files or adjust configurations because Kapstan takes care of everything in the background.
Here’s a look at Kapstan's interface, showing how easy it is to set up autoscaling:

You can also set the minimum and maximum number of pods, choose target CPU and memory usage to trigger autoscaling, and create custom triggers to adjust workloads automatically based on demand.
Using this simple interface, you can quickly add services like Cache, Container Service, Cron Job, Object Store, Public Helm Chart, Queue, Serverless Function, and SQL Database.

Kapstan’s approach ensures that the infrastructure you set up is secure and optimized from the start. Whether you’re managing simple apps or complex systems, Kapstan makes the autoscaling process smoother, faster, and more reliable.
For more details, refer to the Kapstan Documentation.
Conclusion
Autoscaling is what makes Kubernetes workloads truly scalable, and it is how organizations meet their performance and cost objectives. It is important, however, to understand the differences between the Horizontal Pod Autoscaler, Vertical Pod Autoscaler, and Cluster Autoscaler. Beyond that, tools like Karpenter, KEDA, and GKE’s native autoscaling help users manage resources more efficiently on top of the resource metrics API and external metrics. Together, these capabilities make it possible to optimize Kubernetes clusters for both workload performance and cost.
Frequently Asked Questions
Q. What are the key factors to consider when configuring Kubernetes autoscaling?
When configuring autoscaling, consider workload performance, the cost of scaling, and the effect of scaling events on your application. Then select the appropriate autoscaling mechanism (HPA, VPA, or Cluster Autoscaler).
Q. How does Karpenter differ from traditional Cluster Autoscaler?
The traditional Cluster Autoscaler scales predefined node groups, while Karpenter provisions nodes directly based on workload requirements. Scaling is more intelligent because Karpenter dynamically chooses the best instance types for each workload.
Q. Can I use KEDA with GKE or AKS?
Yes, KEDA works with both GKE and AKS to scale workloads based on event-driven triggers; it only requires a Kubernetes cluster and network access to the external event source, such as a queue, database, or messaging platform.
Q. What are the cost implications of autoscaling across public clouds?
Autoscaling minimizes costs by provisioning only the required resources. In addition, spot instances and preemptible VMs can yield further savings, particularly for non-critical workloads.
Q. How do I handle autoscaling during traffic spikes in a hybrid cloud environment?
In a hybrid cloud, you can leverage autoscaling tools and cloud-native services to dynamically scale resources across your on-premises and cloud environments without disruptions.