Horizontal Pod Autoscaler (HPA)

The Horizontal Pod Autoscaler automatically scales the number of pods in a Deployment, ReplicaSet, or StatefulSet based on observed CPU utilization, memory usage, or custom metrics. This allows your applications to handle varying loads efficiently.

How HPA Works

  1. HPA continuously monitors the metrics of pods in the target workload
  2. It calculates the desired number of replicas from the current and target metric values (see the formula after this list)
  3. The controller adjusts the replica count to match demand
  4. Scale-up and scale-down happen automatically within configured bounds
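
The calculation in step 2 is a single ratio; the controller computes:

desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)

For example, 3 replicas averaging 90% CPU against a 70% target gives ceil(3 * 90 / 70) = ceil(3.86) = 4 replicas.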

Metrics Types

| Metric Type | Description | Example |
|---|---|---|
| Resource | CPU or memory utilization | cpu, memory |
| Pods | Custom metrics from pods | requests_per_second |
| Object | Metrics from other Kubernetes objects | queue_length from a Service |
| External | Metrics from outside the cluster | Cloud provider metrics |
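
The manifests below only use the Resource type. For a Pods metric such as requests_per_second, the metrics entry would look roughly like the sketch below; it assumes a custom metrics adapter (for example, the Prometheus Adapter) already exposes that metric for the target pods, and the target value is illustrative:

metrics:
  - type: Pods
    pods:
      metric:
        name: requests_per_second
      target:
        type: AverageValue
        averageValue: "100"  # desired average per pod (illustrative)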

Prerequisites

For HPA to work with CPU and memory metrics, you need:

  • Metrics Server installed in your cluster (provides resource metrics; a quick check follows this list)
  • Resource requests defined on your containers (HPA uses these as the baseline)
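
A quick way to confirm that resource metrics are actually being served, using kubectl (substitute oc on OpenShift); the APIService name assumes a standard Metrics Server or equivalent registration:

kubectl get apiservice v1beta1.metrics.k8s.io   # should show True in the AVAILABLE column
kubectl top pods                                # returns usage numbers only when metrics are available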

Basic HPA based on CPU

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

HPA with CPU and Memory

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

HPA with Scaling Behavior

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: controlled-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 min before scaling down
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60  # Scale down max 10% per minute
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15  # Can double every 15 seconds
        - type: Pods
          value: 4
          periodSeconds: 15  # Or add 4 pods every 15 seconds
      selectPolicy: Max  # Apply whichever policy allows the larger scale-up

Deployment with Resource Requests (required for HPA)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 2  # Initial count; the HPA adjusts this once it targets the Deployment
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web
          image: nginx
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          ports:
            - containerPort: 80

Commands (OpenShift CLI)

Create HPA Imperatively
oc autoscale deployment web-app --min=2 --max=10 --cpu-percent=70

Get HPAs
oc get hpa

Describe HPA
oc describe hpa web-app-hpa

Watch HPA Status
oc get hpa -w

Delete HPA
oc delete hpa web-app-hpa

Check Metrics Server
oc top pods

Commands (kubectl)

Create HPA Imperatively
kubectl autoscale deployment web-app --min=2 --max=10 --cpu-percent=70

Get HPAs
kubectl get hpa

Describe HPA
kubectl describe hpa web-app-hpa

Watch HPA Status
kubectl get hpa -w

Delete HPA
kubectl delete hpa web-app-hpa

Check Metrics Server
kubectl top pods

Understanding HPA Output

When you run kubectl get hpa, you'll see output like:

NAME          REFERENCE              TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
web-app-hpa   Deployment/web-app    45%/70%   2         10        3          5m

| Column | Description |
|---|---|
| TARGETS | Current/target utilization (45% current, 70% target) |
| MINPODS | Minimum replicas |
| MAXPODS | Maximum replicas |
| REPLICAS | Current number of pods |

Best Practices

  1. Set Resource Requests - Always define CPU/memory requests on containers
  2. Start Conservative - Begin with higher target utilization and adjust
  3. Use Stabilization Windows - Prevent thrashing with scale-down delays
  4. Monitor Behavior - Watch HPA decisions and adjust thresholds
  5. Consider Pod Disruption Budgets - Ensure a minimum number of pods stays available during disruptions (see the example after this list)
  6. Test Under Load - Validate HPA behavior before production deployment
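
For best practice 5, a minimal PodDisruptionBudget sketch matching the web-app Deployment shown earlier (the name web-app-pdb and the minAvailable value are illustrative):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
spec:
  minAvailable: 1  # keep at least one web-app pod running during voluntary disruptions
  selector:
    matchLabels:
      app: web-app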

Troubleshooting

| Issue | Possible Cause | Solution |
|---|---|---|
| <unknown>/70% in TARGETS | Metrics Server not running | Install Metrics Server |
| Not scaling up | Resource requests not set | Add requests to containers |
| Scaling too aggressively | Default behavior too fast | Add scaling policies |
| Not scaling down | Stabilization window | Wait for the window to pass |
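
Most of these causes show up in the autoscaler's own status; describing the HPA is usually the quickest way to see why it is not scaling:

oc describe hpa web-app-hpa   # or: kubectl describe hpa web-app-hpa; check the Conditions and Events sections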