
Kubernetes Resource Requests and Limits


When we run applications in Kubernetes, many Pods share the same Node and compete for CPU and memory.

If you don’t set Requests and Limits, a single Pod might grab all the CPU and memory it can. This can slow down or even crash other workloads on the same Node.

That’s why Kubernetes lets us define Requests (minimum guarantee) and Limits (maximum cap) so resources are shared fairly and applications run reliably.

1. What Are Resource Requests?

Think of a Request like a minimum guarantee.

When you define a Pod, you can tell Kubernetes:
“Hey, this container needs at least this much CPU and this much Memory to run smoothly.”

Kubernetes uses this information while scheduling. It will only place the Pod on a Node whose allocatable resources, minus what other Pods on that Node have already requested, can satisfy the request.

Example:

```yaml
resources:
  requests:
    memory: "256Mi"
    cpu: "500m"    # 0.5 CPU core
```

This means:

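  • Kubernetes will only schedule the Pod on a Node that still has at least 0.5 CPU and 256Mi of memory unreserved (allocatable minus the requests of Pods already there). The scheduler checks reservations, not live usage.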

  • Once running, the Pod can use more than this request if resources are available, unless restricted by limits.
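The quantity strings follow Kubernetes’ unit conventions: for CPU, 1 core equals 1000m (millicores), and for memory, Ki, Mi and Gi are binary multiples (1Mi = 1024 × 1024 bytes). Below is a minimal converter sketch that assumes only a few common suffixes. The helper names are hypothetical, and this is not the real Kubernetes quantity parser, which supports many more forms.

```python
# Simplified sketch: convert Kubernetes quantity strings into base units.
# Only a few suffixes are handled; the real parser supports many more.

BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}
DECIMAL_SUFFIXES = {"k": 1000, "M": 1000**2, "G": 1000**3}


def cpu_to_millicores(value: str) -> int:
    """'500m' -> 500, '1' -> 1000, '0.5' -> 500."""
    if value.endswith("m"):
        return int(value[:-1])
    return round(float(value) * 1000)


def memory_to_bytes(value: str) -> int:
    """'256Mi' -> 268435456, '1G' -> 1000000000, '1024' -> 1024."""
    # Check binary suffixes first so 'Mi' is not mistaken for 'M'.
    for suffix, factor in BINARY_SUFFIXES.items():
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * factor
    for suffix, factor in DECIMAL_SUFFIXES.items():
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * factor
    return int(value)  # no suffix: plain bytes


print(cpu_to_millicores("500m"))  # 500 (half a core)
print(memory_to_bytes("256Mi"))   # 268435456
```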

2. What Are Resource Limits?

Limits are the maximum cap.

This tells Kubernetes:
“No matter how hungry my container gets, don’t allow it to cross this CPU or Memory boundary.”

Example:

```yaml
resources:
  limits:
    memory: "512Mi"
    cpu: "1"        # 1 CPU core
```

This means:

  • The container can use up to 1 CPU core and 512Mi memory, but not more.

  • If it tries to exceed the CPU limit, the container is throttled (slowed down), not killed.

  • If it tries to exceed the memory limit, the Linux kernel’s OOM killer terminates the container, and Kubernetes reports it as OOMKilled.
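The two behaviours differ because CPU is compressible and memory is not. By default, the kubelet turns a CPU limit into a Linux CFS quota over 100 ms periods. The memory limit becomes a cgroup memory cap that the kernel enforces with the OOM killer. The toy model below illustrates that difference. It is not kubelet code, and the function names are hypothetical.

```python
CFS_PERIOD_US = 100_000  # default CFS period (100 ms) used by the kubelet


def cfs_quota_us(cpu_limit_millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time (in microseconds) the container may use in each CFS period."""
    return cpu_limit_millicores * period_us // 1000


def enforce(cpu_used_millicores: int, mem_used_mi: int,
            cpu_limit_millicores: int, mem_limit_mi: int) -> str:
    """Toy model of what happens when usage crosses each limit."""
    if mem_used_mi > mem_limit_mi:
        return "OOMKilled"   # memory can't be taken back: the process is killed
    if cpu_used_millicores > cpu_limit_millicores:
        return "Throttled"   # CPU is compressible: the process just waits
    return "Running"


print(cfs_quota_us(1000))             # 100000 -> 100 ms of CPU per 100 ms period
print(cfs_quota_us(500))              # 50000
print(enforce(1500, 300, 1000, 512))  # Throttled
print(enforce(800, 600, 1000, 512))   # OOMKilled
```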

3. Requests vs Limits in Action

  • No requests, no limits → The Pod can be scheduled on any Node, and once it starts, it can consume as much CPU or memory as it wants. This is risky because one Pod can use up all the resources and destabilize the Node.

  • Only requests set → The Pod is guaranteed those minimum resources at scheduling time. But since no limits are defined, it can take more resources if they’re free on the Node.

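  • Only limits set → The Pod cannot cross the defined maximum usage. Because no requests are given, Kubernetes copies each limit into the matching request, so the scheduler reserves the full limit amount. The Pod is placed only where that much capacity is unreserved. This is safe, but it may reserve more than the app normally needs.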

  • Both requests and limits set → This is the best practice. The Pod gets guaranteed minimum resources, and at runtime it cannot cross the maximum cap. Balanced and safe.
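These four cases correspond to Kubernetes’ QoS classes: BestEffort, Burstable and Guaranteed. The "only limits" case lands in Guaranteed because the requests are defaulted from the limits. Below is a simplified sketch of those rules. It is an assumption-laden illustration: it only looks at cpu and memory, ignores init containers, and compares quantity strings literally instead of parsing them. The function names are hypothetical.

```python
from typing import Dict, List

Resources = Dict[str, Dict[str, str]]  # {"requests": {...}, "limits": {...}}


def with_defaulted_requests(res: Resources) -> Resources:
    """If a limit is set but the matching request is not, copy the limit into the request."""
    requests = dict(res.get("requests", {}))
    limits = dict(res.get("limits", {}))
    for name, value in limits.items():
        requests.setdefault(name, value)
    return {"requests": requests, "limits": limits}


def qos_class(containers: List[Resources]) -> str:
    """Simplified QoS classification for cpu/memory only."""
    defaulted = [with_defaulted_requests(c) for c in containers]
    if all(not c["requests"] and not c["limits"] for c in defaulted):
        return "BestEffort"
    guaranteed = all(
        c["limits"].get(r) is not None and c["requests"].get(r) == c["limits"].get(r)
        for c in defaulted
        for r in ("cpu", "memory")
    )
    return "Guaranteed" if guaranteed else "Burstable"


print(qos_class([{}]))                                                  # BestEffort
print(qos_class([{"requests": {"cpu": "500m", "memory": "256Mi"}}]))    # Burstable
print(qos_class([{"limits": {"cpu": "1", "memory": "512Mi"}}]))         # Guaranteed
print(qos_class([{"requests": {"cpu": "500m", "memory": "256Mi"},
                  "limits": {"cpu": "900m", "memory": "500Mi"}}]))      # Burstable
```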


4. Checking Cluster Nodes

We start by looking at the available nodes in the cluster. Each node has a certain CPU and memory capacity. Kubernetes uses this information to decide where to schedule Pods.

You can then describe a node to see capacity vs allocatable resources:

  • Capacity = total hardware resources on the node.

  • Allocatable = resources available for Pods after reserving some for the system.

  (Screenshot: kubectl describe node output showing Capacity and Allocatable.)

1. Node Capacity vs Allocatable

Capacity (total physical resources):

  • CPU: 2

  • Memory: 936060 Ki ≈ 914 Mi

  • Pods: 110

Allocatable (what’s left for Pods after system overhead):

  • CPU: 2

  • Memory: 833660 Ki ≈ 814 Mi

  • Pods: 110

  • Even though the instance type looks like a “1 GiB node,” Kubernetes doesn’t give all of that to workloads.

  • Capacity shows the raw node resources.

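  • Allocatable is lower because the kubelet subtracts reserved amounts from Capacity: any kube-reserved and system-reserved settings (for the kubelet, system daemons and the OS), plus the hard eviction threshold.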

  • That’s why here you see:

    • Total: ~914 Mi

    • Usable for Pods: ~814 Mi
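The kubelet’s Node Allocatable formula is Capacity minus kube-reserved, system-reserved and the hard eviction threshold. The numbers above fit that formula under two assumptions: no kube-reserved or system-reserved is configured, and the kubelet uses its default memory.available<100Mi eviction threshold. That gives 936060Ki - 102400Ki = 833660Ki. Those assumptions are ours, so check your own kubelet configuration. A small sketch (the function name is hypothetical):

```python
def allocatable_ki(capacity_ki: int,
                   kube_reserved_ki: int = 0,
                   system_reserved_ki: int = 0,
                   eviction_hard_ki: int = 100 * 1024) -> int:
    """Allocatable = Capacity - kube-reserved - system-reserved - hard eviction threshold."""
    return capacity_ki - kube_reserved_ki - system_reserved_ki - eviction_hard_ki


capacity = 936060  # Ki, from the node's Capacity section
alloc = allocatable_ki(capacity)

print(alloc)                   # 833660 (Ki)
print(round(capacity / 1024))  # 914 (Mi)
print(round(alloc / 1024))     # 814 (Mi)
```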

2. Node Resource Usage: Non-Terminated Pods, Allocated Resources & Scheduling Impact

  1. Non-terminated Pods section

    • Lists all pods currently running on this node.

    • In this node’s output:

      • kube-proxy → no requests/limits.

      • weave-net → requests 100m CPU (5%), no limits.

  2. Allocated resources

    • Summarizes the total requests/limits from all pods on this node.

    • Example here:

      • CPU requests = 100m (5%) → reserved for weave-net.

      • Memory requests = 0 → none of these system pods requested memory explicitly.

      • Limits = 0 → none of them defined CPU/memory limits.

  3. Why this matters

    • This tells you how much of the node’s allocatable resources are already “booked” by running pods.

    • Scheduler uses this info to decide if there’s room for your workload.

    • If no node has enough unreserved allocatable left to cover a new Pod’s requests, the Pod stays in Pending.
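The "Allocated resources" figures are sums of Pod requests compared against the node’s allocatable. For example, weave-net’s 100m is 5% of this node’s 2 cores (2000m). The sketch below reproduces that bookkeeping. The helper name is hypothetical, and it rounds the percentage down, which is an assumption about how the value is displayed.

```python
from typing import Dict, List


def allocated_summary(allocatable: int, pod_requests: List[int]) -> Dict[str, int]:
    """Sum the requests on a node and express them against its allocatable."""
    requested = sum(pod_requests)
    return {
        "requested": requested,
        "percent": requested * 100 // allocatable,  # rounded down in this sketch
        "remaining": allocatable - requested,       # what new Pods can still request
    }


# CPU in millicores: kube-proxy requests nothing, weave-net requests 100m.
print(allocated_summary(2000, [0, 100]))
# {'requested': 100, 'percent': 5, 'remaining': 1900}

# Memory in Ki: neither system Pod requests memory.
print(allocated_summary(833660, [0, 0]))
# {'requested': 0, 'percent': 0, 'remaining': 833660}
```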


5. Deploying with Requests Only (No Limits)

We create a deployment with requests but no limits:

  • Requests only affect scheduling. The scheduler ensures a node has at least this much free capacity before placing the Pod.

  • Pods can still use more than this request if resources are available because no hard limit is set.

  • When applied, the Pods are scheduled successfully because the cluster has enough resources.

  • Apply the YAML

```bash
kubectl apply -f reqlimit.yaml
```

  • Check the Pods
    Next, verify that the Pods are running inside the namespace test-ns with kubectl get pods -n test-ns.

1. Checking Allocated Resources

After deploying the workload with requests defined (but no limits), we can inspect the node again.

What we see:

  • CPU Requests and Memory Requests now appear under the node’s allocated resources.

  • Limits remain 0 because none were set in the YAML.

Why this matters:

  • Requests: These are always tracked by Kubernetes. They tell the scheduler how much CPU and memory are reserved for the pod.

  • Limits: If not defined, the container can use more resources than requested (as long as the node has capacity). Since limits weren’t set, Kubernetes isn’t enforcing any cap here.

  • This proves the point: requests reserve, limits enforce. Without a memory limit, the container won’t be OOMKilled for crossing its own cap. It can still be OOM-killed or evicted if the node itself runs low on memory.

Describing the Node

```bash
kubectl describe node ip-172-31-3-215
```

6. Scaling the Deployment

Next, we scale the deployment up:

```bash
kubectl scale deploy javawebappdep -n test-ns --replicas 3
```

  • Now Kubernetes tries to place 3 replicas. But our cluster doesn’t have enough free allocatable memory to honor all 256Mi requests.

  • Result: some Pods go into Pending state.

  • Describing a Pending Pod shows the reason in its events:

  • Insufficient memory: the scheduler could not find a node whose unreserved allocatable memory fits the requested 256Mi.

Why this happens

  • Earlier, kubectl describe node showed about 814Mi (833660Ki) of allocatable memory per worker.
    The two running replicas had already “booked” 2 × 256Mi = 512Mi on their nodes, on top of any memory requested by other Pods there.

  • When we asked for a third replica with request = 256Mi, there wasn’t enough free allocatable on any eligible node → the pod remains Pending.

  • Rule of thumb: Pods are scheduled only if each request (CPU/Memory) can be met by a node's remaining allocatable resources. Requests reserve capacity; if the reservation can't be made, the pod won't start.
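The rule of thumb can be modelled as a simple fit check. This is a simplified sketch of the scheduler’s resource filter, not its real code. It places each replica on the first node that fits and ignores scoring, taints and other plugins. The node data is hypothetical. In particular, the 300Mi already requested by other Pods is an assumed figure, chosen to show how a third 256Mi replica ends up Pending.

```python
from typing import Dict, List, Optional

Resources = Dict[str, int]  # cpu in millicores, memory in Mi


def fits(request: Resources, allocatable: Resources, requested: Resources) -> bool:
    """A node fits if, for every resource, already-requested + new request <= allocatable."""
    return all(requested.get(r, 0) + amount <= allocatable.get(r, 0)
               for r, amount in request.items())


def schedule(request: Resources, nodes: List[dict]) -> Optional[str]:
    """Place the Pod on the first node that fits; None means it stays Pending."""
    for node in nodes:
        if fits(request, node["allocatable"], node["requested"]):
            for r, amount in request.items():  # book the reservation on that node
                node["requested"][r] = node["requested"].get(r, 0) + amount
            return node["name"]
    return None


# Hypothetical node: 2 cores / 814Mi allocatable; other Pods already request 100m and 300Mi.
nodes = [{"name": "worker-1",
          "allocatable": {"cpu": 2000, "memory": 814},
          "requested": {"cpu": 100, "memory": 300}}]
replica = {"cpu": 500, "memory": 256}

placements = [schedule(replica, nodes) for _ in range(3)]
print(placements)  # ['worker-1', 'worker-1', None] -> the third replica is Pending
```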

Deleting the Deployment & Service

Once testing is complete, you can remove the workload and free up cluster resources:

```bash
kubectl delete -f reqlimit.yaml
```


7. Adding Limits

When we first ran the Deployment with only requests and no limits, the Pod started fine.
That’s because:

  • Requests only affect scheduling → the scheduler makes sure a node has at least that much free CPU/memory before placing the Pod.

  • They do not limit usage. Your application can use more than the requested memory if the node has available space.

  • What happens when we add limits?

  • Limits are strict caps. If the container tries to use more memory than its limit, the Linux kernel’s OOM (Out Of Memory) killer terminates it.

  • The kubelet then restarts the container. If this keeps repeating, the Pod ends up in a CrashLoopBackOff state.

  • Apply the YAML

  • Check the pods

1. Why Limits Can Cause CrashLoopBackOff

  • Limits in Kubernetes are strict caps.

  • If your container tries to use more memory than its limit, the Linux kernel OOM-kills (Out Of Memory kill) the process.

  • The kubelet notices the crash and restarts the container.

  • If this happens repeatedly, the Pod goes into CrashLoopBackOff.

  • That’s why we sometimes see Pods getting OOMKilled first, and then quickly moving into CrashLoopBackOff.
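CrashLoopBackOff is not a separate failure. It means the kubelet is waiting before the next restart. According to the Kubernetes documentation, that wait grows exponentially (10s, 20s, 40s, …) and is capped at five minutes, and it resets after the container runs cleanly for 10 minutes. The sketch below only computes that delay sequence. The function name is hypothetical, and real timings can vary slightly.

```python
from typing import List


def restart_delays(restarts: int, initial: int = 10, cap: int = 300) -> List[int]:
    """Seconds waited before each restart: doubles each time, capped at 300s (5 min)."""
    delays = []
    delay = initial
    for _ in range(restarts):
        delays.append(delay)
        delay = min(delay * 2, cap)
    return delays


print(restart_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```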

2. Fixing CrashLoopBackOff by Increasing Limits

In our earlier case, the container was OOMKilled because the memory limit was too low for a Spring Boot app.

The solution: raise the resource limits so the JVM has enough headroom to start and run smoothly.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: javawebappdep
  namespace: test-ns
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: javawebapp
  template:
    metadata:
      name: javawebapp
      labels:
        app: javawebapp
    spec:
      containers:
      - name: javawebapp
        image: kkeducation12345/spring-app:1.0.5
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "256Mi"
            cpu: "500m"
          limits:
            memory: "500Mi"
            cpu: "900m"
---
apiVersion: v1
kind: Service
metadata:
  name: javawebappsvc
  namespace: test-ns
spec:
  type: NodePort
  selector:
    app: javawebapp
  ports:
  - port: 80
    targetPort: 8080
```
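The API server rejects a container whose request is larger than its limit, so a quick pre-flight check of the values in this manifest can catch mistakes before applying it. Here is a minimal sketch. The helper names are hypothetical, and the quantity parser only handles the suffixes used in this post (m for CPU; Ki, Mi and Gi for memory).

```python
from typing import Dict, List

BINARY = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def to_base(resource: str, value: str) -> int:
    """CPU -> millicores, memory -> bytes (only the suffixes used in this post)."""
    if resource == "cpu":
        return int(value[:-1]) if value.endswith("m") else round(float(value) * 1000)
    for suffix, factor in BINARY.items():
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * factor
    return int(value)


def check(resources: Dict[str, Dict[str, str]]) -> List[str]:
    """Return a message for every request that exceeds its limit."""
    problems = []
    requests = resources.get("requests", {})
    for name, limit in resources.get("limits", {}).items():
        request = requests.get(name)
        if request is not None and to_base(name, request) > to_base(name, limit):
            problems.append(f"{name}: request {request} exceeds limit {limit}")
    return problems


javawebapp = {"requests": {"memory": "256Mi", "cpu": "500m"},
              "limits": {"memory": "500Mi", "cpu": "900m"}}
print(check(javawebapp))  # []
print(check({"requests": {"cpu": "1"}, "limits": {"cpu": "500m"}}))
# ['cpu: request 1 exceeds limit 500m']
```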
  • Apply the YAML

  • Check the pods

  • Describing the Node

Requests and Limits in Action

In the node description, you can see both Requests and Limits being tracked:

  • CPU Requests: 500m → reserved for the pod

  • CPU Limits: 900m → the hard cap

  • Memory Requests: 256Mi → guaranteed for scheduling

  • Memory Limits: 500Mi → maximum allowed usage

  • This proves our pod is now running with both reserved capacity (requests) and strict caps (limits) applied.



Thank you.


kkfunda
