Kubernetes Resource Requests and Limits

When we run applications in Kubernetes, many Pods share the same Node and compete for its CPU and memory.

If you don’t set Requests and Limits, a single Pod might grab all the CPU and memory it can. This can slow down or even crash other workloads on the same Node.

That’s why Kubernetes lets us define Requests (minimum guarantee) and Limits (maximum cap) so resources are shared fairly and applications run reliably.

1. What Are Resource Requests?

Think of a Request like a minimum guarantee.

When you define a Pod, you can tell Kubernetes:
“Hey, this container needs at least this much CPU and this much Memory to run smoothly.”

Kubernetes uses this information while scheduling. It will only place the Pod on a Node that has enough available resources to satisfy the request.

Example:

resources:
  requests:
    memory: "256Mi"
    cpu: "500m"    # 0.5 CPU core

This means:

- Kubernetes will only schedule the Pod on a Node whose allocatable CPU and memory, minus what other Pods have already requested, still leaves at least 0.5 CPU and 256Mi.
- Once running, the Pod can use more than this request if resources are available, unless restricted by limits.
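To make the units concrete, here is a minimal Python sketch that converts the two quantity strings used above into plain numbers. The helper names parse_cpu and parse_memory are hypothetical, and the sketch only handles the "m" CPU suffix and the Ki/Mi/Gi memory suffixes, not the full Kubernetes quantity syntax.

```python
# Binary suffixes (powers of 1024), as used for memory in the example above.
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_cpu(quantity: str) -> float:
    """Return CPU cores for a quantity such as "500m" or "1"."""
    if quantity.endswith("m"):
        # "m" means millicores: 1000m equals one core.
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity: str) -> int:
    """Return bytes for a quantity such as "256Mi"; bare numbers are bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


print(parse_cpu("500m"))      # 0.5
print(parse_memory("256Mi"))  # 268435456
```

So a request of "500m" is half a core, and "256Mi" is 256 × 1024 × 1024 bytes.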

2. What Are Resource Limits?

Limits are the maximum cap.

This tells Kubernetes:
“No matter how hungry my container gets, don’t allow it to cross this CPU or Memory boundary.”

Example:

resources:
  limits:
    memory: "512Mi"
    cpu: "1"        # 1 CPU core

This means:

- The container can use up to 1 CPU core and 512Mi memory, but not more.
- If it tries to exceed the CPU limit, it is throttled (slowed down), not killed.
- If it tries to exceed the memory limit, the Linux kernel kills the process and Kubernetes reports the container as OOMKilled.
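CPU throttling can be pictured as a time budget. The sketch below is a simplified model, assuming the common Linux default of a 100 ms scheduling period, in which a CPU limit becomes a quota of CPU time per period. The function names are hypothetical and the real behaviour depends on the node's container runtime and cgroup configuration.

```python
CFS_PERIOD_US = 100_000  # assumed default period: 100 ms, in microseconds


def cfs_quota_us(cpu_limit_millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time (microseconds) the container may consume in each period."""
    return cpu_limit_millicores * period_us // 1000


def throttled_fraction(demand_millicores: int, limit_millicores: int) -> float:
    """Share of the demanded CPU time that the limit withholds."""
    if demand_millicores <= limit_millicores:
        return 0.0
    return 1 - limit_millicores / demand_millicores


print(cfs_quota_us(1000))              # 100000 -> a limit of "1" allows a full period
print(throttled_fraction(2000, 1000))  # 0.5 -> wanting 2 cores, half the demand waits
```

The container keeps running in both cases; it simply waits for the next period once its budget is spent. Memory has no equivalent of waiting, which is why exceeding a memory limit ends in a kill instead.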

3. Requests vs Limits in Action

- No requests, no limits → The Pod can be scheduled on any Node, and once it starts, it can consume as much CPU or memory as it wants. This is risky because one Pod can use up all the resources and destabilize the Node.
- Only requests set → The Pod is guaranteed those minimum resources at scheduling time. But since no limits are defined, it can take more resources if they’re free on the Node.
- Only limits set → The Pod cannot cross the defined maximum usage. When a container has a limit but no request, Kubernetes uses the limit as the request (unless a namespace default such as a LimitRange supplies one), so the scheduler reserves the full limit for it. That is safe, but it can reserve more than the application normally needs.
- Both requests and limits set → This is the best practice. The Pod gets guaranteed minimum resources, and at runtime it cannot cross the maximum cap. Balanced and safe.
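One detail of these four cases is easy to miss: a container that sets a limit but no request has the limit copied into its request. The Python sketch below models that rule, assuming no LimitRange or other admission-time default is in play. The function name effective_request is hypothetical.

```python
from typing import Optional


def effective_request(request: Optional[str], limit: Optional[str]) -> Optional[str]:
    """Value the scheduler reserves for one resource of one container."""
    if request is not None:
        return request
    # No explicit request: fall back to the limit (None if that is unset too).
    return limit


cases = {
    "no requests, no limits": (None, None),
    "only requests": ("256Mi", None),
    "only limits": (None, "512Mi"),
    "both": ("256Mi", "512Mi"),
}
for name, (request, limit) in cases.items():
    print(f"{name}: scheduler reserves {effective_request(request, limit)}")
```

In the "only limits" case the scheduler reserves the full 512Mi, so the Pod is placed conservatively, not blindly.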

4. Checking Cluster Nodes

We start by looking at the available nodes in the cluster. Each node has a certain CPU and memory capacity. Kubernetes uses this information to decide where to schedule Pods.

You can then describe a node to see capacity vs allocatable resources:

Capacity = total hardware resources on the node.
Allocatable = resources available for Pods after reserving some for the system.
(Screenshot: kubectl describe node output showing capacity and allocatable.)

1. Node Capacity vs Allocatable

Capacity (total physical resources):

CPU: 2
Memory: 936060 Ki ≈ 914 Mi
Pods: 110

Allocatable (what’s left for Pods after system overhead):

CPU: 2
Memory: 833660 Ki ≈ 814 Mi
Pods: 110

Even though the instance type looks like a “1 GiB node,” Kubernetes doesn’t give all of that to workloads. Capacity shows the raw node resources. Allocatable is lower because the kubelet, system daemons, and OS processes reserve some memory.

That’s why here you see:
- Total: ~914 Mi
- Usable for Pods: ~814 Mi
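The rounded figures above can be checked with a few lines of Python. This is only the arithmetic on the two Ki values shown for this node; the helper name ki_to_mi is hypothetical.

```python
def ki_to_mi(kibibytes: int) -> float:
    """Convert Ki to Mi (1 Mi = 1024 Ki)."""
    return kibibytes / 1024


capacity_ki = 936060
allocatable_ki = 833660
reserved_ki = capacity_ki - allocatable_ki

print(round(ki_to_mi(capacity_ki)))     # 914
print(round(ki_to_mi(allocatable_ki)))  # 814
print(ki_to_mi(reserved_ki))            # 100.0
```

The gap between the two values is exactly 100 Mi, which is the memory held back from Pods on this node.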

2. Node Resource Usage: Non-Terminated Pods, Allocated Resources & Scheduling Impact

Non-terminated Pods section
- Lists all pods currently running on this node.
- Example from this node:
  - kube-proxy → no requests/limits.
  - weave-net → requests 100m CPU (5%), no limits.
Allocated resources
- Summarizes the total requests/limits from all pods on this node.
- Example here:
  - CPU requests = 100m (5%) → reserved for weave-net.
  - Memory requests = 0 → none of these system pods requested memory explicitly.
  - Limits = 0 → none of them defined CPU/memory limits.
Why this matters
- This tells you how much of the node’s allocatable resources are already “booked” by running pods.
- Scheduler uses this info to decide if there’s room for your workload.
- If your new pod requests more than the remaining free allocatable, it goes into Pending.
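The "Allocated resources" percentage can be reproduced by hand. The sketch below sums the CPU requests listed for this node and expresses them as a share of the node's 2 allocatable CPUs, which is an assumption about how the figure is derived that happens to match the 5% shown. The function name is hypothetical.

```python
from typing import Dict


def allocated_percent(requests_millicores: Dict[str, int], allocatable_millicores: int) -> int:
    """Whole-number percentage of allocatable CPU already requested."""
    total = sum(requests_millicores.values())
    return total * 100 // allocatable_millicores


# CPU requests of the two system pods listed above (kube-proxy requests nothing).
pods = {"kube-proxy": 0, "weave-net": 100}

print(allocated_percent(pods, 2000))  # 5
```

Everything not yet requested, here 1900m of CPU, is what the scheduler still considers available for new Pods.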

5. Deploying with Requests Only (No Limits)

We create a deployment with requests but no limits:

- Requests matter mainly at scheduling time. The scheduler ensures a node has at least this much unreserved allocatable capacity before placing the Pod.
- Pods can still use more than this request if resources are available because no hard limit is set.
- When applied, the Pods are scheduled successfully because the cluster has enough resources.

Apply the YAML

kubectl apply -f reqlimit.yaml

Check the Pods
Next, verify that the Pods are running inside the namespace test-ns:

kubectl get pods -n test-ns

1. Checking Allocated Resources

After deploying the workload with requests defined (but no limits), we can inspect the node again.

What we see:

CPU Requests and Memory Requests now appear under the node’s allocated resources.
Limits remain 0 because none were set in the YAML.

Why this matters:

- Requests: These are always tracked by Kubernetes. They tell the scheduler how much CPU and memory are reserved for the pod.
- Limits: If not defined, the container can use more resources than requested (as long as the node has capacity). Since limits weren’t set, Kubernetes isn’t enforcing any cap here.

This proves the point: Requests reserve, limits enforce. Without a memory limit, your pod won’t be OOMKilled for crossing a cap; it is only at risk if the node itself runs short of memory.

Describing the Node

kubectl describe node ip-172-31-3-215

6. Scaling the Deployment

Next, we scale the deployment up:

kubectl scale deploy javawebappdep -n test-ns --replicas 3

Now Kubernetes tries to place 3 replicas. But our cluster doesn’t have enough free allocatable memory to honor all 256Mi requests.
Result: some Pods go into Pending state.

Describing a Pending Pod shows the reason:

Insufficient memory: the scheduler could not find a node that fits the requested resources.

Why this happens

- Earlier, kubectl describe node showed Allocatable memory of ~814Mi (833660 Ki) on each worker.
- Two running replicas already “booked” 2 × 256Mi = 512Mi on their nodes.
- When we asked for a third replica with request = 256Mi, there wasn’t enough free allocatable (allocatable minus the requests of every pod already on the node) on any eligible node → the pod remains Pending.

Rule of thumb: Pods are scheduled only if each request (CPU/Memory) can be met by a node's remaining allocatable resources. Requests reserve capacity; if the reservation can't be made, the pod won't start.
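The rule of thumb can be sketched as a tiny first-fit model in Python. This is a deliberate simplification: it looks at memory only, and the real scheduler filters and scores nodes instead of taking the first match. The node name and its 600 Mi of allocatable memory are hypothetical values chosen for illustration, not figures from this cluster.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    allocatable_mi: int
    requested_mi: int = 0  # sum of requests of pods already placed here

    def free_mi(self) -> int:
        return self.allocatable_mi - self.requested_mi


def schedule(nodes: List[Node], request_mi: int) -> Optional[str]:
    """Reserve the request on the first node with room; None means Pending."""
    for node in nodes:
        if node.free_mi() >= request_mi:
            node.requested_mi += request_mi
            return node.name
    return None


nodes = [Node("node-a", allocatable_mi=600)]  # hypothetical node
placements = [schedule(nodes, 256) for _ in range(3)]
print(placements)  # ['node-a', 'node-a', None]
```

The third replica gets None because only 88 Mi remains unreserved, regardless of how much memory the first two replicas are actually using.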

Deleting the Deployment & Service

Once testing is complete, you can remove the workload and free up cluster resources:

kubectl delete -f reqlimit.yaml

7. Adding Limits

When we first ran the Deployment with only requests and no limits, the Pod started fine.
That’s because:

- Requests matter mainly at scheduling time → the scheduler makes sure a node has at least that much unreserved CPU/memory before placing the Pod.
- They do not limit usage. Your application can use more than the requested memory if the node has available space.

What happens when we add limits?

- Limits are strict caps. If the container tries to use more memory than the limit, the Linux kernel will OOM-kill it (Out Of Memory).
- The kubelet then restarts the container. If this repeats, you end up in a CrashLoopBackOff state.

Apply the YAML

kubectl apply -f reqlimit.yaml

Check the pods

kubectl get pods -n test-ns

1. Why Limits Can Cause CrashLoopBackOff

- Limits in Kubernetes are strict caps.
- If your container tries to use more memory than its limit, the Linux kernel OOM-kills (Out Of Memory kill) the process.
- The kubelet notices the crash and restarts the container.
- If this happens repeatedly, the Pod goes into CrashLoopBackOff, where each restart waits longer than the last.

That’s why we sometimes see Pods getting OOMKilled first, and then quickly moving into CrashLoopBackOff.
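The "BackOff" in CrashLoopBackOff refers to a growing delay between restarts. The Python sketch below assumes the documented kubelet defaults of a 10 second starting delay that doubles after each crash and is capped at 5 minutes. The function name restart_delays is hypothetical.

```python
from typing import List


def restart_delays(restarts: int, initial_s: int = 10, cap_s: int = 300) -> List[int]:
    """Seconds waited before each of the first `restarts` restarts."""
    delays = []
    delay = initial_s
    for _ in range(restarts):
        delays.append(min(delay, cap_s))  # never wait longer than the cap
        delay *= 2                        # double after every crash
    return delays


print(restart_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```

A container that is OOMKilled on every start therefore spends most of its time waiting, which is why the Pod appears stuck and never becomes ready.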

2. Fixing CrashLoopBackOff by Increasing Limits

In our earlier case, the container was OOMKilled because the memory limit was too low for a Spring Boot app.

The solution: raise the resource limits so the JVM has enough headroom to start and run smoothly.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: javawebappdep
  namespace: test-ns
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: javawebapp
  template:
    metadata:
      name: javawebapp
      labels:
        app: javawebapp
    spec:
      containers:
      - name: javawebapp
        image: kkeducation12345/spring-app:1.0.5
        ports:
        - containerPort: 8080
        resources:
          requests:            # reserved at scheduling time
            memory: "256Mi"
            cpu: "500m"
          limits:              # hard caps enforced at runtime
            memory: "500Mi"
            cpu: "900m"
---
apiVersion: v1
kind: Service
metadata:
  name: javawebappsvc
  namespace: test-ns
spec:
  type: NodePort
  selector:
    app: javawebapp
  ports:
  - port: 80
    targetPort: 8080

Apply the YAML

kubectl apply -f reqlimit.yaml

Check the pods

kubectl get pods -n test-ns

Describing the Node

kubectl describe node ip-172-31-3-215

Requests and Limits in Action

In the node description, you can see both Requests and Limits being tracked:

- CPU Requests: 500m → reserved for the pod
- CPU Limits: 900m → the hard cap
- Memory Requests: 256Mi → guaranteed for scheduling
- Memory Limits: 500Mi → maximum allowed usage

This proves our pod is now running with both reserved capacity (requests) and strict caps (limits) applied.
