Kubernetes Probes: The Operational Safety Net Your Cluster Needs

May 10, 2026

Your cluster is safely handling production traffic today. Fine.

But what about tomorrow?

By design, Kubernetes relies on certain operational principles. Although it can manage itself to a large extent, as your applications and infrastructure grow, there are areas where Kubernetes expects explicit signals and control from you.

In this article, we'll examine one of the most critical ones:

Probe Definitions

Probe configuration affects every pod that receives production traffic.

Without probes, two common problems usually occur:

Even if a container freezes internally, Kubernetes won't notice it. The pod will continue to appear as Running.
During rolling updates, a pod without a readiness probe is treated as ready as soon as its containers have started, so the Service begins sending it traffic even if the application is not actually ready.

This is one of the most common causes of short-lived 5xx spikes after deployments.

So how should liveness, readiness, and startup probes be configured?

Understanding Kubernetes Probes

Kubernetes provides three different probe types, and each answers a different operational question.

1. Liveness Probe — "Is the Container Still Alive?"

If the probe fails:

→ Kubernetes kills the container and restarts it.

Purpose

Applications can become:

Frozen internally
Deadlocked
Stuck in an infinite loop

In these situations, the process may still exist, but the application is no longer responsive.

Kubernetes attempts recovery by restarting the container.

Example

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080

2. Readiness Probe — "Can I Receive Traffic Right Now?"

If the probe fails:

→ Kubernetes does NOT restart the container.

Instead, it removes the pod from the Service endpoint list.

Purpose

Common scenarios include:

Application warm-up still in progress
Cache loading incomplete
Database connectivity issues
Temporary overload conditions

In other words:

"I'm alive, but I shouldn't receive traffic right now."

Example

readinessProbe:
  httpGet:
    path: /ready
    port: 8080

3. Startup Probe — "Has the Application Finished Starting?"

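While the probe has not yet succeeded:

→ Liveness and readiness probes remain disabled.

They only become active after the startup probe succeeds.

If the probe keeps failing until its failureThreshold is exhausted:

→ Kubernetes kills the container and restarts it according to the pod's restartPolicy.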
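
The gating behaviour can be modeled in a few lines of Python. This is only an illustrative sketch of the rule, not how the kubelet is implemented; ContainerState and active_probes are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class ContainerState:
    # Hypothetical model: has the startup probe succeeded at least once?
    startup_succeeded: bool = False


def active_probes(state: ContainerState, has_startup_probe: bool = True) -> set:
    """Return the probe types that are currently being run for a container."""
    if has_startup_probe and not state.startup_succeeded:
        # Until startup succeeds, liveness and readiness are held back.
        return {"startup"}
    # After startup success (or when no startup probe is defined),
    # liveness and readiness run for the rest of the container's life.
    return {"liveness", "readiness"}
```

A container that is still starting yields only {"startup"}; once startup_succeeded is True, the result switches to {"liveness", "readiness"}.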

Purpose

Startup probes are particularly valuable for applications with long initialization times:

Java / Spring Boot
.NET
Large Python applications

These workloads may require 60–120 seconds before becoming operational.

Without a startup probe, the liveness probe can start failing while the application is still initializing. Kubernetes then treats the slow startup as a failure and restarts the container repeatedly.

Result

CrashLoopBackOff

Example

startupProbe:
  httpGet:
    path: /healthz
    port: 8080

Why Use All Three Together?

A healthy pod lifecycle typically looks like this:

[Container Start]
        │
        ├── Startup Phase
        │     startupProbe runs
        │     liveness/readiness inactive
        │
        ├── Startup Success
        │     liveness + readiness enabled
        │
        ├── Runtime Phase
        │     liveness → Am I alive?
        │     readiness → Can I receive traffic?
        │
        └── Pod Termination

Each probe serves a unique purpose.

Using only one or two of them leaves operational gaps.

Probe Mechanisms

Kubernetes supports several probe mechanisms. The three covered here are the most widely used; a gRPC mechanism is also available.

HTTP GET (Recommended)

Expose a health endpoint and return HTTP 200 when healthy. Kubernetes treats any status code from 200 up to, but not including, 400 as success.

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080

Advantages

Easy to implement
Lightweight
Human-readable
Preferred for most web applications
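
To make the success rule concrete, here is a rough Python approximation of what an HTTP GET check does. It is a sketch for local testing under stated assumptions, not the kubelet's actual logic; http_probe and its timeout_seconds parameter are names chosen for this example.

```python
import urllib.error
import urllib.request


def http_probe(url: str, timeout_seconds: float = 1.0) -> bool:
    """Return True if the endpoint answers with a status code in [200, 400)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers connection refused, timeouts, and 4xx/5xx responses
        # (urllib raises HTTPError, a URLError subclass, for those).
        return False
```

Calling http_probe("http://localhost:8080/healthz") against a locally running service is a quick way to see what a probe would observe before deploying the manifest.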

TCP Socket

Useful when no HTTP endpoint exists.

Kubernetes simply verifies that it can establish a TCP connection.

livenessProbe:
  tcpSocket:
    port: 5672

Common for:

RabbitMQ
Databases
Message brokers
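
A TCP check can be approximated in Python as follows. This is a minimal sketch of the idea (open a connection, then close it), with tcp_probe as a hypothetical helper name; it says nothing about whether the service behind the port is actually working.

```python
import socket


def tcp_probe(host: str, port: int, timeout_seconds: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection performs the TCP handshake; the context manager
        # closes the socket immediately afterwards without sending any data.
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Refused connection, unreachable host, or timeout.
        return False
```

Because only the handshake is verified, a process that accepts connections but cannot serve requests still passes. That is the main trade-off compared with an HTTP check.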

Exec Probe

Executes a command inside the container.

Exit code 0 indicates success.

livenessProbe:
  exec:
    command:          # hypothetical example: succeeds while /tmp/healthy exists
      - cat
      - /tmp/healthy

Important Note

Exec probes launch a new process during every probe interval.

This can become expensive from a CPU perspective.

Whenever possible:

Use HTTP probes
Use TCP probes if HTTP isn't available
Reserve exec probes for special cases
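
For one of those special cases, such as a worker with no network port, an exec check might look like the sketch below. It assumes the application touches a heartbeat file periodically; the file path, the 30-second limit, and the function names are illustrative assumptions, not something Kubernetes prescribes.

```python
import os
import time
from typing import Optional


def heartbeat_is_fresh(path: str, max_age_seconds: float,
                       now: Optional[float] = None) -> bool:
    """True if the file exists and was modified within max_age_seconds."""
    current = time.time() if now is None else now
    try:
        return current - os.path.getmtime(path) <= max_age_seconds
    except OSError:
        # A missing or unreadable heartbeat file counts as unhealthy.
        return False


def main(argv: list) -> int:
    # Hypothetical invocation: python check_heartbeat.py /tmp/heartbeat 30
    path, max_age = argv[1], float(argv[2])
    # Exit code 0 signals success to the probe; anything else is a failure.
    return 0 if heartbeat_is_fresh(path, max_age) else 1
```

Wired up with sys.exit(main(sys.argv)), the script can serve as the exec command. Keep in mind that every run starts a Python interpreter, which is exactly the per-interval cost described above.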

Kubernetes Probe Best Practices

1. Use Different Endpoints for Liveness and Readiness

One of the most common mistakes is using the same endpoint for both probes.

Liveness Endpoint

/healthz

Question:

Is the process alive?

This should only verify internal application health.

Avoid checking:

Databases
Redis
External APIs

Readiness Endpoint

/ready

Question:

Can I handle traffic right now?

This should validate required dependencies.

Examples:

Database connectivity
Cache availability
Queue access

Why It Matters

If the database becomes slow and liveness checks it:

→ Every pod starts restarting.

You create a second outage while already dealing with the first one.
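
The split between the two endpoints can be sketched with nothing but the Python standard library. The DEPENDENCIES_OK dictionary is a hypothetical stand-in for real database, cache, or queue checks, and the response bodies are illustrative.

```python
import json
from http.server import BaseHTTPRequestHandler

# Hypothetical stand-in for real dependency checks (database, cache, queue).
DEPENDENCIES_OK = {"database": True}


def evaluate(path: str) -> tuple:
    """Map a request path to (status_code, body) for the health endpoints."""
    if path == "/healthz":
        # Liveness: no dependency checks, only "this process can answer".
        return 200, {"status": "ok"}
    if path == "/ready":
        failed = sorted(name for name, ok in DEPENDENCIES_OK.items() if not ok)
        if failed:
            return 503, {"status": "not_ready", "failed": failed}
        return 200, {"status": "ready"}
    return 404, {"status": "not_found"}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = evaluate(self.path)
        payload = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
```

With the database flagged as down, /ready returns 503 while /healthz keeps returning 200, so pods leave the Service endpoints without being restarted. The handler could be served with HTTPServer(("", 8080), HealthHandler).serve_forever().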

2. Keep Liveness Checks Simple

Liveness is a last-resort recovery mechanism.

Avoid business logic.

A simple response is often sufficient:

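from flask import Flask  # assumption: Flask 2.0+, which provides @app.get
app = Flask(__name__)

@app.get("/healthz")
def healthz():
    return {"status": "ok"}, 200  # no dependency checks, only "the process responds"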

3. Make Readiness Reflect Reality

Readiness should answer:

Can this application safely receive production traffic?

Example:

@app.get("/ready")
def ready():
    if not db.is_connected():
        return {"status": "db_down"}, 503

    if not cache.ping():
        return {"status": "cache_down"}, 503

    return {"status": "ready"}, 200

Checks may include:

Database connectivity
Cache availability
Configuration loading
Queue connections

4. Use Startup Probes for Slow-Starting Applications

Ideal candidates:

Spring Boot
.NET Core
Large Python services

Example configuration:

startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30
  periodSeconds: 10

# Allows up to 300 seconds startup time.

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5
  failureThreshold: 2

With startup probes in place, initialDelaySeconds often becomes unnecessary.
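
The 300-second figure comes from multiplying failureThreshold by periodSeconds. The small Python helpers below spell out that arithmetic and its inverse; the function names are made up for this example, and the result is an approximate upper bound rather than an exact guarantee.

```python
import math


def startup_budget_seconds(failure_threshold: int, period_seconds: int,
                           initial_delay_seconds: int = 0) -> int:
    """Approximate time a container has to start before it is killed."""
    return initial_delay_seconds + failure_threshold * period_seconds


def min_failure_threshold(expected_startup_seconds: int,
                          period_seconds: int) -> int:
    """Smallest failureThreshold that covers the expected startup time."""
    return math.ceil(expected_startup_seconds / period_seconds)


print(startup_budget_seconds(30, 10))   # 300, matching the configuration above
print(min_failure_threshold(120, 10))   # 12 checks cover a 120-second startup
```

In practice it is reasonable to add headroom on top of the computed minimum, since startup time tends to vary with node load.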

5. Use Reasonable Probe Parameters

A good starting point:

periodSeconds: 10
timeoutSeconds: 1-3
failureThreshold: 3

Recommendation

Make liveness more tolerant than readiness.

Reason:

Readiness failure only stops traffic.
Liveness failure restarts the container.

Container restarts should require stronger evidence.
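
The effect of a higher failureThreshold can be illustrated with a small simulation. It assumes the threshold counts consecutive failures and that a success resets the counter; first_trigger_index is a hypothetical helper, and the probe results are invented sample data.

```python
from typing import Iterable, Optional


def first_trigger_index(results: Iterable[bool],
                        failure_threshold: int) -> Optional[int]:
    """Index of the check at which the threshold is reached, or None."""
    consecutive_failures = 0
    for index, ok in enumerate(results):
        # A success resets the counter; a failure extends the streak.
        consecutive_failures = 0 if ok else consecutive_failures + 1
        if consecutive_failures >= failure_threshold:
            return index
    return None


# A brief hiccup: two failed checks, then recovery.
hiccup = [True, False, False, True]
print(first_trigger_index(hiccup, 2))  # 2: readiness pulls the pod from traffic
print(first_trigger_index(hiccup, 3))  # None: liveness does not restart it
```

With a threshold of 2 the short slowdown takes the pod out of rotation, while a threshold of 3 leaves the container running. That is the asymmetry the recommendation aims for.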

6. Keep Health Endpoints Authentication-Free

Health endpoints should not require authentication.

Bad:

/healthz → Requires JWT

Good:

/healthz → Internal cluster access only

Typically, these endpoints are exposed only within the cluster network.

7. Avoid Chaining Other Services' Health Checks

A service should report only its own state.

Bad pattern:

Service A → checks Service B
Service B → checks Service C

If Service C slows down:

→ B becomes unready

→ A becomes unready

→ Cascading failure spreads through the system

Instead, verify only the dependencies required for your service to function correctly.
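
The cascade can be reproduced with a toy model. The sketch below assumes an acyclic "checks" graph and uses invented service names; it is meant to show the propagation, not to model real Kubernetes behaviour.

```python
def ready_states(own_ok: dict, checks: dict) -> dict:
    """Readiness per service when each /ready also calls the services it checks.

    own_ok: service name -> whether the service itself is healthy
    checks: service name -> other services its readiness endpoint calls
            (assumed to contain no cycles)
    """
    resolved = {}

    def is_ready(name: str) -> bool:
        if name not in resolved:
            resolved[name] = own_ok[name] and all(
                is_ready(dep) for dep in checks.get(name, ())
            )
        return resolved[name]

    for name in own_ok:
        is_ready(name)
    return resolved


own = {"A": True, "B": True, "C": False}
print(ready_states(own, {"A": ["B"], "B": ["C"]}))  # chained: all three unready
print(ready_states(own, {}))                        # unchained: only C unready
```

In the chained case a problem in C marks A and B unready as well, although both are healthy themselves. Without chaining, the failure stays contained to C.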

The Real Value of Probes Appears During Failures

The absence of probes rarely causes problems when everything is healthy.

Their true value becomes visible during unexpected incidents.

Imagine a pod deadlocks at 3 AM.

With Probes

Kubernetes detects the issue
The container is restarted automatically
Most users never notice

Without Probes

The pod still shows as Running
Requests start returning 5xx errors
The issue remains hidden until someone investigates

Conclusion

Kubernetes probes are not just health checks.

They are operational safeguards that:

Detect application failures
Prevent premature traffic routing
Improve deployment reliability
Enable automatic recovery
Reduce user-facing outages

Think of probes as an insurance policy for your workloads.

You don't remove the fuse box simply because there hasn't been a fire yet.

Join our 250+ customers

Whether you need expert consulting, custom software, or full-scale data solutions, BiSoft is here to help. Let’s talk about how we can support your goals.

Schedule a Free Meeting
