Kubernetes Probes: The Operational Safety Net Your Cluster Needs
Your cluster is handling production traffic safely today. Fine.
But what about tomorrow?
By design, Kubernetes relies on certain operational principles. Although it can manage itself to a large extent, as your applications and infrastructure grow, there are areas where Kubernetes expects explicit signals and control from you.
In this article, we'll examine one of the most critical ones:
Probe Definitions
Probe configuration affects every pod that receives production traffic.
Without probes, two common problems usually occur:
Even if a container freezes internally, Kubernetes won't notice it. The pod will continue to appear as Running.
During rolling updates, the Service starts sending traffic to a new pod as soon as its container is running, even if the application inside is not actually ready.
This is one of the most common causes of short-lived 5xx spikes after deployments.
So how should liveness, readiness, and startup probes be configured?
Understanding Kubernetes Probes
Kubernetes provides three different probe types, and each answers a different operational question.
1. Liveness Probe — "Is the Container Still Alive?"
If the probe fails:
→ Kubernetes kills the container and restarts it.
Purpose
Applications can become:
Frozen internally
Deadlocked
Stuck in an infinite loop
In these situations, the process may still exist, but the application is no longer responsive.
Kubernetes attempts recovery by restarting the container.
Example
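A minimal sketch of a liveness probe, written as a fragment that sits under one entry of a pod's containers list. The /healthz path and port 8080 are assumptions for illustration, not something Kubernetes prescribes; use whatever your application actually serves.

```yaml
# Fragment of a container spec (one entry under spec.containers).
livenessProbe:
  httpGet:
    path: /healthz        # hypothetical endpoint exposed by the application
    port: 8080            # hypothetical container port
  periodSeconds: 10       # probe every 10 seconds
  timeoutSeconds: 2       # each attempt fails if no response within 2 seconds
  failureThreshold: 3     # restart after 3 consecutive failures (about 30 seconds)
```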
2. Readiness Probe — "Can I Receive Traffic Right Now?"
If the probe fails:
→ Kubernetes does NOT restart the container.
Instead, it removes the pod from the Service endpoint list.
Purpose
Common scenarios include:
Application warm-up still in progress
Cache loading incomplete
Database connectivity issues
Temporary overload conditions
In other words:
"I'm alive, but I shouldn't receive traffic right now."
Example
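A readiness probe could look like the fragment below. The /ready path and port 8080 are again assumed names. Note that failing this probe only removes the pod from Service endpoints; the container keeps running.

```yaml
# Fragment of a container spec (one entry under spec.containers).
readinessProbe:
  httpGet:
    path: /ready          # hypothetical endpoint that reports whether traffic can be served
    port: 8080            # hypothetical container port
  periodSeconds: 5        # check often so traffic shifts quickly
  timeoutSeconds: 2
  failureThreshold: 2     # marked unready after 2 consecutive failures (about 10 seconds)
  successThreshold: 1     # one success is enough to receive traffic again
```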
3. Startup Probe — "Has the Application Finished Starting?"
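While the startup probe has not yet succeeded:
→ Liveness and readiness probes remain disabled.
They only become active after the startup probe succeeds.
If the startup probe never succeeds within its failure threshold, Kubernetes kills the container and restarts it according to the pod's restart policy.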
Purpose
Startup probes are particularly valuable for applications with long initialization times:
Java / Spring Boot
.NET
Large Python applications
These workloads may require 60–120 seconds before becoming operational.
Without a startup probe, Kubernetes may interpret slow startup as failure and restart the container repeatedly.
Result: the pod typically ends up in CrashLoopBackOff.
Example
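As a rough sketch, a startup probe sized for the 60–120 second range mentioned above might look like this. The startup budget is failureThreshold × periodSeconds, so the values below allow up to 150 seconds. The path and port are assumptions.

```yaml
# Fragment of a container spec (one entry under spec.containers).
startupProbe:
  httpGet:
    path: /healthz        # hypothetical endpoint; may be the same one liveness uses
    port: 8080            # hypothetical container port
  periodSeconds: 5
  failureThreshold: 30    # 30 attempts x 5 seconds = up to 150 seconds to start
```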
Why Use All Three Together?
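A healthy pod lifecycle typically looks like this:
The container starts and the startup probe runs until it succeeds.
→ Liveness and readiness probes become active.
→ Readiness succeeds and the pod is added to the Service endpoints.
→ Liveness keeps checking for the rest of the container's life.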
Each probe serves a unique purpose.
Using only one or two of them leaves operational gaps.
Probe Mechanisms
This article covers the three most widely used probe mechanisms. (Kubernetes also offers a built-in gRPC probe.)
HTTP GET (Recommended)
Expose a health endpoint and return HTTP 200 when healthy. Kubernetes treats any status code from 200 to 399 as success and anything else as failure.
Advantages
Easy to implement
Lightweight
Human-readable
Preferred for most web applications
TCP Socket
Useful when no HTTP endpoint exists.
Kubernetes simply verifies that it can establish a TCP connection.
Common for:
RabbitMQ
Databases
Message brokers
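For a workload without an HTTP endpoint, a TCP probe might be sketched as follows. Port 5672 is the default AMQP port for RabbitMQ; adjust it to whatever your container actually listens on. Keep in mind that an open port only shows the listener is accepting connections, not that the application behind it is working correctly.

```yaml
# Fragment of a container spec (one entry under spec.containers).
readinessProbe:
  tcpSocket:
    port: 5672            # default AMQP port; assumed here for a RabbitMQ container
  periodSeconds: 10
  timeoutSeconds: 2
  failureThreshold: 3
```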
Exec Probe
Executes a command inside the container.
Exit code 0 indicates success.
Important Note
Exec probes launch a new process inside the container every time the probe runs.
With short intervals or many pods per node, this can become expensive from a CPU perspective.
Whenever possible:
Use HTTP probes
Use TCP probes if HTTP isn't available
Reserve exec probes for special cases
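If an exec probe is unavoidable, one way to limit its cost is to keep the command cheap and the interval long. The sketch below assumes the application maintains a marker file at /tmp/healthy while it is healthy; that file name is purely illustrative.

```yaml
# Fragment of a container spec (one entry under spec.containers).
livenessProbe:
  exec:
    command:              # exit code 0 means success
      - cat
      - /tmp/healthy      # hypothetical marker file maintained by the application
  periodSeconds: 30       # longer interval: each run spawns a process in the container
  timeoutSeconds: 5
  failureThreshold: 3
```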
Kubernetes Probe Best Practices
1. Use Different Endpoints for Liveness and Readiness
One of the most common mistakes is using the same endpoint for both probes.
Liveness Endpoint
Question:
Is the process alive?
This should only verify internal application health.
Avoid checking:
Databases
Redis
External APIs
Readiness Endpoint
Question:
Can I handle traffic right now?
This should validate required dependencies.
Examples:
Database connectivity
Cache availability
Queue access
Why It Matters
If the database becomes slow and liveness checks it:
→ Every pod starts restarting.
You create a second outage while already dealing with the first one.
2. Keep Liveness Checks Simple
Liveness is a last-resort recovery mechanism.
Avoid business logic.
A simple response is often sufficient: for example, an HTTP 200 with a short body such as "ok", returned without touching any dependency.
3. Make Readiness Reflect Reality
Readiness should answer:
Can this application safely receive production traffic?
A readiness check may include:
Database connectivity
Cache availability
Configuration loading
Queue connections
4. Use Startup Probes for Slow-Starting Applications
Ideal candidates:
Spring Boot
.NET Core
Large Python services
Example configuration:
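The manifest below is one possible shape for a slow-starting Spring Boot service. It assumes Spring Boot Actuator is on the classpath with its liveness health group exposed at /actuator/health/liveness; the Deployment name, labels, image, and port are placeholders.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api                  # hypothetical service name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: orders-api
  template:
    metadata:
      labels:
        app: orders-api
    spec:
      containers:
        - name: app
          image: registry.example.com/orders-api:1.0.0   # placeholder image
          ports:
            - containerPort: 8080
          startupProbe:
            httpGet:
              path: /actuator/health/liveness   # assumes Actuator probe groups are enabled
              port: 8080
            periodSeconds: 10
            failureThreshold: 18    # 18 x 10 seconds = up to 180 seconds to start
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            periodSeconds: 10       # no initialDelaySeconds: liveness waits for startup
            failureThreshold: 3
```

The liveness probe here carries no initial delay because it does not run at all until the startup probe has succeeded once.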
With startup probes in place, initialDelaySeconds often becomes unnecessary.
5. Use Reasonable Probe Parameters
A good starting point:
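The values below are only a sketch of a starting point, not tuned numbers; measure your own response times and adjust. They are chosen so that readiness reacts within roughly 15 seconds while liveness waits about a minute before restarting. Paths and port are assumptions.

```yaml
# Fragment of a container spec (one entry under spec.containers).
readinessProbe:
  httpGet:
    path: /ready          # hypothetical readiness endpoint
    port: 8080
  periodSeconds: 5
  timeoutSeconds: 2
  failureThreshold: 3     # unready after about 15 seconds of failures
livenessProbe:
  httpGet:
    path: /healthz        # hypothetical liveness endpoint, separate from readiness
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 6     # restart only after about 60 seconds of failures
```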
Recommendation
Make liveness more tolerant than readiness.
Reason:
Readiness failure only stops traffic.
Liveness failure restarts the container.
Container restarts should require stronger evidence.
6. Keep Health Endpoints Authentication-Free
Health endpoints should not require authentication.
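Bad: a health endpoint that sits behind the application's login or token check, so the probe fails whenever it cannot authenticate.
Good: a dedicated, unauthenticated health endpoint that returns only a status and exposes nothing sensitive.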
Typically, these endpoints are exposed only within the cluster network.
7. Avoid Chaining Other Services' Health Checks
A service should report only its own state.
Bad pattern: Service A's health check calls Service B's health check, which in turn calls Service C's.
If Service C slows down:
→ B becomes unready
→ A becomes unready
→ Cascading failure spreads through the system
Instead, verify only the dependencies required for your service to function correctly.
The Real Value of Probes Appears During Failures
The absence of probes rarely causes problems when everything is healthy.
Their true value becomes visible during unexpected incidents.
Imagine a pod deadlocks at 3 AM.
With Probes
Kubernetes detects the issue
The container is restarted automatically
Most users never notice
Without Probes
The pod still shows as Running
Requests start returning 5xx errors
The issue remains hidden until someone investigates
Conclusion
Kubernetes probes are not just health checks.
They are operational safeguards that:
Detect application failures
Prevent premature traffic routing
Improve deployment reliability
Enable automatic recovery
Reduce user-facing outages
Think of probes as an insurance policy for your workloads.
You don't remove the fuse box simply because there hasn't been a fire yet.






