These three terms sound like enterprise jargon, but they solve a real problem: how do you balance shipping fast with keeping things reliable?
The Vocabulary, Simplified
Let's start with definitions that actually make sense:
- SLI (Service Level Indicator): A number that measures something users care about. "99.2% of requests completed in under 200ms."
- SLO (Service Level Objective): A target for that number. "We aim for 99.5% of requests under 200ms."
- Error Budget: The gap between perfect (100%) and your target (99.5%). That's 0.5% of requests that can miss the 200ms target without breaking your promise.
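To make the three definitions concrete, here is a minimal sketch that computes a latency SLI from one window of requests and compares it with the SLO. It assumes you already have per-request latencies as a plain list; the names `latency_slo_status` and `LatencySloStatus` are hypothetical, and a real system would usually derive the same ratio from monitoring counters instead of raw lists.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LatencySloStatus:
    sli: float              # fraction of requests faster than the threshold
    meets_slo: bool         # True if the SLI is at or above the target
    bad_requests: int       # requests that missed the threshold
    allowed_bad: float      # error budget: (1 - target) * total requests
    budget_consumed: float  # 1.0 means the budget is exactly used up


def latency_slo_status(
    latencies_ms: Sequence[float],
    threshold_ms: float = 200.0,
    target: float = 0.995,
) -> LatencySloStatus:
    """Evaluate a latency SLI against an SLO over one window of requests."""
    if not latencies_ms:
        raise ValueError("need at least one request to compute an SLI")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")

    total = len(latencies_ms)
    good = sum(1 for ms in latencies_ms if ms < threshold_ms)  # "under 200ms"
    bad = total - good
    allowed_bad = (1.0 - target) * total  # the error budget, in requests

    return LatencySloStatus(
        sli=good / total,
        meets_slo=good / total >= target,
        bad_requests=bad,
        allowed_bad=allowed_bad,
        budget_consumed=bad / allowed_bad,
    )


# 1,000 requests, 8 of them slower than 200ms: the 99.2% SLI from the example.
status = latency_slo_status([120.0] * 992 + [350.0] * 8)
print(status.sli, status.meets_slo)  # 0.992 False
```

With a 99.5% target, 1,000 requests leave room for 5 slow ones, so 8 slow requests means roughly 160% of the budget is gone: the SLI of 99.2% misses the SLO.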
The Budget Analogy
Think of reliability like a financial budget.
Your SLO is like saying "I need to save 10% of my income." Your error budget is the 90% you can spend. You're not failing if you spend money — you're failing if you overspend.
In engineering terms: you're not failing if some requests are slow. You're failing if more requests are slow than your error budget allows.
Why This Matters
Before SLOs, teams argued about reliability subjectively. "Is 99.9% good enough?" "Should we delay the feature to fix this bug?" These conversations had no framework.
With SLOs, you have data:
- Error budget remaining? Ship that feature. Take the risk.
- Error budget exhausted? Stop. Focus on reliability work.
This turns subjective debates into objective decisions.
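The two rules above are simple enough to encode directly. This is a hypothetical helper (`release_decision` is not from any tool); in practice an error-budget policy is a written team agreement, and teams may add intermediate steps such as extra review when the budget is nearly gone.

```python
from enum import Enum


class ReleaseDecision(Enum):
    SHIP = "budget remaining: ship feature work"
    FREEZE = "budget exhausted: pause features, focus on reliability"


def release_decision(allowed_failures: float, failures_so_far: int) -> ReleaseDecision:
    """Two-rule error-budget policy: ship while budget remains, freeze once it is gone."""
    if allowed_failures <= 0:
        raise ValueError("allowed_failures must be positive")
    if failures_so_far < 0:
        raise ValueError("failures_so_far cannot be negative")
    remaining = allowed_failures - failures_so_far
    return ReleaseDecision.SHIP if remaining > 0 else ReleaseDecision.FREEZE


print(release_decision(1000, 400).name)   # SHIP
print(release_decision(1000, 1000).name)  # FREEZE
```

This binary rule only looks at how much budget is left, not how quickly it is disappearing; the checkout example later in the article shows why the pace matters too.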
A Real Example
Scenario: Your checkout service has a 99.9% success rate SLO.
Monthly budget: In a month with 1,000,000 checkouts, you can afford 1,000 failures (0.1%).
Current status: It's the 15th, roughly halfway through the month, and you've had 400 failures (40% of the budget).
Decision: You're on track. Safe to deploy that optimization you've been testing.
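Same scenario, different status: It's the 15th and you've already had 950 failures (95% of the budget).
Burn rate: You're spending the budget about 1.9x (call it 2x) faster than a sustainable pace, with only 50 failures left for the rest of the month.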
Decision: Pause feature work. Investigate recent changes. Focus on reliability.
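The arithmetic behind this example fits in a few lines. This sketch assumes a 30-day month (so the 15th is the halfway point) and traffic spread evenly across the month; the helper names are hypothetical. "Burn rate" here means the budget actually spent divided by what an even pace would have spent by that day.

```python
def error_budget(total_events: int, slo_target: float) -> float:
    """Number of failures the SLO allows over the whole window."""
    return total_events * (1.0 - slo_target)


def burn_rate(failures: int, budget: float, day: float, window_days: float = 30.0) -> float:
    """Actual budget spent divided by what an even pace would have spent after `day` days.

    1.0 means the budget runs out exactly at the end of the window;
    above 1.0 means it runs out early. Assumes evenly spread traffic.
    """
    if not 0 < day <= window_days:
        raise ValueError("day must be within the window")
    even_pace_spend = budget * (day / window_days)
    return failures / even_pace_spend


def days_until_exhausted(failures: int, budget: float, day: float) -> float:
    """Days until the budget reaches zero if the current daily pace continues."""
    if day <= 0:
        raise ValueError("day must be positive")
    per_day = failures / day
    if per_day == 0:
        return float("inf")
    return max(budget - failures, 0.0) / per_day


budget = error_budget(1_000_000, 0.999)  # about 1,000 allowed failures
rate = burn_rate(950, budget, day=15)    # second scenario: 950 failures by the 15th
print(f"budget={budget:.0f} burn_rate={rate:.1f}")  # budget=1000 burn_rate=1.9
print(f"days left at this pace: {days_until_exhausted(950, budget, day=15):.1f}")
```

For the 950-failure scenario the burn rate is about 1.9, and at that pace the remaining 50 failures last less than a day. Production alerting often evaluates burn rate over shorter look-back windows (for example, the last hour) rather than month-to-date, but the ratio is the same idea.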
Common Mistakes
Setting SLOs too high. A 99.99% SLO sounds impressive, but it leaves almost no error budget. One bad deploy exhausts your monthly allowance. Teams with aggressive SLOs often can't ship anything without breaking them.
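To see how little room a 99.99% target leaves, here is the arithmetic for an availability SLO measured as uptime over a 30-day window. That framing is an assumption (the checkout example above counts requests, not minutes), and the function name is illustrative.

```python
def allowed_downtime_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of full unavailability a time-based availability SLO permits per window."""
    window_minutes = window_days * 24 * 60  # 43,200 minutes in 30 days
    return window_minutes * (1.0 - slo_target)


for target in (0.99, 0.999, 0.9999):
    print(f"{target:.2%} -> {allowed_downtime_minutes(target):.1f} min per 30 days")
# 99.00% -> 432.0 min per 30 days
# 99.90% -> 43.2 min per 30 days
# 99.99% -> 4.3 min per 30 days
```

At 99.99%, a single 5-minute outage already overspends the month's budget.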
Treating SLOs as SLAs. SLOs are internal targets. SLAs (Service Level Agreements) are contractual promises with penalties. Your SLO should be stricter than your SLA (say, 99.9% internally against a 99.5% contractual promise); that gap is your safety margin.
Ignoring the budget. If you never use your error budget, you're either over-investing in reliability or your SLO is too easy. Error budgets are meant to be spent on velocity.
The Bottom Line
SLOs aren't about perfection. They're about making reliability a conscious trade-off. You decide how good is "good enough," then you track whether you're meeting that bar.
The error budget is permission to take risks. Use it wisely.