A performance regression occurs when a code change degrades metrics that were previously healthy. The danger is that many regressions are small (5% here, 10% there) and go unnoticed until they accumulate into a serious problem.
Performance regression is technical debt paid in milliseconds.
Why Regressions Happen
Common causes
1. Unoptimized code
- N+1 query introduced (see the sketch after this list)
- Unnecessary loop
- Inefficient serialization
2. Dependency changes
- Library upgrade
- New framework version
- Runtime change
3. Data changes
- Volume grew
- Distribution changed
- New edge cases
4. Configuration changes
- Pool size altered
- Timeout changed
- Cache disabled
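To make the first cause concrete, here is a minimal sketch of how an innocent-looking loop introduces an N+1 query pattern. The Django-style Order and items relation is borrowed from the test example later in this article; the price field is hypothetical.

# Before: one query fetches the orders, then the loop issues one extra
# query per order to load its items (1 + N queries in total).
def order_totals_slow():
    return {o.id: sum(item.price for item in o.items.all())
            for o in Order.objects.all()}

# After: prefetch_related loads all items in a second query (2 queries total),
# no matter how many orders there are.
def order_totals_fast():
    orders = Order.objects.prefetch_related('items')
    return {o.id: sum(item.price for item in o.items.all())
            for o in orders}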
The insidious nature
Week 1: latency = 100ms
Week 2: +5% → 105ms (goes unnoticed)
Week 3: +3% → 108ms
Week 4: +7% → 116ms
...
Month 3: latency = 200ms (doubled!)
No individual change was "bad", but the accumulation is critical
Detecting Regressions
1. Performance Baseline
# Establish baseline
baseline = {
    'api_latency_p50': 45,    # ms
    'api_latency_p95': 120,   # ms
    'api_latency_p99': 250,   # ms
    'throughput': 5000,
    'error_rate': 0.001
}

# Metrics where a *decrease* is the regression
HIGHER_IS_BETTER = {'throughput'}

# Detect deviation from the baseline (default threshold: 10%)
def check_regression(current_metrics, baseline, threshold=0.1):
    regressions = []
    for metric, baseline_value in baseline.items():
        current = current_metrics[metric]
        change = (current - baseline_value) / baseline_value
        if metric in HIGHER_IS_BETTER:
            change = -change  # a drop in throughput counts as a regression
        if change > threshold:
            regressions.append({
                'metric': metric,
                'baseline': baseline_value,
                'current': current,
                'change': f'+{change * 100:.1f}%'
            })
    return regressions
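A quick usage sketch; the current values below are made up for illustration.

current = {
    'api_latency_p50': 48,
    'api_latency_p95': 140,   # +16.7% vs baseline -> flagged
    'api_latency_p99': 260,   # only +4% -> not flagged
    'throughput': 5100,
    'error_rate': 0.001
}

for r in check_regression(current, baseline):
    print(f"{r['metric']}: {r['baseline']} -> {r['current']} ({r['change']})")
# Only api_latency_p95 exceeds the 10% threshold, so only it is reported.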
2. A/B Comparison
Canary (new version):
  latency_p99: 135ms
  error_rate: 0.12%

Production (current version):
  latency_p99: 120ms
  error_rate: 0.10%

Comparison:
  latency_p99: +12.5% ⚠️
  error_rate:  +20%   ⚠️

Result: regression detected, roll back the canary
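A minimal sketch of this comparison in code. The metric names and numbers mirror the example above; the per-metric thresholds are assumptions.

# Per-metric regression thresholds for canary analysis (illustrative values)
CANARY_THRESHOLDS = {
    'latency_p99': 0.10,   # flag if the canary is >10% slower
    'error_rate': 0.15,    # flag if the canary has >15% more errors
}

def compare_canary(canary, production, thresholds=CANARY_THRESHOLDS):
    """Return the metrics where the canary regresses past its threshold."""
    flagged = {}
    for metric, limit in thresholds.items():
        change = (canary[metric] - production[metric]) / production[metric]
        if change > limit:
            flagged[metric] = f'+{change * 100:.1f}%'
    return flagged

# With the numbers above, latency_p99 (+12.5%) and error_rate (+20%) are both
# flagged, so the deployment tooling would roll the canary back.
flagged = compare_canary(
    {'latency_p99': 135, 'error_rate': 0.0012},
    {'latency_p99': 120, 'error_rate': 0.0010},
)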
3. Performance Tests in CI
# GitHub Actions
name: Performance Tests

on: [pull_request]

jobs:
  perf-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Run Load Test
        run: k6 run tests/load.js --out json=results.json

      - name: Compare with Baseline
        id: compare   # referenced below as steps.compare
        run: |
          python scripts/compare_perf.py \
            --baseline baseline.json \
            --current results.json \
            --threshold 10

      - name: Fail if Regression
        if: ${{ steps.compare.outputs.regression == 'true' }}
        run: |
          echo "Performance regression detected!"
          exit 1
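The workflow assumes a scripts/compare_perf.py that sets a regression output for the final step. One possible sketch, assuming both JSON files hold a flat metric-to-value mapping (for example, produced by k6's handleSummary or a post-processing step):

# scripts/compare_perf.py (sketch)
import argparse
import json
import os

parser = argparse.ArgumentParser()
parser.add_argument('--baseline', required=True)
parser.add_argument('--current', required=True)
parser.add_argument('--threshold', type=float, default=10)  # percent
args = parser.parse_args()

with open(args.baseline) as f:
    baseline = json.load(f)
with open(args.current) as f:
    current = json.load(f)

regressions = []
for metric, base in baseline.items():
    change = (current[metric] - base) / base * 100
    if change > args.threshold:
        regressions.append(f'{metric}: +{change:.1f}%')

# Expose the result to later workflow steps via GITHUB_OUTPUT
with open(os.environ['GITHUB_OUTPUT'], 'a') as out:
    out.write(f"regression={'true' if regressions else 'false'}\n")

if regressions:
    print('Regressions found:', ', '.join(regressions))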
Preventing Regressions
1. Performance Budget
# Define budget
performance_budget:
  api:
    latency_p95: 100ms
    latency_p99: 200ms
  frontend:
    lcp: 2500ms
    fid: 100ms
    cls: 0.1
  database:
    query_time_p95: 50ms
class PerformanceBudgetExceeded(Exception):
    pass

def enforce_budget(metrics, budget):
    violations = []
    for endpoint, limits in budget.items():
        for metric, limit in limits.items():
            if metrics[endpoint][metric] > limit:
                violations.append(
                    f"{endpoint}.{metric}: {metrics[endpoint][metric]} > {limit}"
                )
    if violations:
        raise PerformanceBudgetExceeded(violations)
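A usage sketch with illustrative numbers; note that the YAML limits above would need to be parsed into plain numbers (milliseconds here) before they can be compared.

budget = {
    'api': {'latency_p95': 100, 'latency_p99': 200},   # ms
    'database': {'query_time_p95': 50},                # ms
}
measured = {
    'api': {'latency_p95': 130, 'latency_p99': 180},   # p95 is over budget
    'database': {'query_time_p95': 45},
}

# Raises PerformanceBudgetExceeded(['api.latency_p95: 130 > 100'])
enforce_budget(measured, budget)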
2. Automated Tests
# pytest-benchmark
import pytest

def test_api_latency(benchmark):
    # `client` is assumed to be a test client fixture for the API under test
    result = benchmark(lambda: client.get('/api/users'))

    # Performance assertions
    assert benchmark.stats['mean'] < 0.1   # 100ms
    assert benchmark.stats['max'] < 0.5    # 500ms

def test_no_n_plus_one(django_assert_num_queries):
    # pytest-django fixture: the test fails if more than 2 queries run
    with django_assert_num_queries(2):
        list(Order.objects.prefetch_related('items').all())
3. Continuous Profiling
# In production with sampling
import random

from pyinstrument import Profiler

@app.middleware("http")
async def profile_requests(request, call_next):
    if random.random() < 0.01:  # profile ~1% of requests
        profiler = Profiler()
        profiler.start()
        response = await call_next(request)
        profiler.stop()
        save_profile(profiler.output_text())  # save_profile: your storage hook
        return response
    return await call_next(request)
4. Focused Code Review
## Performance Review Checklist
### Queries
- [ ] No N+1 queries
- [ ] Adequate indexes for new queries
- [ ] EXPLAIN analyzed for complex queries
### Loops
- [ ] No I/O inside loops
- [ ] Batching where possible (see the sketch after this checklist)
- [ ] Algorithmic (big-O) complexity acceptable
### Memory
- [ ] No obvious leaks
- [ ] Streams for large data
- [ ] Cache with limit
### Dependencies
- [ ] New dependency justified
- [ ] Alternatives benchmarked
- [ ] Bundle size acceptable
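To make the "No I/O inside loops" and "Batching where possible" items concrete, a small sketch; fetch_user and fetch_users_bulk are hypothetical API calls.

# Before: one network round trip per user id (N calls)
def load_users_slow(user_ids):
    return [fetch_user(uid) for uid in user_ids]   # I/O inside a loop

# After: a single batched call for all ids (1 call)
def load_users_fast(user_ids):
    return fetch_users_bulk(user_ids)              # batching where possible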
Detection Workflow
Complete Pipeline
PR created
  ↓
Unit Tests
  ↓
Build
  ↓
Performance Tests (comparison with baseline)
  ↓
  ├─ No regression → Merge allowed
  │
  └─ Regression detected
       ↓
     Mandatory investigation
       ↓
       ├─ Justified → Update baseline + Merge
       │
       └─ Not justified → Fix required
Production Alerts
# Detect a regression by comparing with the historical baseline (same period last week)
- alert: PerformanceRegression
  expr: |
    (
      avg_over_time(http_request_duration_seconds[1h])
        /
      avg_over_time(http_request_duration_seconds[24h] offset 7d)
    ) > 1.2
  for: 30m
  annotations:
    summary: "Latency 20% higher than last week"
Fixing Regressions
1. Identify the Cause
# Git bisect to find culprit commit
git bisect start
git bisect bad HEAD
git bisect good v1.2.0
# For each commit, run performance test
git bisect run ./scripts/perf_test.sh
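git bisect run treats a zero exit status as "good" and most non-zero statuses as "bad" (125 means "skip"), so ./scripts/perf_test.sh just needs to exit 1 when the benchmark regresses. A minimal Python sketch of what such a script could wrap; run_workload and the 150 ms limit are assumptions.

# scripts/perf_test.py (sketch) -- wrapped by perf_test.sh for git bisect run
import statistics
import sys
import time

P95_LIMIT_MS = 150  # assumed acceptable p95 for this workload

def run_workload():
    # Placeholder: call the endpoint or function whose latency regressed
    time.sleep(0.05)

samples = []
for _ in range(50):
    start = time.perf_counter()
    run_workload()
    samples.append((time.perf_counter() - start) * 1000)

p95 = statistics.quantiles(samples, n=20)[18]  # 95th percentile
print(f"p95 = {p95:.1f} ms")
sys.exit(0 if p95 <= P95_LIMIT_MS else 1)  # non-zero marks the commit as "bad"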
2. Profiling Analysis
# Compare profiles before/after (conceptual sketch: run_profiled and
# compare_profiles are placeholders for your own tooling)
from pyinstrument import Profiler

# Profile the previous version:   git checkout HEAD~1
profile_before = run_profiled(workload)

# Profile the current version:    git checkout HEAD
profile_after = run_profiled(workload)

# Compare the two profiles and list the hotspots that got slower
diff = compare_profiles(profile_before, profile_after)
print(diff.get_hotspots())
3. Fix or Revert
# If a quick fix is possible
if can_fix_quickly():
    fix_regression()
    add_performance_test()  # prevent recurrence
    deploy()

# If the fix is complex
else:
    revert_to_previous_version()
    create_ticket_for_fix()
    # Fix with more time and proper tests
Tools
For CI/CD
- k6: Load testing
- Lighthouse: Frontend
- pytest-benchmark: Python
- JMH: Java
- BenchmarkDotNet: .NET
For Production
Continuous Profiling:
- Pyroscope
- Datadog Continuous Profiler
- Google Cloud Profiler
APM:
- Datadog
- New Relic
- Dynatrace
For Analysis
- Flame Graphs: CPU visualization
- Allocation Profilers: Memory
- Query Analyzers: Database
Success Metrics
# Healthy program indicators
Detection:
  - 100% of PRs go through perf tests
  - Regressions detected before production: > 90%

Prevention:
  - PRs blocked by regression: < 10%
  - Average time to fix: < 2 days

Production:
  - Regressions reaching prod: < 1/month
  - Time to detect in prod: < 1 hour
  - Time to resolve: < 4 hours
Conclusion
Preventing performance regressions requires:
- Clear baseline: know what "normal" is
- Automated tests: in CI, every PR
- Continuous comparison: canary vs production
- Fast alerts: detect in minutes, not days
- Performance culture: every dev is responsible
The cost of prevention is much less than the cost of fixing:
Prevent (CI test): ~5 min per PR
Detect in staging: ~1 hour of investigation
Detect in production: ~4 hours + user impact
Detect after accumulation: days of refactoring
The best regression is one that never reaches production.