A performance regression occurs when a code change degrades metrics that were previously healthy. The danger is that many regressions are small (5% here, 10% there) and go unnoticed until they accumulate into a serious problem.
Performance regression is technical debt paid in milliseconds.
Why Regressions Happen
Common causes
1. Unoptimized code
- N+1 query introduced (see the sketch after this list)
- Unnecessary loop
- Inefficient serialization
2. Dependency changes
- Library upgrade
- New framework version
- Runtime change
3. Data changes
- Volume grew
- Distribution changed
- New edge cases
4. Configuration changes
- Pool size altered
- Timeout changed
- Cache disabled
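To make the first cause concrete, here is a minimal sketch of how an innocent-looking loop introduces an N+1 query pattern. The Django-style Order and items relation is borrowed from the test example later in this article; the price field is hypothetical.

# Before: one query fetches the orders, then the loop issues one extra
# query per order to load its items (1 + N queries in total).
def order_totals_slow():
    return {o.id: sum(item.price for item in o.items.all())
            for o in Order.objects.all()}

# After: prefetch_related loads all items in a second query (2 queries total),
# no matter how many orders there are.
def order_totals_fast():
    orders = Order.objects.prefetch_related('items')
    return {o.id: sum(item.price for item in o.items.all())
            for o in orders}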
The insidious nature
Week 1: latency = 100ms
Week 2: +5% → 105ms (goes unnoticed)
Week 3: +3% → 108ms
Week 4: +7% → 116ms
...
Month 3: latency = 200ms (doubled!)
No individual change was "bad", but the accumulation is critical
Detecting Regressions
1. Performance Baseline
# Establish baseline
baseline = {
    'api_latency_p50': 45,    # ms
    'api_latency_p95': 120,   # ms
    'api_latency_p99': 250,   # ms
    'throughput': 5000,
    'error_rate': 0.001
}

# Metrics where a *decrease* is the regression
HIGHER_IS_BETTER = {'throughput'}

# Detect deviation from the baseline (default threshold: 10%)
def check_regression(current_metrics, baseline, threshold=0.1):
    regressions = []
    for metric, baseline_value in baseline.items():
        current = current_metrics[metric]
        change = (current - baseline_value) / baseline_value
        if metric in HIGHER_IS_BETTER:
            change = -change  # a drop in throughput counts as a regression
        if change > threshold:
            regressions.append({
                'metric': metric,
                'baseline': baseline_value,
                'current': current,
                'change': f'+{change * 100:.1f}%'
            })
    return regressions
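A quick usage sketch; the current values below are made up for illustration.

current = {
    'api_latency_p50': 48,
    'api_latency_p95': 140,   # +16.7% vs baseline -> flagged
    'api_latency_p99': 260,   # only +4% -> not flagged
    'throughput': 5100,
    'error_rate': 0.001
}

for r in check_regression(current, baseline):
    print(f"{r['metric']}: {r['baseline']} -> {r['current']} ({r['change']})")
# Only api_latency_p95 exceeds the 10% threshold, so only it is reported.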
2. A/B Comparison
Canary (new version):
  latency_p99: 135ms
  error_rate: 0.12%

Production (current version):
  latency_p99: 120ms
  error_rate: 0.10%

Comparison:
  latency_p99: +12.5% ⚠️
  error_rate:  +20%   ⚠️

Result: regression detected, roll back the canary
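A minimal sketch of this comparison in code. The metric names and numbers mirror the example above; the per-metric thresholds are assumptions.

# Per-metric regression thresholds for canary analysis (illustrative values)
CANARY_THRESHOLDS = {
    'latency_p99': 0.10,   # flag if the canary is >10% slower
    'error_rate': 0.15,    # flag if the canary has >15% more errors
}

def compare_canary(canary, production, thresholds=CANARY_THRESHOLDS):
    """Return the metrics where the canary regresses past its threshold."""
    flagged = {}
    for metric, limit in thresholds.items():
        change = (canary[metric] - production[metric]) / production[metric]
        if change > limit:
            flagged[metric] = f'+{change * 100:.1f}%'
    return flagged

# With the numbers above, latency_p99 (+12.5%) and error_rate (+20%) are both
# flagged, so the deployment tooling would roll the canary back.
flagged = compare_canary(
    {'latency_p99': 135, 'error_rate': 0.0012},
    {'latency_p99': 120, 'error_rate': 0.0010},
)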
3. Performance Tests in CI
# GitHub Actions
name: Performance Tests

on: [pull_request]

jobs:
  perf-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Run Load Test
        run: k6 run tests/load.js --out json=results.json

      - name: Compare with Baseline
        id: compare   # referenced below as steps.compare
        run: |
          python scripts/compare_perf.py \
            --baseline baseline.json \
            --current results.json \
            --threshold 10

      - name: Fail if Regression
        if: ${{ steps.compare.outputs.regression == 'true' }}
        run: |
          echo "Performance regression detected!"
          exit 1
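The workflow assumes a scripts/compare_perf.py that sets a regression output for the final step. One possible sketch, assuming both JSON files hold a flat metric-to-value mapping (for example, produced by k6's handleSummary or a post-processing step):

# scripts/compare_perf.py (sketch)
import argparse
import json
import os

parser = argparse.ArgumentParser()
parser.add_argument('--baseline', required=True)
parser.add_argument('--current', required=True)
parser.add_argument('--threshold', type=float, default=10)  # percent
args = parser.parse_args()

with open(args.baseline) as f:
    baseline = json.load(f)
with open(args.current) as f:
    current = json.load(f)

regressions = []
for metric, base in baseline.items():
    change = (current[metric] - base) / base * 100
    if change > args.threshold:
        regressions.append(f'{metric}: +{change:.1f}%')

# Expose the result to later workflow steps via GITHUB_OUTPUT
with open(os.environ['GITHUB_OUTPUT'], 'a') as out:
    out.write(f"regression={'true' if regressions else 'false'}\n")

if regressions:
    print('Regressions found:', ', '.join(regressions))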
Preventing Regressions
1. Performance Budget
# Define budget
performance_budget:
  api:
    latency_p95: 100ms
    latency_p99: 200ms
  frontend:
    lcp: 2500ms
    fid: 100ms
    cls: 0.1
  database:
    query_time_p95: 50ms
class PerformanceBudgetExceeded(Exception):
    pass

def enforce_budget(metrics, budget):
    violations = []
    for endpoint, limits in budget.items():
        for metric, limit in limits.items():
            if metrics[endpoint][metric] > limit:
                violations.append(
                    f"{endpoint}.{metric}: {metrics[endpoint][metric]} > {limit}"
                )
    if violations:
        raise PerformanceBudgetExceeded(violations)
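A usage sketch with illustrative numbers; note that the YAML limits above would need to be parsed into plain numbers (milliseconds here) before they can be compared.

budget = {
    'api': {'latency_p95': 100, 'latency_p99': 200},   # ms
    'database': {'query_time_p95': 50},                # ms
}
measured = {
    'api': {'latency_p95': 130, 'latency_p99': 180},   # p95 is over budget
    'database': {'query_time_p95': 45},
}

# Raises PerformanceBudgetExceeded(['api.latency_p95: 130 > 100'])
enforce_budget(measured, budget)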
2. Automated Tests
# pytest-benchmark
import pytest

def test_api_latency(benchmark):
    # `client` is assumed to be a test client fixture for the API under test
    result = benchmark(lambda: client.get('/api/users'))

    # Performance assertions
    assert benchmark.stats['mean'] < 0.1   # 100ms
    assert benchmark.stats['max'] < 0.5    # 500ms

def test_no_n_plus_one(django_assert_num_queries):
    # pytest-django fixture: the test fails if more than 2 queries run
    with django_assert_num_queries(2):
        list(Order.objects.prefetch_related('items').all())
3. Continuous Profiling
# In production with sampling
import random

from pyinstrument import Profiler

@app.middleware("http")
async def profile_requests(request, call_next):
    if random.random() < 0.01:  # profile ~1% of requests
        profiler = Profiler()
        profiler.start()
        response = await call_next(request)
        profiler.stop()
        save_profile(profiler.output_text())  # save_profile: your storage hook
        return response
    return await call_next(request)
4. Focused Code Review
## Performance Review Checklist
### Queries
- [ ] No N+1 queries
- [ ] Adequate indexes for new queries
- [ ] EXPLAIN analyzed for complex queries
### Loops
- [ ] No I/O inside loops
- [ ] Batching where possible (see the sketch after this checklist)
- [ ] Algorithmic (big-O) complexity acceptable
### Memory
- [ ] No obvious leaks
- [ ] Streams for large data
- [ ] Cache with limit
### Dependencies
- [ ] New dependency justified
- [ ] Alternatives benchmarked
- [ ] Bundle size acceptable
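To make the "No I/O inside loops" and "Batching where possible" items concrete, a small sketch; fetch_user and fetch_users_bulk are hypothetical API calls.

# Before: one network round trip per user id (N calls)
def load_users_slow(user_ids):
    return [fetch_user(uid) for uid in user_ids]   # I/O inside a loop

# After: a single batched call for all ids (1 call)
def load_users_fast(user_ids):
    return fetch_users_bulk(user_ids)              # batching where possible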
Detection Workflow
Complete Pipeline
PR created
  ↓
Unit Tests
  ↓
Build
  ↓
Performance Tests (comparison with baseline)
  ↓
  ├─ No regression → Merge allowed
  │
  └─ Regression detected
       ↓
     Mandatory investigation
       ↓
       ├─ Justified → Update baseline + Merge
       │
       └─ Not justified → Fix required
Production Alerts
# Detect a regression by comparing with the historical baseline (same period last week)
- alert: PerformanceRegression
  expr: |
    (
      avg_over_time(http_request_duration_seconds[1h])
        /
      avg_over_time(http_request_duration_seconds[24h] offset 7d)
    ) > 1.2
  for: 30m
  annotations:
    summary: "Latency 20% higher than last week"
Fixing Regressions
1. Identify the Cause
# Git bisect to find culprit commit
git bisect start
git bisect bad HEAD
git bisect good v1.2.0
# For each commit, run performance test
git bisect run ./scripts/perf_test.sh
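git bisect run treats a zero exit status as "good" and most non-zero statuses as "bad" (125 means "skip"), so ./scripts/perf_test.sh just needs to exit 1 when the benchmark regresses. A minimal Python sketch of what such a script could wrap; run_workload and the 150 ms limit are assumptions.

# scripts/perf_test.py (sketch) -- wrapped by perf_test.sh for git bisect run
import statistics
import sys
import time

P95_LIMIT_MS = 150  # assumed acceptable p95 for this workload

def run_workload():
    # Placeholder: call the endpoint or function whose latency regressed
    time.sleep(0.05)

samples = []
for _ in range(50):
    start = time.perf_counter()
    run_workload()
    samples.append((time.perf_counter() - start) * 1000)

p95 = statistics.quantiles(samples, n=20)[18]  # 95th percentile
print(f"p95 = {p95:.1f} ms")
sys.exit(0 if p95 <= P95_LIMIT_MS else 1)  # non-zero marks the commit as "bad"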
2. Profiling Analysis
# Compare profiles before/after (conceptual sketch: run_profiled and
# compare_profiles are placeholders for your own tooling)
from pyinstrument import Profiler

# Profile the previous version:   git checkout HEAD~1
profile_before = run_profiled(workload)

# Profile the current version:    git checkout HEAD
profile_after = run_profiled(workload)

# Compare the two profiles and list the hotspots that got slower
diff = compare_profiles(profile_before, profile_after)
print(diff.get_hotspots())
3. Fix or Revert
# If a quick fix is possible
if can_fix_quickly():
    fix_regression()
    add_performance_test()  # prevent recurrence
    deploy()

# If the fix is complex
else:
    revert_to_previous_version()
    create_ticket_for_fix()
    # Fix with more time and proper tests
Tools
For CI/CD
- k6: Load testing
- Lighthouse: Frontend
- pytest-benchmark: Python
- JMH: Java
- BenchmarkDotNet: .NET
For Production
Continuous Profiling:
- Pyroscope
- Datadog Continuous Profiler
- Google Cloud Profiler
APM:
- Datadog
- New Relic
- Dynatrace
For Analysis
- Flame Graphs: CPU visualization
- Allocation Profilers: Memory
- Query Analyzers: Database
Success Metrics
# Healthy program indicators
Detection:
  - 100% of PRs go through perf tests
  - Regressions detected before production: > 90%

Prevention:
  - PRs blocked by regression: < 10%
  - Average time to fix: < 2 days

Production:
  - Regressions reaching prod: < 1/month
  - Time to detect in prod: < 1 hour
  - Time to resolve: < 4 hours
Conclusion
Preventing performance regressions requires:
- Clear baseline: know what "normal" is
- Automated tests: in CI, every PR
- Continuous comparison: canary vs production
- Fast alerts: detect in minutes, not days
- Performance culture: every dev is responsible
The cost of prevention is much less than the cost of fixing:
Prevent (CI test): ~5 min per PR
Detect in staging: ~1 hour of investigation
Detect in production: ~4 hours + user impact
Detect after accumulation: days of refactoring
The best regression is one that never reaches production.