When a system receives more work than it can process, something has to give. Without control, the result is progressive degradation until collapse. Backpressure is the mechanism that propagates overload signals back to the source, enabling flow control.
Backpressure is not failure. It's honest communication about capacity.
The Overload Problem
Without backpressure
Producer: 1000 req/s
Consumer: 500 req/s
Second 1: queue = 500
Second 2: queue = 1000
Second 3: queue = 1500
...
Minute 10: queue = 300,000
→ Memory exhausted
→ Latency explodes
→ System fails
With backpressure
Producer: 1000 req/s
Consumer: 500 req/s
Queue limit: 1000
Second 1: queue = 500
Second 2: queue = 1000 (limit!)
Second 3: producer blocked or rejected
→ Producer slows down
→ System remains stable
→ Latency controlled
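The arithmetic is easy to verify with a few lines of Python. This is a toy model of the two scenarios above, not a benchmark; the constants mirror the numbers in the example:

PRODUCE, CONSUME, LIMIT = 1000, 500, 1000

unbounded = bounded = rejected = 0
for second in range(600):  # ten minutes
    unbounded += PRODUCE - CONSUME                      # grows by 500 every second
    accepted = min(PRODUCE, LIMIT - bounded + CONSUME)  # room after this second's consumption
    rejected += PRODUCE - accepted                      # overflow pushed back to the producer
    bounded = bounded + accepted - CONSUME

print(unbounded)  # 300000 -- memory exhaustion
print(bounded)    # 1000   -- capped at the limit
print(rejected)   # the work the producer was forced to slow down for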
Backpressure Strategies
1. Drop
# Drop requests when overloaded
if queue.size() >= MAX_QUEUE:
    drop(request)
    metrics.increment('dropped_requests')
else:
    queue.add(request)
When to use:
- Non-critical data (metrics, logs)
- Real-time streaming systems
- Better to lose some than lose all
When to avoid:
- Financial transactions
- Data that cannot be lost
2. Bounded Buffer
# Block producer when buffer full
from queue import Queue

q = Queue(maxsize=1000)  # stdlib queue.Queue is Python's blocking bounded queue

def produce(item):
    q.put(item, timeout=5)  # Blocks up to 5s, then raises queue.Full

def consume():
    return q.get()
When to use:
- Temporary load spikes
- Producers that can wait
- Internal service communication
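A quick demonstration of the blocking behavior, reusing the stdlib queue from the snippet above; the consumer's sleep is an invented stand-in for roughly 500 items/s of real work:

import threading, time
from queue import Queue, Full

q = Queue(maxsize=1000)

def slow_consumer():
    while True:
        q.get()
        time.sleep(0.002)  # ~500 items/s

threading.Thread(target=slow_consumer, daemon=True).start()

for i in range(5000):
    try:
        q.put(i, timeout=5)  # blocks once 1000 items are queued
    except Full:
        print(f"gave up on item {i} after 5s")

Once the queue fills, each put() waits for the consumer to free a slot, so the producer is transparently slowed to the consumer's pace.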
3. Reject
# Reject with error when overloaded
if current_load > MAX_LOAD:
    return HttpResponse(status=503,
                        headers={'Retry-After': '30'})
When to use:
- Public APIs
- When client can retry
- Protection against abuse
4. Throttle
# Rate limiting per client
@rate_limit(100, per='minute')
def api_endpoint(request):
    return process(request)
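The rate_limit decorator above is notation, not a real library. A minimal single-process token-bucket version might look like this (HttpResponse as in the earlier snippets; a production version would keep one bucket per client and guard the state with a lock):

import time, functools

def rate_limit(limit, per='minute'):
    period = {'second': 1, 'minute': 60}[per]
    state = {'tokens': limit, 'last': time.monotonic()}

    def decorator(func):
        @functools.wraps(func)
        def wrapper(request):
            now = time.monotonic()
            # Refill tokens proportionally to elapsed time, capped at the limit
            state['tokens'] = min(limit, state['tokens'] + (now - state['last']) * limit / period)
            state['last'] = now
            if state['tokens'] < 1:
                return HttpResponse(status=429,
                                    headers={'Retry-After': str(int(period / limit) or 1)})
            state['tokens'] -= 1
            return func(request)
        return wrapper
    return decorator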
When to use:
- Multi-tenant APIs
- Shared resource protection
- Ensuring fairness between clients
5. Adaptive
# Adjust limits based on metrics
def adaptive_limit():
    if latency_p99 > SLO:
        decrease_rate_limit()
    elif latency_p99 < SLO * 0.5:
        increase_rate_limit()
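A common concrete policy for this is AIMD (additive increase, multiplicative decrease): back off quickly when the SLO is breached, probe for capacity slowly when there is headroom. A sketch, where get_latency_p99() and the bounds are hypothetical:

SLO = 0.2            # 200 ms p99 target
rate_limit = 1000    # requests/s, adjusted in place

def adjust_rate_limit():
    global rate_limit
    p99 = get_latency_p99()  # assumed helper reading the live p99
    if p99 > SLO:
        rate_limit = max(100, int(rate_limit * 0.8))   # multiplicative decrease
    elif p99 < SLO * 0.5:
        rate_limit = min(10000, rate_limit + 50)       # additive increase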
When to use:
- Systems with variable load
- When fixed limits don't work
- Continuous optimization
Implementing Backpressure
In HTTP APIs
# Backpressure middleware (Django-style hooks)
from threading import Semaphore

class BackpressureMiddleware:
    def __init__(self, max_concurrent=100):
        self.semaphore = Semaphore(max_concurrent)

    def process_request(self, request):
        if not self.semaphore.acquire(blocking=False):
            return HttpResponse(status=503, headers={'Retry-After': '5'})
        request.backpressure_acquired = True

    def process_response(self, request, response):
        # Only release for requests that actually acquired a slot;
        # rejected requests never took one
        if getattr(request, 'backpressure_acquired', False):
            self.semaphore.release()
        return response
In Message Queues
# Kafka consumer with backpressure (kafka-python)
consumer = KafkaConsumer(
    'topic',
    max_poll_records=100,   # Limit batch size per poll
    fetch_max_wait_ms=500,  # Limit broker-side wait per fetch
)

# Pause fetching when processing falls behind
if processing_lag > threshold:
    consumer.pause(*consumer.assignment())
elif processing_lag < threshold / 2:
    consumer.resume(*consumer.paused())
In Reactive Streams
// Project Reactor with backpressure
Flux.create(sink -> {
        // Producer pushes as fast as it can
        while (hasData()) {
            sink.next(getData());
        }
    })
    .onBackpressureBuffer(1000,                       // buffer up to 1000 items
        dropped -> log.warn("Dropped: {}", dropped),  // callback on overflow
        BufferOverflowStrategy.DROP_OLDEST)           // then drop instead of erroring
    .subscribe(item -> process(item));
In gRPC
// Streaming with flow control
service DataService {
  rpc StreamData(StreamRequest) returns (stream DataChunk);
}

// gRPC streams inherit HTTP/2 flow control: Send blocks when the
// client's receive window is full, which naturally throttles the server
func (s *server) StreamData(req *pb.StreamRequest, stream pb.DataService_StreamDataServer) error {
    for data := range dataChannel {
        if err := stream.Send(data); err != nil {
            // Stream broken or cancelled by the client
            return err
        }
    }
    return nil
}
Backpressure Patterns
Circuit Breaker
import threading

class CircuitOpenError(Exception):
    pass

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.state = 'CLOSED'

    def call(self, func):
        if self.state == 'OPEN':
            raise CircuitOpenError()
        try:
            result = func()
            self.failures = 0
            self.state = 'CLOSED'  # a successful trial closes a HALF-OPEN circuit
            return result
        except Exception:
            self.failures += 1
            if self.state == 'HALF-OPEN' or self.failures >= self.failure_threshold:
                self.state = 'OPEN'
                # After the timeout, allow one trial call through
                threading.Timer(self.reset_timeout, self.reset).start()
            raise

    def reset(self):
        self.state = 'HALF-OPEN'
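Typical usage wraps every call to the protected dependency; downstream_client here is a stand-in for whatever the breaker guards:

breaker = CircuitBreaker(failure_threshold=5, reset_timeout=30)

def fetch_profile(user_id):
    try:
        return breaker.call(lambda: downstream_client.get(user_id))
    except CircuitOpenError:
        # Fail fast instead of queueing behind a broken dependency
        return HttpResponse(status=503, headers={'Retry-After': '30'})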
Load Shedding
# Drop low-priority requests first
def should_shed(request):
    current_load = get_current_load()  # assumed helper returning load as a 0-100 percentage
    if current_load > 90:
        # Only critical requests get through
        return request.priority != 'critical'
    elif current_load > 70:
        # Drop low priority
        return request.priority == 'low'
    return False
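Shedding must happen before any expensive work is done. A sketch of the entry point, with metrics and HttpResponse as in the earlier snippets:

def handle_request(request):
    if should_shed(request):
        metrics.increment('shed_requests')
        return HttpResponse(status=503, headers={'Retry-After': '10'})
    return process(request)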
Bulkhead
# Isolate resources by tenant/functionality: one pool per class of work,
# so a flood of batch jobs cannot starve critical traffic
from concurrent.futures import ThreadPoolExecutor, TimeoutError

pools = {
    'critical': ThreadPoolExecutor(max_workers=50),
    'standard': ThreadPoolExecutor(max_workers=30),
    'batch': ThreadPoolExecutor(max_workers=20),
}

def process(request):
    pool = pools.get(request.priority, pools['standard'])
    future = pool.submit(handle, request)
    try:
        return future.result(timeout=5)
    except TimeoutError:
        # Backpressure specific to this pool: it is too backed up to answer in time
        return HttpResponse(status=503)
Signs of Missing Backpressure
1. Growing latency under load
Load 50%: latency = 100ms
Load 80%: latency = 500ms
Load 100%: latency = 5000ms
→ System accepts everything, performance degrades for all
2. Growing memory
Internal queues growing indefinitely
Unbounded buffers
GC increasingly frequent
3. Failure cascade
Service A overloaded
→ Timeout in service B waiting for A
→ B's queue grows
→ B also fails
→ Cascade continues
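The cascade is usually broken by bounding how long B will wait for A. A sketch using the requests library (the URL and the 500 ms budget are made up):

import requests

def call_service_a():
    try:
        # Bounded wait: B refuses to queue indefinitely behind an overloaded A
        return requests.get('http://service-a/api', timeout=0.5)
    except requests.Timeout:
        # Turn A's overload into explicit backpressure toward B's own callers
        return HttpResponse(status=503, headers={'Retry-After': '5'})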
Backpressure Metrics
What to monitor
# Rejection rate
rejected_requests_total
dropped_messages_total
# Buffer utilization
queue_size / queue_capacity
buffer_utilization_percent
# Wait time
queue_wait_time_seconds
# Rate limiting
rate_limited_requests_total
throttled_requests_by_client
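With the Python prometheus_client library, these can be exposed directly; the hook functions, the q object, and enqueued_at are assumptions tying back to the earlier snippets:

import time
from prometheus_client import Counter, Gauge, Histogram, start_http_server

rejected = Counter('rejected_requests_total', 'Requests rejected by backpressure')
queue_size = Gauge('queue_size', 'Current queue depth')
queue_wait = Histogram('queue_wait_time_seconds', 'Time items spend queued')

start_http_server(8000)  # exposes /metrics for Prometheus to scrape

def on_reject():
    rejected.inc()

def on_dequeue(enqueued_at):
    queue_size.set(q.qsize())  # q: the bounded queue from earlier
    queue_wait.observe(time.monotonic() - enqueued_at)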
Alerts
# Backpressure activating frequently
- alert: HighRejectionRate
  expr: |
    rate(rejected_requests_total[5m])
      / rate(total_requests[5m]) > 0.05
  for: 5m

# Queue near capacity
- alert: QueueNearCapacity
  expr: |
    queue_size / queue_capacity > 0.8
  for: 2m
Common Pitfalls
1. Too aggressive backpressure
❌ Reject with queue 10% full
→ Capacity underutilized
✅ Reject when really necessary
→ Use available capacity
2. No feedback to client
❌ return 503 # Client doesn't know when to retry
✅ return 503 with Retry-After: 30
→ Client knows when to try again
3. Backpressure only at the end
❌ API → Queue → Worker → DB
                          ↑
          Backpressure only here
          (queue already huge)

✅ API → Queue → Worker → DB
    ↑      ↑        ↑
    Backpressure at each stage
Conclusion
Backpressure is essential for stable systems:
- Choose the right strategy for each context
- Implement at each layer, not just the end
- Monitor rejections and buffer utilization
- Give feedback to client about when to retry
- Test behavior under overload
The fundamental rule:
It's better to reject some requests
than to degrade all of them
A system without backpressure is a time bomb waiting for the next spike.
Backpressure is honesty: "I can't right now, try again later." Lying about capacity leads to collapse.