When a system receives more work than it can process, something has to give. Without control, the result is progressive degradation until collapse. Backpressure is the mechanism that propagates overload signals back to the source, enabling flow control.
Backpressure is not failure. It's honest communication about capacity.
The Overload Problem
Without backpressure
Producer: 1000 req/s
Consumer: 500 req/s
Second 1: queue = 500
Second 2: queue = 1000
Second 3: queue = 1500
...
Minute 10: queue = 300,000
→ Memory exhausted
→ Latency explodes
→ System fails
With backpressure
Producer: 1000 req/s
Consumer: 500 req/s
Queue limit: 1000
Second 1: queue = 500
Second 2: queue = 1000 (limit!)
Second 3: producer blocked or rejected
→ Producer slows down
→ System remains stable
→ Latency controlled
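The arithmetic is easy to verify with a few lines of Python. This is a toy model of the two scenarios above, not a benchmark; the constants mirror the numbers in the example:

PRODUCE, CONSUME, LIMIT = 1000, 500, 1000

unbounded = bounded = rejected = 0
for second in range(600):  # ten minutes
    unbounded += PRODUCE - CONSUME                      # grows by 500 every second
    accepted = min(PRODUCE, LIMIT - bounded + CONSUME)  # room after this second's consumption
    rejected += PRODUCE - accepted                      # overflow pushed back to the producer
    bounded = bounded + accepted - CONSUME

print(unbounded)  # 300000 -- memory exhaustion
print(bounded)    # 1000   -- capped at the limit
print(rejected)   # the work the producer was forced to slow down for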
Backpressure Strategies
1. Drop
# Drop requests when overloaded
if queue.size() >= MAX_QUEUE:
    drop(request)
    metrics.increment('dropped_requests')
else:
    queue.add(request)
When to use:
- Non-critical data (metrics, logs)
- Real-time streaming systems
- Better to lose some than lose all
When to avoid:
- Financial transactions
- Data that cannot be lost
2. Bounded Buffer
# Block producer when buffer full
from queue import Queue

q = Queue(maxsize=1000)  # stdlib queue.Queue is Python's blocking bounded queue

def produce(item):
    q.put(item, timeout=5)  # Blocks up to 5s, then raises queue.Full

def consume():
    return q.get()
When to use:
- Temporary load spikes
- Producers that can wait
- Internal service communication
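A quick demonstration of the blocking behavior, reusing the stdlib queue from the snippet above; the consumer's sleep is an invented stand-in for roughly 500 items/s of real work:

import threading, time
from queue import Queue, Full

q = Queue(maxsize=1000)

def slow_consumer():
    while True:
        q.get()
        time.sleep(0.002)  # ~500 items/s

threading.Thread(target=slow_consumer, daemon=True).start()

for i in range(5000):
    try:
        q.put(i, timeout=5)  # blocks once 1000 items are queued
    except Full:
        print(f"gave up on item {i} after 5s")

Once the queue fills, each put() waits for the consumer to free a slot, so the producer is transparently slowed to the consumer's pace.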
3. Reject
# Reject with error when overloaded
if current_load > MAX_LOAD:
    return HttpResponse(status=503,
                        headers={'Retry-After': '30'})
When to use:
- Public APIs
- When client can retry
- Protection against abuse
4. Throttle
# Rate limiting per client
@rate_limit(100, per='minute')
def api_endpoint(request):
    return process(request)
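The rate_limit decorator above is notation, not a real library. A minimal single-process token-bucket version might look like this (HttpResponse as in the earlier snippets; a production version would keep one bucket per client and guard the state with a lock):

import time, functools

def rate_limit(limit, per='minute'):
    period = {'second': 1, 'minute': 60}[per]
    state = {'tokens': limit, 'last': time.monotonic()}

    def decorator(func):
        @functools.wraps(func)
        def wrapper(request):
            now = time.monotonic()
            # Refill tokens proportionally to elapsed time, capped at the limit
            state['tokens'] = min(limit, state['tokens'] + (now - state['last']) * limit / period)
            state['last'] = now
            if state['tokens'] < 1:
                return HttpResponse(status=429,
                                    headers={'Retry-After': str(int(period / limit) or 1)})
            state['tokens'] -= 1
            return func(request)
        return wrapper
    return decorator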
When to use:
- Multi-tenant APIs
- Shared resource protection
- Ensuring fairness between clients
5. Adaptive
# Adjust limits based on metrics
def adaptive_limit():
    if latency_p99 > SLO:
        decrease_rate_limit()
    elif latency_p99 < SLO * 0.5:
        increase_rate_limit()
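A common concrete policy for this is AIMD (additive increase, multiplicative decrease): back off quickly when the SLO is breached, probe for capacity slowly when there is headroom. A sketch, where get_latency_p99() and the bounds are hypothetical:

SLO = 0.2            # 200 ms p99 target
rate_limit = 1000    # requests/s, adjusted in place

def adjust_rate_limit():
    global rate_limit
    p99 = get_latency_p99()  # assumed helper reading the live p99
    if p99 > SLO:
        rate_limit = max(100, int(rate_limit * 0.8))   # multiplicative decrease
    elif p99 < SLO * 0.5:
        rate_limit = min(10000, rate_limit + 50)       # additive increase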
When to use:
- Systems with variable load
- When fixed limits don't work
- Continuous optimization
Implementing Backpressure
In HTTP APIs
# Backpressure middleware (Django-style hooks)
from threading import Semaphore

class BackpressureMiddleware:
    def __init__(self, max_concurrent=100):
        self.semaphore = Semaphore(max_concurrent)

    def process_request(self, request):
        if not self.semaphore.acquire(blocking=False):
            return HttpResponse(status=503, headers={'Retry-After': '5'})
        request.backpressure_acquired = True

    def process_response(self, request, response):
        # Only release for requests that actually acquired a slot;
        # rejected requests never took one
        if getattr(request, 'backpressure_acquired', False):
            self.semaphore.release()
        return response
In Message Queues
# Kafka consumer with backpressure (kafka-python)
consumer = KafkaConsumer(
    'topic',
    max_poll_records=100,   # Limit batch size per poll
    fetch_max_wait_ms=500,  # Limit broker-side wait per fetch
)

# Pause fetching when processing falls behind
if processing_lag > threshold:
    consumer.pause(*consumer.assignment())
elif processing_lag < threshold / 2:
    consumer.resume(*consumer.paused())
In Reactive Streams
// Project Reactor with backpressure
Flux.create(sink -> {
        // Producer pushes as fast as it can
        while (hasData()) {
            sink.next(getData());
        }
    })
    .onBackpressureBuffer(1000,                       // buffer up to 1000 items
        dropped -> log.warn("Dropped: {}", dropped),  // callback on overflow
        BufferOverflowStrategy.DROP_OLDEST)           // then drop instead of erroring
    .subscribe(item -> process(item));
In gRPC
// Streaming with flow control
service DataService {
  rpc StreamData(StreamRequest) returns (stream DataChunk);
}

// gRPC streams inherit HTTP/2 flow control: Send blocks when the
// client's receive window is full, which naturally throttles the server
func (s *server) StreamData(req *pb.StreamRequest, stream pb.DataService_StreamDataServer) error {
    for data := range dataChannel {
        if err := stream.Send(data); err != nil {
            // Stream broken or cancelled by the client
            return err
        }
    }
    return nil
}
Backpressure Patterns
Circuit Breaker
import threading

class CircuitOpenError(Exception):
    pass

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.state = 'CLOSED'

    def call(self, func):
        if self.state == 'OPEN':
            raise CircuitOpenError()
        try:
            result = func()
            self.failures = 0
            self.state = 'CLOSED'  # a successful trial closes a HALF-OPEN circuit
            return result
        except Exception:
            self.failures += 1
            if self.state == 'HALF-OPEN' or self.failures >= self.failure_threshold:
                self.state = 'OPEN'
                # After the timeout, allow one trial call through
                threading.Timer(self.reset_timeout, self.reset).start()
            raise

    def reset(self):
        self.state = 'HALF-OPEN'
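Typical usage wraps every call to the protected dependency; downstream_client here is a stand-in for whatever the breaker guards:

breaker = CircuitBreaker(failure_threshold=5, reset_timeout=30)

def fetch_profile(user_id):
    try:
        return breaker.call(lambda: downstream_client.get(user_id))
    except CircuitOpenError:
        # Fail fast instead of queueing behind a broken dependency
        return HttpResponse(status=503, headers={'Retry-After': '30'})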
Load Shedding
# Drop low-priority requests first
def should_shed(request):
    current_load = get_current_load()  # assumed helper returning load as a 0-100 percentage
    if current_load > 90:
        # Only critical requests get through
        return request.priority != 'critical'
    elif current_load > 70:
        # Drop low priority
        return request.priority == 'low'
    return False
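Shedding must happen before any expensive work is done. A sketch of the entry point, with metrics and HttpResponse as in the earlier snippets:

def handle_request(request):
    if should_shed(request):
        metrics.increment('shed_requests')
        return HttpResponse(status=503, headers={'Retry-After': '10'})
    return process(request)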
Bulkhead
# Isolate resources by tenant/functionality: one pool per class of work,
# so a flood of batch jobs cannot starve critical traffic
from concurrent.futures import ThreadPoolExecutor, TimeoutError

pools = {
    'critical': ThreadPoolExecutor(max_workers=50),
    'standard': ThreadPoolExecutor(max_workers=30),
    'batch': ThreadPoolExecutor(max_workers=20),
}

def process(request):
    pool = pools.get(request.priority, pools['standard'])
    future = pool.submit(handle, request)
    try:
        return future.result(timeout=5)
    except TimeoutError:
        # Backpressure specific to this pool: it is too backed up to answer in time
        return HttpResponse(status=503)
Signs of Missing Backpressure
1. Growing latency under load
Load 50%: latency = 100ms
Load 80%: latency = 500ms
Load 100%: latency = 5000ms
→ System accepts everything, performance degrades for all
2. Growing memory
Internal queues growing indefinitely
Unbounded buffers
GC increasingly frequent
3. Failure cascade
Service A overloaded
→ Timeout in service B waiting for A
→ B's queue grows
→ B also fails
→ Cascade continues
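The cascade is usually broken by bounding how long B will wait for A. A sketch using the requests library (the URL and the 500 ms budget are made up):

import requests

def call_service_a():
    try:
        # Bounded wait: B refuses to queue indefinitely behind an overloaded A
        return requests.get('http://service-a/api', timeout=0.5)
    except requests.Timeout:
        # Turn A's overload into explicit backpressure toward B's own callers
        return HttpResponse(status=503, headers={'Retry-After': '5'})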
Backpressure Metrics
What to monitor
# Rejection rate
rejected_requests_total
dropped_messages_total
# Buffer utilization
queue_size / queue_capacity
buffer_utilization_percent
# Wait time
queue_wait_time_seconds
# Rate limiting
rate_limited_requests_total
throttled_requests_by_client
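With the Python prometheus_client library, these can be exposed directly; the hook functions, the q object, and enqueued_at are assumptions tying back to the earlier snippets:

import time
from prometheus_client import Counter, Gauge, Histogram, start_http_server

rejected = Counter('rejected_requests_total', 'Requests rejected by backpressure')
queue_size = Gauge('queue_size', 'Current queue depth')
queue_wait = Histogram('queue_wait_time_seconds', 'Time items spend queued')

start_http_server(8000)  # exposes /metrics for Prometheus to scrape

def on_reject():
    rejected.inc()

def on_dequeue(enqueued_at):
    queue_size.set(q.qsize())  # q: the bounded queue from earlier
    queue_wait.observe(time.monotonic() - enqueued_at)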
Alerts
# Backpressure activating frequently
- alert: HighRejectionRate
  expr: |
    rate(rejected_requests_total[5m])
      / rate(total_requests[5m]) > 0.05
  for: 5m

# Queue near capacity
- alert: QueueNearCapacity
  expr: |
    queue_size / queue_capacity > 0.8
  for: 2m
Common Pitfalls
1. Too aggressive backpressure
❌ Reject with queue 10% full
→ Capacity underutilized
✅ Reject when really necessary
→ Use available capacity
2. No feedback to client
❌ return 503 # Client doesn't know when to retry
✅ return 503 with Retry-After: 30
→ Client knows when to try again
3. Backpressure only at the end
❌ API → Queue → Worker → DB
                          ↑
          Backpressure only here
          (queue already huge)

✅ API → Queue → Worker → DB
    ↑      ↑        ↑
    Backpressure at each stage
Conclusion
Backpressure is essential for stable systems:
- Choose the right strategy for each context
- Implement at each layer, not just the end
- Monitor rejections and buffer utilization
- Give feedback to client about when to retry
- Test behavior under overload
The fundamental rule:
It's better to reject some requests
than to degrade all of them
A system without backpressure is a time bomb waiting for the next spike.
Backpressure is honesty: "I can't right now, try again later." Lying about capacity leads to collapse.