Databases as bottleneck: the usual suspect

When a system gets slow, the database is almost always the first suspect — and often guilty. Databases are centralization points: all requests eventually converge to them.

This article explores the most common database bottlenecks, how to identify them, and strategies to resolve them.

The database is the heart of the system. When it suffers, everything suffers.

Why Databases Are Frequent Bottlenecks

Centralization

While applications scale horizontally easily, traditional databases are harder to scale.

100 application instances
         ↓
    1 database
         ↓
    Bottleneck

Shared state

Unlike stateless applications, databases maintain state that needs to be consistent — this limits parallelism.

I/O intensive

Databases do a lot of disk I/O, which is orders of magnitude slower than memory operations.

Types of Bottlenecks

1. Slow queries

The most common cause. A single poorly written query can bring down the system.

Symptoms:

High latency on specific operations
High database CPU
Long-duration locks

Causes:

Missing indexes
Unused indexes
Full table scans
Inefficient joins
N+1 queries

2. Exhausted connections

Connection pool at limit.

Symptoms:

Connection timeouts
Application "hangs" waiting for connection
Idle connections in database

Causes:

Undersized pool
Slow queries holding connections
Connection leaks
Long transactions

3. Locks and contention

Transactions fighting for the same data.

Symptoms:

Erratic latency
Deadlocks
Aborted transactions

Causes:

Long transactions
Updates on "hot" rows
Missing indexes causing table locks

4. Disk I/O

Disk can't keep up with demand.

Symptoms:

High I/O wait
Slow queries even with good indexes
Buffer pool thrashing

Causes:

Data larger than available memory
Many writes
Undersized disk

5. Replication lag

Replicas not keeping up with primary.

Symptoms:

Growing replication lag
Inconsistent reads
Replicas falling behind

Causes:

Write volume too high
Slow network between primary and replica
Undersized replica

Identifying the Bottleneck

Essential metrics

Metric	What it indicates
Query time (p95, p99)	Slow queries
Connections used/available	Connection pressure
Lock wait time	Contention
Buffer hit ratio	Cache efficiency
Disk I/O wait	Disk pressure
Replication lag	Replica delay

Tools

PostgreSQL:

pg_stat_statements — slowest queries
pg_stat_activity — active connections
pg_locks — active locks
EXPLAIN ANALYZE — execution plan

MySQL:

slow_query_log — slow queries
SHOW PROCESSLIST — active connections
SHOW ENGINE INNODB STATUS — internal state
EXPLAIN — execution plan

The most important query

-- PostgreSQL: Top queries by total time
SELECT query, calls, total_time, mean_time
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 20;

A query's impact = frequency × average time.

Solutions

For slow queries

1. Add indexes

-- Before: full table scan
SELECT * FROM orders WHERE customer_id = 123;

-- After: index scan
CREATE INDEX idx_orders_customer ON orders(customer_id);

2. Rewrite queries

-- Before: N+1
for order in orders:
    customer = query("SELECT * FROM customers WHERE id = ?", order.customer_id)

-- After: JOIN
SELECT o.*, c.*
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.date > '2024-01-01';

3. Denormalize when necessary

Add redundant data to avoid expensive joins.

4. Partition large tables

CREATE TABLE orders (
    id SERIAL,
    created_at TIMESTAMP,
    ...
) PARTITION BY RANGE (created_at);

For connections

1. Size the pool correctly

connections = (cores * 2) + spindles

Where spindles = number of disks.

2. Use connection poolers

PgBouncer (PostgreSQL)
ProxySQL (MySQL)

3. Find and fix leaks

Monitor connections that don't return to pool.

For contention

1. Short transactions

-- Avoid
BEGIN;
-- ... slow operations ...
-- ... processing in code ...
COMMIT;

-- Prefer
-- Process data beforehand
BEGIN;
-- Only database operations, fast
COMMIT;

2. Order accesses

Always access tables/rows in the same order to avoid deadlocks.

3. Use SKIP LOCKED for queues

SELECT * FROM jobs
WHERE status = 'pending'
ORDER BY created_at
LIMIT 1
FOR UPDATE SKIP LOCKED;

For I/O

1. Increase memory (buffer pool)

More data in cache = less I/O.

2. Use SSDs

100x latency difference compared to HDDs.

3. Separate data by temperature

Hot data on fast storage, cold data on cheap storage.

For scale

1. Read replicas

Distribute reads among replicas.

Writes → Primary
Reads → Replicas (load balanced)

2. Sharding

Divide data among multiple databases.

customer_id % 4 = shard_number

3. CQRS

Separate read model from write model.

Best Practices

Development

Always use EXPLAIN before going to production
**Avoid SELECT *** — fetch only necessary columns
Paginate results — LIMIT and OFFSET or cursor
Use prepared statements — execution plan caching
Monitor N+1 — ORM can hide bad queries

Operations

Monitor continuously — queries that were fast can become slow
Keep statistics updated — ANALYZE regularly
Plan maintenance — VACUUM, reindex, etc.
Test with realistic data — production has more data than dev
Have slow query alerts — be proactive

Conclusion

Databases are frequent bottlenecks because:

They centralize data access
They're hard to scale horizontally
They depend on I/O, which is slow

To avoid problems:

Write efficient queries — indexes, optimized joins
Size resources appropriately — connections, memory
Monitor proactively — find problems before users
Plan for scale — replicas, sharding, cache

Treat your database as a precious resource. Every query has a cost.

Why Databases Are Frequent Bottlenecks

Centralization

Shared state

I/O intensive

Types of Bottlenecks

1. Slow queries

2. Exhausted connections

3. Locks and contention

4. Disk I/O

5. Replication lag

Identifying the Bottleneck

Essential metrics

Tools

The most important query

Solutions

For slow queries

For connections

For contention

For I/O

For scale

Best Practices

Development

Operations

Conclusion

Want to understand your platform's limits?