When a system gets slow, the database is almost always the first suspect — and often guilty. Databases are centralization points: all requests eventually converge to them.
This article explores the most common database bottlenecks, how to identify them, and strategies to resolve them.
The database is the heart of the system. When it suffers, everything suffers.
Why Databases Are Frequent Bottlenecks
Centralization
While applications scale horizontally easily, traditional databases are harder to scale.
100 application instances
↓
1 database
↓
Bottleneck
Shared state
Unlike stateless applications, databases maintain state that needs to be consistent — this limits parallelism.
I/O intensive
Databases do a lot of disk I/O, which is orders of magnitude slower than memory operations.
Types of Bottlenecks
1. Slow queries
The most common cause. A single poorly written query can bring down the system.
Symptoms:
- High latency on specific operations
- High database CPU
- Long-duration locks
Causes:
- Missing indexes
- Unused indexes
- Full table scans
- Inefficient joins
- N+1 queries
2. Exhausted connections
Connection pool at limit.
Symptoms:
- Connection timeouts
- Application "hangs" waiting for connection
- Idle connections in database
Causes:
- Undersized pool
- Slow queries holding connections
- Connection leaks
- Long transactions
3. Locks and contention
Transactions fighting for the same data.
Symptoms:
- Erratic latency
- Deadlocks
- Aborted transactions
Causes:
- Long transactions
- Updates on "hot" rows
- Missing indexes causing table locks
4. Disk I/O
Disk can't keep up with demand.
Symptoms:
- High I/O wait
- Slow queries even with good indexes
- Buffer pool thrashing
Causes:
- Data larger than available memory
- Many writes
- Undersized disk
5. Replication lag
Replicas not keeping up with primary.
Symptoms:
- Growing replication lag
- Inconsistent reads
- Replicas falling behind
Causes:
- Write volume too high
- Slow network between primary and replica
- Undersized replica
Identifying the Bottleneck
Essential metrics
| Metric | What it indicates |
|---|---|
| Query time (p95, p99) | Slow queries |
| Connections used/available | Connection pressure |
| Lock wait time | Contention |
| Buffer hit ratio | Cache efficiency |
| Disk I/O wait | Disk pressure |
| Replication lag | Replica delay |
Tools
PostgreSQL:
pg_stat_statements— slowest queriespg_stat_activity— active connectionspg_locks— active locksEXPLAIN ANALYZE— execution plan
MySQL:
slow_query_log— slow queriesSHOW PROCESSLIST— active connectionsSHOW ENGINE INNODB STATUS— internal stateEXPLAIN— execution plan
The most important query
-- PostgreSQL: Top queries by total time
SELECT query, calls, total_time, mean_time
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 20;
A query's impact = frequency × average time.
Solutions
For slow queries
1. Add indexes
-- Before: full table scan
SELECT * FROM orders WHERE customer_id = 123;
-- After: index scan
CREATE INDEX idx_orders_customer ON orders(customer_id);
2. Rewrite queries
-- Before: N+1
for order in orders:
customer = query("SELECT * FROM customers WHERE id = ?", order.customer_id)
-- After: JOIN
SELECT o.*, c.*
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.date > '2024-01-01';
3. Denormalize when necessary
Add redundant data to avoid expensive joins.
4. Partition large tables
CREATE TABLE orders (
id SERIAL,
created_at TIMESTAMP,
...
) PARTITION BY RANGE (created_at);
For connections
1. Size the pool correctly
connections = (cores * 2) + spindles
Where spindles = number of disks.
2. Use connection poolers
- PgBouncer (PostgreSQL)
- ProxySQL (MySQL)
3. Find and fix leaks
Monitor connections that don't return to pool.
For contention
1. Short transactions
-- Avoid
BEGIN;
-- ... slow operations ...
-- ... processing in code ...
COMMIT;
-- Prefer
-- Process data beforehand
BEGIN;
-- Only database operations, fast
COMMIT;
2. Order accesses
Always access tables/rows in the same order to avoid deadlocks.
3. Use SKIP LOCKED for queues
SELECT * FROM jobs
WHERE status = 'pending'
ORDER BY created_at
LIMIT 1
FOR UPDATE SKIP LOCKED;
For I/O
1. Increase memory (buffer pool)
More data in cache = less I/O.
2. Use SSDs
100x latency difference compared to HDDs.
3. Separate data by temperature
Hot data on fast storage, cold data on cheap storage.
For scale
1. Read replicas
Distribute reads among replicas.
Writes → Primary
Reads → Replicas (load balanced)
2. Sharding
Divide data among multiple databases.
customer_id % 4 = shard_number
3. CQRS
Separate read model from write model.
Best Practices
Development
- Always use EXPLAIN before going to production
- **Avoid SELECT *** — fetch only necessary columns
- Paginate results — LIMIT and OFFSET or cursor
- Use prepared statements — execution plan caching
- Monitor N+1 — ORM can hide bad queries
Operations
- Monitor continuously — queries that were fast can become slow
- Keep statistics updated — ANALYZE regularly
- Plan maintenance — VACUUM, reindex, etc.
- Test with realistic data — production has more data than dev
- Have slow query alerts — be proactive
Conclusion
Databases are frequent bottlenecks because:
- They centralize data access
- They're hard to scale horizontally
- They depend on I/O, which is slow
To avoid problems:
- Write efficient queries — indexes, optimized joins
- Size resources appropriately — connections, memory
- Monitor proactively — find problems before users
- Plan for scale — replicas, sharding, cache
Treat your database as a precious resource. Every query has a cost.