One of the biggest promises of the cloud is elasticity: the ability to automatically increase or decrease resources based on demand. In theory, you pay only for what you use and never run out of capacity.
In practice, effective elasticity requires much more than enabling autoscaling. This article explores what elasticity is, how it works, and the precautions needed to truly leverage it.
Elasticity isn't magic. It's well-applied engineering.
## What is Elasticity
Elasticity is a system's ability to automatically adapt its resources in response to changes in demand.
Unlike scalability (which is the ability to grow), elasticity also includes the ability to shrink — releasing resources when they're no longer needed.
### Elasticity vs Scalability
| Concept | Definition |
|---|---|
| Scalability | Ability to grow to meet more demand |
| Elasticity | Ability to grow AND shrink dynamically |
A system can be scalable without being elastic (grows but doesn't shrink automatically). Elasticity implies scalability, but the reverse isn't true.
## How Elasticity Works

### Basic components
- Monitoring metrics — CPU, memory, requests, latency
- Scaling rules — conditions that trigger increase or reduction
- Orchestrator — component that adds/removes instances
- Load balancer — distributes traffic among instances
### Typical flow

```
Demand increases
        ↓
Metric exceeds threshold (e.g., CPU > 70%)
        ↓
Orchestrator starts new instances
        ↓
Load balancer includes new instances
        ↓
Load is redistributed
        ↓
Metrics normalize
```

The reverse process happens when demand drops.
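The flow above boils down to a small decision function evaluated on each monitoring cycle. A minimal sketch in Python, assuming a single averaged CPU metric; the function name, thresholds, and bounds are illustrative, not any specific cloud API:

```python
SCALE_UP_THRESHOLD = 0.70    # e.g., CPU > 70% triggers scale-up
SCALE_DOWN_THRESHOLD = 0.30  # CPU < 30% triggers scale-down

def desired_instances(current: int, avg_cpu: float,
                      min_instances: int = 2, max_instances: int = 20) -> int:
    """Return the desired instance count for one evaluation cycle."""
    if avg_cpu > SCALE_UP_THRESHOLD:
        current += 1          # orchestrator starts a new instance
    elif avg_cpu < SCALE_DOWN_THRESHOLD:
        current -= 1          # demand dropped: release an instance
    # Clamp to configured bounds (protects against runaway scaling)
    return max(min_instances, min(current, max_instances))
```

Real orchestrators add more machinery around this core (cooldowns, health checks, step sizes), but the decide-then-clamp shape is the same.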
## Types of Elasticity

### Reactive elasticity
Responds to changes after they happen. This is the most common model.
Advantages:
- Simple to implement
- Based on real metrics
Disadvantages:
- Delay between demand and response
- May not be fast enough for sudden spikes
### Predictive elasticity
Uses historical data and machine learning to anticipate demand changes.
Advantages:
- Prepares resources before the peak
- Better user experience
Disadvantages:
- More complex to implement
- Depends on predictable patterns
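The idea can be sketched with a deliberately naive model. Here a moving average stands in for a real forecasting model (production systems use seasonal models or ML); the names and the 20% headroom factor are illustrative assumptions:

```python
import math

def forecast_next(history: list[float], window: int = 3) -> float:
    """Forecast the next period's load from recent history.
    A moving average is a placeholder for a real forecasting model."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def pre_scale(history: list[float], capacity_per_instance: float,
              headroom: float = 1.2) -> int:
    """Provision instances ahead of the predicted load, with extra headroom."""
    predicted = forecast_next(history)
    return math.ceil(predicted * headroom / capacity_per_instance)
```

The key difference from the reactive model: capacity is provisioned *before* the metric crosses a threshold, trading forecast accuracy for responsiveness.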
### Scheduled elasticity
Scales based on known schedules (e.g., more resources during business hours).
Advantages:
- Predictable and controllable
- Doesn't depend on real-time metrics
Disadvantages:
- Doesn't respond to unexpected variations
- May waste resources
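Scheduled elasticity is essentially a lookup from clock time to capacity. A minimal sketch, assuming a hypothetical schedule of one business-hours window; window boundaries and capacities are illustrative:

```python
from datetime import time

# Hypothetical schedule: (start, end, capacity) windows.
SCHEDULE = [
    (time(8, 0), time(20, 0), 10),  # business hours: 10 instances
]
OFF_HOURS_CAPACITY = 2              # baseline outside any window

def scheduled_capacity(now: time) -> int:
    """Return the configured capacity for the current time of day."""
    for start, end, capacity in SCHEDULE:
        if start <= now < end:
            return capacity
    return OFF_HOURS_CAPACITY
```

In practice this is often combined with reactive scaling: the schedule sets the floor, and reactive rules handle variation above it.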
## Elasticity Challenges

### 1. Startup time (cold start)
New instances need time to start. If this time is long, elasticity loses effectiveness.
Mitigations:
- Optimize application boot time
- Maintain minimum pool of warm instances
- Use lightweight containers
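The warm-pool mitigation can be sketched as a class that keeps pre-booted instances ready, so scale-up never pays the boot cost. This is an illustrative toy, not a cloud provider's warm-pool API:

```python
class WarmPool:
    """Keep a target number of pre-booted instances ready for fast scale-up."""

    def __init__(self, warm_target: int = 3):
        self.warm_target = warm_target
        self.warm = [self._boot() for _ in range(warm_target)]

    def _boot(self) -> dict:
        # Stands in for the slow part: image pull, boot, app initialization.
        return {"state": "warm"}

    def acquire(self) -> dict:
        """Take a pre-warmed instance (fast path) and replenish the pool."""
        instance = self.warm.pop() if self.warm else self._boot()
        while len(self.warm) < self.warm_target:
            self.warm.append(self._boot())
        instance["state"] = "serving"
        return instance
```

The trade-off is explicit: the pool costs money while idle, in exchange for near-zero scale-up latency.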
### 2. Application state
Stateful applications are difficult to scale elastically. Sessions, local cache, and in-memory state complicate adding/removing instances.
Mitigations:
- Externalize state (Redis, database)
- Stateless design
- Sticky sessions (with caution)
### 3. Persistent connections
Databases, queues, and external services have connection limits. Scaling instances can exhaust these limits.
Mitigations:
- Connection pooling
- Per-instance limits
- Managed services with their own scaling
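The per-instance limit mitigation is simple arithmetic worth making explicit: the fleet must never be allowed to open more connections than the database accepts. A sketch with illustrative names and a hypothetical headroom reservation:

```python
def max_safe_instances(db_connection_limit: int,
                       pool_size_per_instance: int,
                       reserved_connections: int = 10) -> int:
    """Cap the fleet size so the database connection limit is never exhausted.

    reserved_connections leaves headroom for admin and maintenance sessions.
    """
    available = db_connection_limit - reserved_connections
    return available // pool_size_per_instance
```

This ceiling should feed directly into the autoscaler's maximum instance limit; otherwise scaling the app tier simply moves the outage to the database.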
### 4. Unexpected costs
Poorly configured elasticity can generate very high costs — especially in scaling loop scenarios or attacks.
Mitigations:
- Maximum instance limits
- Cost alerts
- Rate limiting in the application
### 5. Thrashing (oscillation)

The system rapidly alternates between scaling up and scaling down, generating instability.
Mitigations:
- Cooldown periods between scaling actions
- Thresholds with hysteresis (different values for up and down)
- Scale in larger steps
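The first two mitigations can be sketched together: hysteresis means the scale-up and scale-down thresholds differ, and a cooldown blocks back-to-back actions. Names and default values are illustrative, not any specific autoscaler's API:

```python
class Autoscaler:
    """Thrashing protection: hysteresis thresholds plus a cooldown period."""

    def __init__(self, up: float = 0.70, down: float = 0.30,
                 cooldown_s: float = 300):
        self.up, self.down = up, down        # hysteresis: gap between them
        self.cooldown_s = cooldown_s
        self.last_action_at = float("-inf")  # timestamp of last scaling action

    def decide(self, cpu: float, now: float) -> int:
        """Return +1 (scale up), -1 (scale down), or 0 (hold)."""
        if now - self.last_action_at < self.cooldown_s:
            return 0                         # still cooling down
        if cpu > self.up:
            self.last_action_at = now
            return 1
        if cpu < self.down:
            self.last_action_at = now
            return -1
        return 0                             # inside the hysteresis band
```

A metric hovering around a single 50% threshold would flip the decision constantly; the band between 30% and 70%, plus the cooldown, absorbs that noise.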
## Metrics for Elasticity

### Input metrics (when to scale up)
- CPU utilization — simple but can be misleading
- Request rate — more direct for web applications
- Queue depth — excellent for workers
- Response time — scales based on user experience
- Custom metrics — business-specific
### Output metrics (when to scale down)
Generally the same metrics, but with more conservative thresholds to avoid thrashing.
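Queue depth deserves a concrete illustration, since it maps to capacity more directly than CPU does. A sketch that sizes a worker fleet to drain the backlog within a target window; the formula and names are illustrative assumptions:

```python
import math

def workers_for_queue(queue_depth: int,
                      msgs_per_worker_per_min: int,
                      target_drain_minutes: int = 5) -> int:
    """Number of workers needed to drain the queue within the target window."""
    capacity_per_worker = msgs_per_worker_per_min * target_drain_minutes
    return max(1, math.ceil(queue_depth / capacity_per_worker))
```

Unlike CPU, this metric answers the scaling question directly: "how much work is waiting, and how fast do we want it done?"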
## Elasticity in Practice

### Example: E-commerce on Black Friday

- Normal days: 10 instances
- Pre-Black Friday: scheduled scaling to 50 instances
- During the event: reactive elasticity allows reaching 200 instances
- Post-peak: gradual return to 10 instances
### Example: B2B SaaS

- Night/early morning: 2 instances (minimum)
- Business hours: 5-10 instances (reactive elasticity)
- End of month (closing): peaks of 15-20 instances
## Best Practices
- Set maximum limits — protect yourself from uncontrolled costs
- Test elasticity — simulate peaks and validate behavior
- Monitor scaling time — know how long it takes to react
- Use multiple metrics — CPU alone rarely tells the whole story
- Plan the minimum — how many instances do you need even without load?
- Consider reservations — reserved instances for baseline, on-demand for peaks
## Conclusion
Elasticity is a powerful cloud capability, but it's neither automatic nor free. It requires:
- Prepared applications (stateless, fast boot)
- Careful configuration of thresholds and limits
- Continuous monitoring of behavior
- Regular testing of scaling scenarios
When well implemented, elasticity transforms fixed costs into variable ones and ensures your system responds to real demand — without waste and without surprises.
The cloud promises elasticity. It's up to you to ensure your system is ready to take advantage of it.