One of the biggest promises of the cloud is elasticity: the ability to automatically increase or decrease resources based on demand. In theory, you pay only for what you use and never run out of capacity.
In practice, effective elasticity requires much more than enabling autoscaling. This article explores what elasticity is, how it works, and the precautions needed to truly leverage it.
Elasticity isn't magic. It's well-applied engineering.
## What is Elasticity
Elasticity is a system's ability to automatically adapt its resources in response to changes in demand.
Unlike scalability (which is the ability to grow), elasticity also includes the ability to shrink — releasing resources when they're no longer needed.
### Elasticity vs Scalability
| Concept | Definition |
|---|---|
| Scalability | Ability to grow to meet more demand |
| Elasticity | Ability to grow AND shrink dynamically |
A system can be scalable without being elastic (grows but doesn't shrink automatically). Elasticity implies scalability, but the reverse isn't true.
## How Elasticity Works

### Basic components
- Monitoring metrics — CPU, memory, requests, latency
- Scaling rules — conditions that trigger increase or reduction
- Orchestrator — component that adds/removes instances
- Load balancer — distributes traffic among instances
### Typical flow

```
Demand increases
        ↓
Metric exceeds threshold (e.g., CPU > 70%)
        ↓
Orchestrator starts new instances
        ↓
Load balancer includes new instances
        ↓
Load is redistributed
        ↓
Metrics normalize
```

The reverse process happens when demand drops.
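The flow above boils down to a small decision function evaluated on each monitoring cycle. A minimal sketch in Python, assuming a single averaged CPU metric; the function name, thresholds, and bounds are illustrative, not any specific cloud API:

```python
SCALE_UP_THRESHOLD = 0.70    # e.g., CPU > 70% triggers scale-up
SCALE_DOWN_THRESHOLD = 0.30  # CPU < 30% triggers scale-down

def desired_instances(current: int, avg_cpu: float,
                      min_instances: int = 2, max_instances: int = 20) -> int:
    """Return the desired instance count for one evaluation cycle."""
    if avg_cpu > SCALE_UP_THRESHOLD:
        current += 1          # orchestrator starts a new instance
    elif avg_cpu < SCALE_DOWN_THRESHOLD:
        current -= 1          # demand dropped: release an instance
    # Clamp to configured bounds (protects against runaway scaling)
    return max(min_instances, min(current, max_instances))
```

Real orchestrators add more machinery around this core (cooldowns, health checks, step sizes), but the decide-then-clamp shape is the same.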
## Types of Elasticity

### Reactive elasticity
Responds to changes after they happen. This is the most common model.
Advantages:
- Simple to implement
- Based on real metrics
Disadvantages:
- Delay between demand and response
- May not be fast enough for sudden spikes
### Predictive elasticity
Uses historical data and machine learning to anticipate demand changes.
Advantages:
- Prepares resources before the peak
- Better user experience
Disadvantages:
- More complex to implement
- Depends on predictable patterns
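The idea can be sketched with a deliberately naive model. Here a moving average stands in for a real forecasting model (production systems use seasonal models or ML); the names and the 20% headroom factor are illustrative assumptions:

```python
import math

def forecast_next(history: list[float], window: int = 3) -> float:
    """Forecast the next period's load from recent history.
    A moving average is a placeholder for a real forecasting model."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def pre_scale(history: list[float], capacity_per_instance: float,
              headroom: float = 1.2) -> int:
    """Provision instances ahead of the predicted load, with extra headroom."""
    predicted = forecast_next(history)
    return math.ceil(predicted * headroom / capacity_per_instance)
```

The key difference from the reactive model: capacity is provisioned *before* the metric crosses a threshold, trading forecast accuracy for responsiveness.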
### Scheduled elasticity
Scales based on known schedules (e.g., more resources during business hours).
Advantages:
- Predictable and controllable
- Doesn't depend on real-time metrics
Disadvantages:
- Doesn't respond to unexpected variations
- May waste resources
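Scheduled elasticity is essentially a lookup from clock time to capacity. A minimal sketch, assuming a hypothetical schedule of one business-hours window; window boundaries and capacities are illustrative:

```python
from datetime import time

# Hypothetical schedule: (start, end, capacity) windows.
SCHEDULE = [
    (time(8, 0), time(20, 0), 10),  # business hours: 10 instances
]
OFF_HOURS_CAPACITY = 2              # baseline outside any window

def scheduled_capacity(now: time) -> int:
    """Return the configured capacity for the current time of day."""
    for start, end, capacity in SCHEDULE:
        if start <= now < end:
            return capacity
    return OFF_HOURS_CAPACITY
```

In practice this is often combined with reactive scaling: the schedule sets the floor, and reactive rules handle variation above it.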
## Elasticity Challenges

### 1. Startup time (cold start)
New instances need time to start. If this time is long, elasticity loses effectiveness.
Mitigations:
- Optimize application boot time
- Maintain minimum pool of warm instances
- Use lightweight containers
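The warm-pool mitigation can be sketched as a class that keeps pre-booted instances ready, so scale-up never pays the boot cost. This is an illustrative toy, not a cloud provider's warm-pool API:

```python
class WarmPool:
    """Keep a target number of pre-booted instances ready for fast scale-up."""

    def __init__(self, warm_target: int = 3):
        self.warm_target = warm_target
        self.warm = [self._boot() for _ in range(warm_target)]

    def _boot(self) -> dict:
        # Stands in for the slow part: image pull, boot, app initialization.
        return {"state": "warm"}

    def acquire(self) -> dict:
        """Take a pre-warmed instance (fast path) and replenish the pool."""
        instance = self.warm.pop() if self.warm else self._boot()
        while len(self.warm) < self.warm_target:
            self.warm.append(self._boot())
        instance["state"] = "serving"
        return instance
```

The trade-off is explicit: the pool costs money while idle, in exchange for near-zero scale-up latency.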
### 2. Application state
Stateful applications are difficult to scale elastically. Sessions, local cache, and in-memory state complicate adding/removing instances.
Mitigations:
- Externalize state (Redis, database)
- Stateless design
- Sticky sessions (with caution)
### 3. Persistent connections
Databases, queues, and external services have connection limits. Scaling instances can exhaust these limits.
Mitigations:
- Connection pooling
- Per-instance limits
- Managed services with their own scaling
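The per-instance limit mitigation is simple arithmetic worth making explicit: the fleet must never be allowed to open more connections than the database accepts. A sketch with illustrative names and a hypothetical headroom reservation:

```python
def max_safe_instances(db_connection_limit: int,
                       pool_size_per_instance: int,
                       reserved_connections: int = 10) -> int:
    """Cap the fleet size so the database connection limit is never exhausted.

    reserved_connections leaves headroom for admin and maintenance sessions.
    """
    available = db_connection_limit - reserved_connections
    return available // pool_size_per_instance
```

This ceiling should feed directly into the autoscaler's maximum instance limit; otherwise scaling the app tier simply moves the outage to the database.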
### 4. Unexpected costs
Poorly configured elasticity can generate very high costs — especially in scaling loop scenarios or attacks.
Mitigations:
- Maximum instance limits
- Cost alerts
- Rate limiting in the application
### 5. Thrashing (oscillation)

The system rapidly alternates between scaling up and scaling down, generating instability.
Mitigations:
- Cooldown periods between scaling actions
- Thresholds with hysteresis (different values for up and down)
- Scale in larger steps
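The first two mitigations can be sketched together: hysteresis means the scale-up and scale-down thresholds differ, and a cooldown blocks back-to-back actions. Names and default values are illustrative, not any specific autoscaler's API:

```python
class Autoscaler:
    """Thrashing protection: hysteresis thresholds plus a cooldown period."""

    def __init__(self, up: float = 0.70, down: float = 0.30,
                 cooldown_s: float = 300):
        self.up, self.down = up, down        # hysteresis: gap between them
        self.cooldown_s = cooldown_s
        self.last_action_at = float("-inf")  # timestamp of last scaling action

    def decide(self, cpu: float, now: float) -> int:
        """Return +1 (scale up), -1 (scale down), or 0 (hold)."""
        if now - self.last_action_at < self.cooldown_s:
            return 0                         # still cooling down
        if cpu > self.up:
            self.last_action_at = now
            return 1
        if cpu < self.down:
            self.last_action_at = now
            return -1
        return 0                             # inside the hysteresis band
```

A metric hovering around a single 50% threshold would flip the decision constantly; the band between 30% and 70%, plus the cooldown, absorbs that noise.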
## Metrics for Elasticity

### Input metrics (when to scale up)
- CPU utilization — simple but can be misleading
- Request rate — more direct for web applications
- Queue depth — excellent for workers
- Response time — scales based on user experience
- Custom metrics — business-specific
### Output metrics (when to scale down)
Generally the same metrics, but with more conservative thresholds to avoid thrashing.
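Queue depth deserves a concrete illustration, since it maps to capacity more directly than CPU does. A sketch that sizes a worker fleet to drain the backlog within a target window; the formula and names are illustrative assumptions:

```python
import math

def workers_for_queue(queue_depth: int,
                      msgs_per_worker_per_min: int,
                      target_drain_minutes: int = 5) -> int:
    """Number of workers needed to drain the queue within the target window."""
    capacity_per_worker = msgs_per_worker_per_min * target_drain_minutes
    return max(1, math.ceil(queue_depth / capacity_per_worker))
```

Unlike CPU, this metric answers the scaling question directly: "how much work is waiting, and how fast do we want it done?"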
## Elasticity in Practice

### Example: E-commerce on Black Friday

- Normal days: 10 instances
- Pre-Black Friday: scheduled scaling to 50 instances
- During the event: reactive elasticity allows reaching 200 instances
- Post-peak: gradual return to 10 instances
### Example: B2B SaaS

- Night/early morning: 2 instances (minimum)
- Business hours: 5-10 instances (reactive elasticity)
- End of month (closing): peaks of 15-20 instances
## Best Practices
- Set maximum limits — protect yourself from uncontrolled costs
- Test elasticity — simulate peaks and validate behavior
- Monitor scaling time — know how long it takes to react
- Use multiple metrics — CPU alone rarely tells the whole story
- Plan the minimum — how many instances do you need even without load?
- Consider reservations — reserved instances for baseline, on-demand for peaks
## Conclusion
Elasticity is a powerful cloud capability, but it's neither automatic nor free. It requires:
- Prepared applications (stateless, fast boot)
- Careful configuration of thresholds and limits
- Continuous monitoring of behavior
- Regular testing of scaling scenarios
When well implemented, elasticity transforms fixed costs into variable ones and ensures your system responds to real demand — without waste and without surprises.
The cloud promises elasticity. It's up to you to ensure your system is ready to take advantage of it.