
Workload Characterization: knowing your workload

Before testing performance, you need to understand your load. Learn to characterize workloads for more realistic tests and better decisions.

One of the most common mistakes in performance testing is testing with load that doesn't represent reality. The result? Tests that "pass" but don't predict actual system behavior.

Workload characterization is the process of understanding, documenting, and modeling your system's real workload. It's the foundation of any meaningful performance test.

You can't test what you don't understand. Characterization comes before simulation.

What is a Workload

Workload is the total demand imposed on the system, including:

  • Volume: number of requests, transactions, users
  • Mix: proportion between different types of operations
  • Temporal pattern: how load varies over time
  • Data: size and distribution of processed data
  • Behavior: sequence of user actions

Why Characterization is Critical

Tests without characterization

"Let's run 10,000 requests per second"
→ On which endpoint?
→ With what distribution?
→ With what data?
→ In what sequence?
→ Does it represent production?

Tests with characterization

"Let's simulate 5,000 concurrent users with:
- 60% browsing (GET /products)
- 25% search (GET /search?q=...)
- 10% cart (POST /cart)
- 5% checkout (POST /order)
With peak pattern at 9pm and medium-sized data"

The difference is between an arbitrary number and a model of reality.

Characterization Dimensions

1. Volume and rate

  • Requests per second (RPS)
  • Transactions per minute (TPM)
  • Concurrent users
  • Active sessions
  • Data processed per hour
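
These metrics are linked by Little's Law: concurrency = arrival rate × time in system. With a 30s think time (as in the template below) and, say, 0.5s of response time, 500 RPS implies about 500 × 30.5 ≈ 15,000 concurrent users; knowing any two of the three quantities pins down the third.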

2. Operations mix

Rarely does a system have only one type of operation. Characterize the proportions:

| Operation   | Percentage | Relative cost |
|-------------|------------|---------------|
| Simple read | 70%        | 1x            |
| Search      | 15%        | 5x            |
| Write       | 10%        | 10x           |
| Report      | 5%         | 50x           |
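
A quick calculation shows why the cost column matters: the weighted cost per request is 0.70×1 + 0.15×5 + 0.10×10 + 0.05×50 = 4.95, of which reports contribute 2.5, i.e. roughly half of total resource consumption from only 5% of traffic. A load test that omits reports looks realistic by volume while exercising only half the real load.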

3. Temporal pattern

How load varies:

  • Daily: peaks at specific times
  • Weekly: busier days
  • Monthly: month-end closing, start-of-month activity
  • Seasonal: Black Friday, holidays, events

4. Data profile

  • Payload sizes (minimum, average, maximum)
  • Access distribution (Pareto: does 20% of the data receive 80% of the accesses?)
  • Hot vs cold data
  • Growth rate

5. User behavior

  • Time between actions (think time)
  • Typical navigation sequence
  • Abandonment rate
  • Retry behavior
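
Think time, in particular, is easy to get wrong. A minimal sketch in k6, assuming a 30s average and a uniform 15-45s spread (real traces often fit a log-normal distribution better):

import { sleep } from 'k6';

// Randomized pause around a 30s average so virtual users don't
// act in lockstep. The uniform 15-45s range is an assumption.
export function thinkTime(mean = 30) {
  sleep(mean * (0.5 + Math.random()));
}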

Characterization Methods

1. Log analysis

The most reliable source is production logs.

What to extract:

  • Endpoint distribution (top 20 URLs)
  • Temporal distribution (requests per hour/day)
  • Response time distribution
  • Status code distribution
  • User agents and origins

Tools:

  • Custom scripts (awk, Python)
  • Elasticsearch + Kibana
  • APM tools
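
As a sketch of the custom-script route, assuming a combined-format access log (the file name and regex below are illustrative), extracting the endpoint distribution can be as simple as:

// top_endpoints.js — run with: node top_endpoints.js
const fs = require('fs');

const counts = {};
for (const line of fs.readFileSync('access.log', 'utf8').split('\n')) {
  // Matches e.g. "GET /api/products?page=2 HTTP/1.1"; the query string
  // is stripped so variants of the same endpoint are counted together.
  const m = line.match(/"(GET|POST|PUT|DELETE) ([^ ?"]+)/);
  if (!m) continue;
  const key = `${m[1]} ${m[2]}`;
  counts[key] = (counts[key] || 0) + 1;
}

// Print the top 20 endpoints: the raw material for the operations mix.
Object.entries(counts)
  .sort((a, b) => b[1] - a[1])
  .slice(0, 20)
  .forEach(([endpoint, n]) => console.log(`${n}\t${endpoint}`));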

2. APM metrics

Application Performance Monitoring (APM) tools provide:

  • Throughput per endpoint
  • Response time per operation
  • Transaction traces
  • Dependencies between services

3. Product analytics

Tools like Google Analytics, Amplitude, and Mixpanel can reveal:

  • User journeys
  • Conversion funnels
  • Time on each page
  • Most common actions

4. Stakeholder interviews

For new systems or significant changes:

  • What's the expected growth?
  • Which operations are critical?
  • What events might cause spikes?

Documenting the Workload

Characterization template

## Workload: [System Name]

### Baseline volume
- Average RPS: 500
- Peak RPS (daily): 2,000
- Peak RPS (event): 10,000

### Operations mix
| Operation | % of traffic | Average RPS |
|-----------|--------------|-------------|
| GET /api/products | 45% | 225 |
| GET /api/search | 20% | 100 |
| POST /api/cart | 15% | 75 |
| GET /api/user | 12% | 60 |
| POST /api/order | 8% | 40 |

### Temporal pattern
- Daily peak: 7pm-10pm (3x baseline)
- Busiest day: Thursday
- Special event: Black Friday (10x baseline)

### Data profile
- Average request payload: 2KB
- Average response payload: 15KB
- Active catalog: 50,000 products
- 80% of accesses on 5,000 products

### Behavior
- Average think time: 30s
- Average session: 8 pages
- Conversion rate: 3%

Translating to Tests

From workload to test script

  1. Define scenarios based on operations mix
  2. Configure proportions to reflect real distribution
  3. Add realistic think time between actions
  4. Use representative data (size, distribution)
  5. Simulate temporal patterns (ramp-up, steady state, peaks)

Example in k6

import http from 'k6/http';
import { sleep } from 'k6';

const BASE = 'https://example.com'; // placeholder target host

export const options = {
  scenarios: {
    browse: {
      executor: 'constant-vus',
      vus: 450,  // 45% of 1000
      duration: '10m',  // constant-vus requires a duration
      exec: 'browseProducts',
    },
    search: {
      executor: 'constant-vus',
      vus: 200,  // 20%
      duration: '10m',
      exec: 'searchProducts',
    },
    cart: {
      executor: 'constant-vus',
      vus: 150,  // 15%
      duration: '10m',
      exec: 'addToCart',
    },
    // ...
  },
};

// One exported function per operation in the mix, each with think time.
export function browseProducts() { http.get(`${BASE}/api/products`); sleep(30); }
export function searchProducts() { http.get(`${BASE}/api/search?q=...`); sleep(30); } // query elided
export function addToCart() { http.post(`${BASE}/api/cart`, JSON.stringify({ productId: 1 })); sleep(30); } // placeholder payload
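
The constant-vus scenarios above hold load flat; step 5 (temporal patterns) calls for shaping it over time. A sketch using k6's ramping-vus executor, with targets taken from the template's baseline and 3x evening peak (durations compressed to fit a test run):

import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  scenarios: {
    daily_curve: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '10m', target: 1000 },  // ramp up to baseline
        { duration: '30m', target: 1000 },  // steady state
        { duration: '10m', target: 3000 },  // climb to the 7pm-10pm peak (3x)
        { duration: '30m', target: 3000 },  // hold the peak
        { duration: '10m', target: 0 },     // ramp down
      ],
    },
  },
};

export default function () {
  http.get('https://example.com/api/products'); // placeholder target
  sleep(30);
}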

Common Mistakes

1. Using only averages

The average hides extremes. An endpoint that represents 5% of traffic might consume 50% of resources.

2. Ignoring correlations

Users don't take random actions. Search leads to viewing, which leads to the cart. Maintain the sequence.

3. Unrealistic test data

Testing with sequential IDs when production follows a Zipf distribution produces artificially high cache hit rates.
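
A sketch of the fix, assuming a 50,000-product catalog (as in the template): draw IDs with a Zipf-like skew, where rank r is chosen with probability proportional to 1/r, instead of sequentially.

// Zipf-like ID sampling: a few "hot" IDs dominate, mirroring the
// 80/20 access pattern instead of flattering the cache.
const N = 50000;                 // catalog size (from the template)
const cdf = [];
let total = 0;
for (let r = 1; r <= N; r++) {
  total += 1 / r;                // weight of rank r is 1/r
  cdf.push(total);
}

function zipfId() {
  const u = Math.random() * total;
  let lo = 0, hi = N - 1;
  while (lo < hi) {              // binary search the cumulative weights
    const mid = (lo + hi) >> 1;
    if (cdf[mid] < u) lo = mid + 1; else hi = mid;
  }
  return lo + 1;                 // rank 1 is the hottest product
}

In a load script, zipfId() would replace the sequential counter; in practice you would also map ranks onto the actual hot IDs observed in production.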

4. Forgetting about growth

Today's workload isn't tomorrow's. Project future scenarios.

5. One-time characterization

Workloads change. Re-characterize periodically, especially after product changes.

Conclusion

Workload characterization is the invisible work that makes performance tests useful. Without it, you're testing an imaginary system.

Invest time in:

  1. Collecting real production data
  2. Documenting patterns and distributions
  3. Validating with stakeholders
  4. Updating regularly

The result is tests that actually predict system behavior under real conditions — and capacity planning decisions based on data, not assumptions.

Garbage in, garbage out. The quality of your tests depends on the quality of your characterization.


Want to understand your platform's limits?

Contact us for a performance assessment.
