
Benchmark Concepts

How to interpret ops/s, latency percentiles, and workload knobs.

This page explains the fields you see in Trypema benchmark tables (stress/load harness) and the knobs that shape those results.

What one "operation" is

In the stress harness, an operation is one call to the rate limiter (typically inc(...)):

  • choose a key (e.g. a user id or IP)
  • evaluate admission for that key and strategy
  • record usage if the strategy/provider does so
  • return a decision (Allowed, Rejected, or Suppressed { is_allowed })

So the throughput and latency figures describe rate-limiter operations, not your full request lifecycle.
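
The decision shape described above can be sketched as a small Rust enum. The variant names mirror the text; the tallying code around them is illustrative, not the harness's actual internals:

```rust
// Decision mirrors the outcomes named above: Allowed, Rejected,
// and Suppressed { is_allowed }.
#[derive(Debug)]
enum Decision {
    Allowed,
    Rejected,
    Suppressed { is_allowed: bool },
}

// Tally one op's outcome into the four counters the harness prints.
fn tally(d: &Decision, counts: &mut [u64; 4]) {
    match d {
        Decision::Allowed => counts[0] += 1,
        Decision::Rejected => counts[1] += 1,
        Decision::Suppressed { is_allowed: true } => counts[2] += 1,  // suppressed_allowed
        Decision::Suppressed { is_allowed: false } => counts[3] += 1, // suppressed_denied
    }
}

fn main() {
    let mut counts = [0u64; 4];
    let ops = [
        Decision::Allowed,
        Decision::Suppressed { is_allowed: true },
        Decision::Suppressed { is_allowed: false },
        Decision::Rejected,
        Decision::Allowed,
    ];
    for d in &ops {
        tally(d, &mut counts);
    }
    println!(
        "allowed={} rejected={} suppressed_allowed={} suppressed_denied={}",
        counts[0], counts[1], counts[2], counts[3]
    );
}
```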

Throughput: ops/s

ops/s is operations per second.

Simple example:

  • A run prints ops/s = 500,000.
  • That means the harness completed about 500k inc(...) calls per second.
  • Those ops can be a mix of allowed / rejected / suppressed outcomes.

Latency percentiles (microseconds)

Percentiles are printed in microseconds (us):

  • p50: 50% of sampled ops finish in this time or less
  • p95: 95% in this time or less
  • p99: 99% in this time or less
  • p99.9 (often shown as p999): 99.9% in this time or less
  • max: the slowest sampled op

Simple example:

  • p50 = 2 us, p99 = 40 us, max = 9,000 us
  • Half your ops are 2 microseconds or faster.
  • 99% are 40 microseconds or faster.
  • One sampled op took 9,000 microseconds = 9 milliseconds.

Sampling: the harness may record 1 latency sample every N ops (--sample-every). Percentiles and max are computed over the sampled set.
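
The percentile definitions above can be sketched as a sort-and-index computation over the sampled set. This is a minimal sketch of the definition only; a real harness typically uses a histogram (e.g. HDR) rather than exact sorting:

```rust
// Value at percentile p over a set of sampled latencies (microseconds):
// the smallest sample such that at least p% of samples are <= it.
fn percentile(samples: &mut [u64], p: f64) -> u64 {
    samples.sort_unstable();
    let rank = ((p / 100.0) * samples.len() as f64).ceil() as usize;
    samples[rank.saturating_sub(1).min(samples.len() - 1)]
}

fn main() {
    // Ten sampled ops, in microseconds.
    let mut samples: Vec<u64> = vec![2, 2, 1, 3, 2, 40, 2, 3, 2, 9000];
    let p50 = percentile(&mut samples, 50.0);
    let p99 = percentile(&mut samples, 99.0);
    let max = *samples.last().unwrap(); // samples are sorted after the calls above
    println!("p50={}us p99={}us max={}us", p50, p99, max);
}
```

Note that with only 10 samples, p99 and max coincide; tail percentiles only become meaningful with enough samples, which is why --sample-every matters.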

Decision counts

The harness prints how many ops ended in each outcome.

Allowed

allowed is the number of ops that were admitted.

Example:

  • allowed = 30,000 over a 30s run is about 1,000 allowed ops/sec.

Rejected

rejected is the number of ops that were denied with a hard reject.

This is typical for the absolute strategy.

Example:

  • If you set a low per-key rate limit and drive traffic above it, you will see rejected climb.

Suppressed (suppressed strategy)

For the suppressed strategy, the decision near/over capacity is Suppressed { is_allowed }:

  • suppressed_allowed: suppressed, but still admitted
  • suppressed_denied: suppressed, and denied

In other words, the suppressed strategy "denies" via suppressed_denied, not rejected.

Errors

errors counts operational failures during the run (for example Redis errors, connection problems, unexpected responses).

Errors are not the same thing as rejected/suppressed decisions.

Workload knobs (how a run is shaped)

Key space

key_space is the number of distinct keys the harness can use.

Example:

  • key_space = 10 means keys are user_0..user_9.
  • key_space = 100000 means keys are user_0..user_99999.

Key distribution

Key distribution controls how the harness picks keys per op.

  • hot: always uses the same key (user_0)
  • uniform: picks a key uniformly at random from the key space
  • skewed: sends a fraction of traffic to the hot key (user_0), and the rest uniformly across the remaining keys

Simple examples (key_space = 10):

  • hot: every op uses user_0
  • uniform: each op randomly picks one of user_0..user_9 with equal probability
  • skewed with hot_fraction = 0.9: about 90% of ops use user_0, about 10% spread across user_1..user_9
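
The three distributions above can be sketched as a key picker. The function and names here are illustrative (not the harness's internals); it takes a caller-supplied uniform random value in [0.0, 1.0) so the behavior is easy to check:

```rust
// Key distributions as described above.
enum KeyDist {
    Hot,
    Uniform,
    Skewed { hot_fraction: f64 },
}

// Map one uniform random draw r in [0.0, 1.0) to a key index.
fn pick_key(dist: &KeyDist, key_space: usize, r: f64) -> usize {
    match dist {
        // hot: every op uses user_0
        KeyDist::Hot => 0,
        // uniform: equal probability across user_0..user_{key_space-1}
        KeyDist::Uniform => (r * key_space as f64) as usize % key_space,
        KeyDist::Skewed { hot_fraction } => {
            if r < *hot_fraction {
                0 // hot_fraction of traffic hits the hot key
            } else {
                // the rest is spread uniformly over user_1..user_{key_space-1}
                let rest = (r - hot_fraction) / (1.0 - hot_fraction);
                1 + ((rest * (key_space - 1) as f64) as usize % (key_space - 1))
            }
        }
    }
}

fn main() {
    let dist = KeyDist::Skewed { hot_fraction: 0.9 };
    println!("r=0.50 -> user_{}", pick_key(&dist, 10, 0.50)); // hot key
    println!("r=0.95 -> user_{}", pick_key(&dist, 10, 0.95)); // a tail key
}
```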

Window size (sliding window)

Trypema enforces limits over a sliding time window.

  • window_size_seconds is the window length.

Rate limit per second (per key)

rate_limit_per_s is the configured per-key rate.

Over a window, the rough "budget" per key is:

window_budget = window_size_seconds * rate_limit_per_s

Simple example:

  • rate_limit_per_s = 1 and window_size_seconds = 6
  • window budget is 6 * 1 = 6 requests per 6 seconds
  • you can send a burst of up to ~6 quickly, then you should expect denials until the window slides forward
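
The budget formula and the burst-then-deny behavior can be sketched as follows. This is a toy sliding-window check over raw timestamps, purely to illustrate the arithmetic; Trypema's providers track this state internally and may implement the window differently:

```rust
// Rough per-key budget over one window, as in the formula above.
fn window_budget(window_size_seconds: u64, rate_limit_per_s: u64) -> u64 {
    window_size_seconds * rate_limit_per_s
}

// Admit if fewer than `budget` requests landed in the trailing window.
fn admit(timestamps_ms: &[u64], now_ms: u64, window_ms: u64, budget: u64) -> bool {
    let start = now_ms.saturating_sub(window_ms);
    let in_window = timestamps_ms
        .iter()
        .filter(|&&t| t > start && t <= now_ms)
        .count() as u64;
    in_window < budget
}

fn main() {
    let budget = window_budget(6, 1); // 6 requests per 6-second window
    // A quick burst of 6 requests spends the whole budget...
    let sent: Vec<u64> = (0..6u64).map(|i| 1_000 + i * 100).collect();
    // ...so the next request right after the burst is denied,
    println!("admit at t=2s:  {}", admit(&sent, 2_000, 6_000, budget));
    // but once the window slides past the burst, requests are admitted again.
    println!("admit at t=8s:  {}", admit(&sent, 8_000, 6_000, budget));
}
```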

Threads / concurrency

threads controls concurrency in the harness.

Higher concurrency can increase throughput until you hit CPU limits (local) or I/O/contention limits (Redis/hybrid).

Run mode: max vs target-qps

  • max: closed-loop, pushes as fast as possible
  • target-qps: open-loop offered load, tries to generate a fixed QPS (optionally with bursts)

Simple example:

  • Use max when you want to find peak ops/s.
  • Use target-qps when you want stable tail latency comparisons at a fixed offered load.

Bursts (target-qps mode)

In target-qps mode you can add periodic bursts:

  • base load: --target-qps
  • burst load: --burst-qps
  • how often bursts repeat: --burst-period-ms
  • how long the burst is active: --burst-duration-ms

Simple example:

  • base: target_qps = 20,000
  • burst: burst_qps = 200,000 for 250 ms every 2,000 ms

This helps you see how providers/strategies behave under spikes.
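
The burst schedule can be sketched as a function from elapsed time to offered load. The parameter names mirror the flags above; the exact schedule logic is an illustrative assumption:

```rust
// Offered load at elapsed time t_ms in target-qps mode with periodic bursts:
// within the first burst_duration_ms of each burst_period_ms, offer burst_qps;
// otherwise offer the base target_qps.
fn offered_qps(
    t_ms: u64,
    target_qps: u64,
    burst_qps: u64,
    burst_period_ms: u64,
    burst_duration_ms: u64,
) -> u64 {
    if t_ms % burst_period_ms < burst_duration_ms {
        burst_qps
    } else {
        target_qps
    }
}

fn main() {
    // Base 20k qps; bursts of 200k qps for 250 ms every 2,000 ms.
    for t in [0u64, 100, 300, 1_999, 2_100] {
        println!("t={}ms -> {} qps", t, offered_qps(t, 20_000, 200_000, 2_000, 250));
    }
}
```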

Hybrid: sync interval (sync_interval_ms)

Hybrid uses a local fast-path and periodically syncs increments to Redis.

  • sync_interval_ms is how often Hybrid flushes/syncs its local state to Redis.
  • Smaller values reduce state lag but increase Redis load.
  • Larger values increase throughput and reduce Redis load, but allow more state lag.

In the published benchmark results, Hybrid runs use sync_interval_ms = 10ms.
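
The trade-off above can be sketched as a local counter that flushes on a timer. Everything here is illustrative (not Trypema's internals): the point is that increments never wait on the network, and the shared view trails the local truth by up to one sync interval's worth of ops:

```rust
// Sketch of the hybrid fast path: count locally, sync in batches.
struct HybridCounter {
    local_pending: u64,    // increments not yet flushed to the shared store
    synced_total: u64,     // what the shared store (e.g. Redis) has seen
    sync_interval_ms: u64,
    last_sync_ms: u64,
}

impl HybridCounter {
    fn inc(&mut self, now_ms: u64) {
        self.local_pending += 1; // fast path: no network round trip per op
        if now_ms - self.last_sync_ms >= self.sync_interval_ms {
            // flush: one batched update instead of one update per op
            self.synced_total += self.local_pending;
            self.local_pending = 0;
            self.last_sync_ms = now_ms;
        }
    }

    // State lag: how far the shared view trails the local truth.
    fn lag(&self) -> u64 {
        self.local_pending
    }
}

fn main() {
    let mut c = HybridCounter {
        local_pending: 0,
        synced_total: 0,
        sync_interval_ms: 10,
        last_sync_ms: 0,
    };
    for t in 1..=25u64 {
        c.inc(t); // one op per millisecond
    }
    // Two flushes have happened (t=10, t=20); 5 ops are still pending.
    println!("synced={} lag={}", c.synced_total, c.lag());
}
```

A smaller sync_interval_ms shrinks the worst-case lag but multiplies the number of flushes, which is exactly the Redis-load trade-off described above.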

How to read a results row (quick checklist)

  • Compare ops/s only when the workload knobs match (provider, strategy, mode, threads, key dist/space, window, rate limit).
  • Use decision counts to understand what path you measured (mostly allow-path vs mostly deny-path).
  • Use p99/p99.9/max to spot tail behavior; always keep errors in view for Redis-based runs.