Benchmark Concepts
This page explains the fields you see in Trypema benchmark tables (stress/load harness) and the knobs that shape those results.
What one "operation" is
In the stress harness, an operation is one call to the rate limiter (typically inc(...)):
- choose a key (e.g. a user id or IP)
- evaluate admission for that key and strategy
- record usage if the strategy/provider does so
- return a decision (Allowed, Rejected, or Suppressed { is_allowed })
So the throughput and latency figures describe rate-limiter operations, not your full request lifecycle.
Throughput: ops/s
ops/s is operations per second.
Simple example:
- A run prints ops/s = 500,000.
- That means the harness completed about 500k inc(...) calls per second.
- Those ops can be a mix of allowed / rejected / suppressed outcomes.
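The arithmetic behind that figure can be sketched in a few lines. This is only an illustration, assuming throughput is completed operations divided by wall-clock run time; the function name and the 30-second run are hypothetical and not taken from the harness.

```python
def ops_per_second(completed_ops: int, elapsed_seconds: float) -> float:
    """Throughput: completed operations divided by wall-clock run time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return completed_ops / elapsed_seconds


# Hypothetical run: 15,000,000 completed inc(...) calls in 30 seconds.
print(ops_per_second(15_000_000, 30.0))  # 500000.0
```

The count includes every completed op regardless of outcome, which is why a high ops/s figure alone does not say how many ops were admitted.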
Latency percentiles (microseconds)
Percentiles are printed in microseconds (us):
- p50: 50% of sampled ops finish in this time or less
- p95: 95% in this time or less
- p99: 99% in this time or less
- p99.9 (often shown as p999): 99.9% in this time or less
- max: the slowest sampled op
Simple example:
p50 = 2 us, p99 = 40 us, max = 9,000 us
- Half your ops are 2 microseconds or faster.
- 99% are 40 microseconds or faster.
- One sampled op took 9,000 microseconds = 9 milliseconds.
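To make the definitions concrete, here is a minimal sketch that computes these fields from a list of sampled latencies. It assumes the nearest-rank percentile method, which may not be the method the harness uses; the function names and the sample data are made up for illustration. Percentiles are passed as integer per-mille values (990 means p99) to keep the rank arithmetic exact.

```python
def percentile(sorted_us, per_mille):
    """Nearest-rank percentile over an ascending list of latencies (us).

    per_mille is the percentile times 10 (500 = p50, 999 = p99.9).
    """
    if not sorted_us:
        raise ValueError("no samples")
    # Ceiling division: smallest rank covering per_mille/1000 of the samples.
    rank = max(1, -(-per_mille * len(sorted_us) // 1000))
    return sorted_us[rank - 1]


def summarize(samples_us):
    """Return the fields shown in a results row for one sampled set."""
    ordered = sorted(samples_us)
    return {
        "p50": percentile(ordered, 500),
        "p95": percentile(ordered, 950),
        "p99": percentile(ordered, 990),
        "p99.9": percentile(ordered, 999),
        "max": ordered[-1],
    }


# Made-up sample set: mostly fast ops, a small slow tail, one outlier.
samples = [2.0] * 600 + [10.0] * 380 + [40.0] * 19 + [9000.0]
print(summarize(samples))
```

With this sample set the sketch reports p50 = 2 us, p99 = 40 us, and max = 9,000 us: a single outlier sets max without moving the percentiles.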
Latency is sampled rather than measured on every op (see --sample-every). Percentiles and max are computed over the sampled set.
Decision counts
The harness prints how many ops ended in each outcome.
Allowed
allowed is the number of ops that were admitted.
Example:
- allowed = 30,000 over a 30s run is about 1,000 allowed ops/sec.
Rejected
rejected is the number of ops that were denied with a hard reject.
This is typical for the absolute strategy.
Example:
- If you set a low per-key rate limit and drive traffic above it, you will see rejected climb.
Suppressed (suppressed strategy)
For the suppressed strategy, the decision near/over capacity is Suppressed { is_allowed }:
- suppressed_allowed: suppressed, but still admitted
- suppressed_denied: suppressed, and denied
In other words, the suppressed strategy "denies" via suppressed_denied, not rejected.
Errors
errors counts operational failures during the run (for example Redis errors, connection problems, unexpected responses).
Errors are not the same thing as rejected/suppressed decisions.
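The way outcomes map onto the printed counters can be sketched as a small tally. This is an illustration only: the tuple encoding of outcomes and the function names are assumptions made for the example, not the harness's types.

```python
COUNTERS = ("allowed", "rejected", "suppressed_allowed", "suppressed_denied", "errors")


def bucket(outcome):
    """Map one op outcome to the counter it increments.

    Outcomes are modelled as tuples (an assumption for this sketch):
    ("allowed",), ("rejected",), ("suppressed", is_allowed), ("error",).
    """
    kind = outcome[0]
    if kind == "suppressed":
        return "suppressed_allowed" if outcome[1] else "suppressed_denied"
    return {"allowed": "allowed", "rejected": "rejected", "error": "errors"}[kind]


def tally(outcomes):
    """Count how many ops ended in each outcome."""
    counts = {name: 0 for name in COUNTERS}
    for outcome in outcomes:
        counts[bucket(outcome)] += 1
    return counts


run = [("allowed",), ("allowed",), ("suppressed", True),
       ("suppressed", False), ("rejected",), ("error",)]
print(tally(run))
```

An error is counted separately from every decision, so a run can show zero rejected and zero suppressed_denied and still be unhealthy if errors is non-zero.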
Workload knobs (how a run is shaped)
Key space
key_space is the number of distinct keys the harness can use.
Example:
- key_space = 10 means keys are user_0..user_9.
- key_space = 100000 means keys are user_0..user_99999.
Key distribution
Key distribution controls how the harness picks keys per op.
- hot: always uses the same key (user_0)
- uniform: picks a key uniformly at random from the key space
- skewed: sends a fraction of traffic to the hot key (user_0), and the rest uniformly across the remaining keys
Simple examples (key_space = 10):
- hot: every op uses user_0
- uniform: each op randomly picks one of user_0..user_9 with equal probability
- skewed with hot_fraction = 0.9: about 90% of ops use user_0, about 10% spread across user_1..user_9
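The three distributions can be sketched as a single key picker. This is a minimal illustration of the behaviour described above, not the harness's code; the function name and signature are assumptions.

```python
import random


def pick_key(rng, distribution, key_space, hot_fraction=0.9):
    """Pick the key for one op under the given distribution."""
    if distribution not in ("hot", "uniform", "skewed"):
        raise ValueError(f"unknown distribution: {distribution}")
    if distribution == "hot" or key_space == 1:
        return "user_0"
    if distribution == "uniform":
        return f"user_{rng.randrange(key_space)}"
    # skewed: hot_fraction of ops hit user_0, the rest spread over user_1..
    if rng.random() < hot_fraction:
        return "user_0"
    return f"user_{rng.randrange(1, key_space)}"


rng = random.Random(0)  # fixed seed so the sketch is repeatable
keys = [pick_key(rng, "skewed", 10) for _ in range(10_000)]
hot_share = keys.count("user_0") / len(keys)
print(f"share of ops on user_0: {hot_share:.3f}")
```

The distribution matters for interpretation: hot concentrates all contention on one key, while uniform over a large key space mostly measures the per-key fast path.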
Window size (sliding window)
Trypema enforces limits over a sliding time window.
- window_size_seconds is the window length.
Rate limit per second (per key)
rate_limit_per_s is the configured per-key rate.
Over a window, the rough "budget" per key is:
window_budget = window_size_seconds * rate_limit_per_s
Simple example:
- rate_limit_per_s = 1 and window_size_seconds = 6
- window budget is 6 * 1 = 6 requests per 6 seconds
- you can send a burst of up to ~6 quickly, then you should expect denials until the window slides forward
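The burst-then-deny pattern can be reproduced with a toy limiter. The sketch below uses a plain sliding-window log for a single key and assumes nothing about how Trypema implements its window internally; the class name is hypothetical, and time is passed in explicitly so the example is deterministic.

```python
from collections import deque


class SlidingWindowLog:
    """Toy single-key limiter: admit while fewer than `budget` admitted
    ops fall inside the last `window_size_seconds`."""

    def __init__(self, rate_limit_per_s, window_size_seconds):
        self.window = window_size_seconds
        self.budget = window_size_seconds * rate_limit_per_s
        self.events = deque()  # timestamps of admitted ops, oldest first

    def inc(self, now):
        # Drop admitted ops that have slid out of the window.
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()
        if len(self.events) < self.budget:
            self.events.append(now)
            return True
        return False


limiter = SlidingWindowLog(rate_limit_per_s=1, window_size_seconds=6)
burst = [limiter.inc(0.0) for _ in range(8)]  # 8 ops at t = 0 s
mid_window = limiter.inc(3.0)                 # window is still full
after_slide = limiter.inc(6.0)                # the burst has slid out
print(burst, mid_window, after_slide)
```

The first six ops at t = 0 are admitted and the next two are denied; an op at t = 3 s is still denied, and an op at t = 6 s is admitted again once the burst has left the window.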
Threads / concurrency
threads controls concurrency in the harness.
Higher concurrency can increase throughput until you hit CPU limits (local) or I/O/contention limits (Redis/hybrid).
Run mode: max vs target-qps
- max: closed-loop, pushes as fast as possible
- target-qps: open-loop offered load, tries to generate a fixed QPS (optionally with bursts)
Simple example:
- Use max when you want to find peak ops/s.
- Use target-qps when you want stable tail latency comparisons at a fixed offered load.
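The difference between the two modes can be illustrated with a small model. This is a simplified sketch, not the harness's scheduler: it assumes a constant per-op service time, and the function names are hypothetical. Times are integer microseconds to keep the arithmetic exact.

```python
def closed_loop_ops(duration_s, service_time_us, workers=1):
    """Closed loop: each worker issues its next op only when the previous
    one returns, so the completed count depends on how fast ops are."""
    return workers * (duration_s * 1_000_000 // service_time_us)


def open_loop_send_times_us(duration_s, target_qps):
    """Open loop: send times are fixed in advance from the target rate,
    independent of how long each op takes."""
    return [i * 1_000_000 // target_qps for i in range(duration_s * target_qps)]


# Closed loop: halving the per-op cost doubles the measured ops/s.
print(closed_loop_ops(1, service_time_us=4))  # 250000
print(closed_loop_ops(1, service_time_us=2))  # 500000

# Open loop: 20,000 ops are offered in 1 s, one every 50 us, either way.
schedule = open_loop_send_times_us(1, 20_000)
print(len(schedule), schedule[:3])  # 20000 [0, 50, 100]
```

Because the open-loop schedule does not slow down when ops get slower, latency numbers from target-qps runs are comparable across providers at the same offered load.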
Bursts (target-qps mode)
In target-qps mode you can add periodic bursts:
- base load: --target-qps
- burst load: --burst-qps
- how often bursts repeat: --burst-period-ms
- how long the burst is active: --burst-duration-ms
Simple example:
- base target_qps = 20,000
- burst burst_qps = 200,000 for 250ms every 2,000ms
This helps you see how providers/strategies behave under spikes.
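The shape of that load can be sketched as a function of time. Two assumptions are made here that the text above does not confirm: the burst rate replaces the base rate while a burst is active (rather than being added to it), and each burst starts at the beginning of its period. The function names are hypothetical.

```python
def offered_qps(t_ms, target_qps, burst_qps, burst_period_ms, burst_duration_ms):
    """Offered load at time t_ms, assuming bursts start each period and
    replace the base rate while active."""
    in_burst = (t_ms % burst_period_ms) < burst_duration_ms
    return burst_qps if in_burst else target_qps


def average_offered_qps(target_qps, burst_qps, burst_period_ms, burst_duration_ms):
    """Time-weighted average offered load over one full period."""
    burst_share = burst_duration_ms / burst_period_ms
    return burst_qps * burst_share + target_qps * (1 - burst_share)


cfg = dict(target_qps=20_000, burst_qps=200_000,
           burst_period_ms=2_000, burst_duration_ms=250)
print(offered_qps(100, **cfg))     # 200000 (inside the burst)
print(offered_qps(1_000, **cfg))   # 20000  (base load)
print(average_offered_qps(**cfg))  # 42500.0
```

Under these assumptions the average offered load is about twice the base rate, so decision counts from a bursty run should not be compared directly with a flat run at the same --target-qps.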
Hybrid: sync interval (sync_interval_ms)
Hybrid uses a local fast-path and periodically syncs increments to Redis.
- sync_interval_ms is how often Hybrid flushes/syncs its local state to Redis.
- Smaller values reduce state lag but increase Redis load.
- Larger values increase throughput and reduce Redis load, but allow more state lag.
In the published benchmark results, Hybrid runs use sync_interval_ms = 10 (that is, 10 ms).
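A rough way to reason about this trade-off is to estimate how often syncs happen and how much local work can pile up between them. The sketch below is back-of-the-envelope arithmetic only: it assumes one sync per interval on a fixed timer per limiter instance, which may not match how Hybrid actually batches, and both function names are hypothetical.

```python
def syncs_per_second(sync_interval_ms):
    """Syncs per second for one instance, assuming a fixed timer."""
    return 1000 / sync_interval_ms


def ops_between_syncs(local_ops_per_s, sync_interval_ms):
    """Rough count of local ops that accumulate before the next sync."""
    return local_ops_per_s * sync_interval_ms / 1000


print(syncs_per_second(10))            # 100.0
print(ops_between_syncs(500_000, 10))  # 5000.0
print(ops_between_syncs(500_000, 100)) # 50000.0
```

Under these assumptions, raising the interval from 10 ms to 100 ms cuts sync traffic by 10x but lets roughly 10x more local operations accumulate before Redis sees them.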
How to read a results row (quick checklist)
- Compare ops/s only when the workload knobs match (provider, strategy, mode, threads, key dist/space, window, rate limit).
- Use decision counts to understand what path you measured (mostly allow-path vs mostly deny-path).
- Use p99/p99.9/max to spot tail behavior; always keep errors in view for Redis-based runs.

