How Suppression Works
This page is a standalone deep-dive into the suppression algorithm. It is aimed at readers who understand rate limiting generally and want to understand precisely what Trypema's suppressed strategy does and why.
For usage examples and API surface, see Suppressed Strategy.
Motivation
Traditional rate limiters use a hard cutoff: below the limit everything passes, above it everything is rejected. This creates a cliff-edge effect. At the exact threshold, a small increase in traffic causes all requests to fail simultaneously, which can trigger thundering-herd retries and cascading failures.
Trypema's suppressed strategy takes a different approach: probabilistic suppression. As traffic approaches and exceeds the target rate, the limiter begins denying an increasing fraction of requests rather than all of them at once.
Why this matters:
- Smooth degradation. Clients slightly over the limit still get most requests through. As traffic grows, progressively more requests are denied.
- No cliff edge. There is no single point where everything breaks. The transition from "mostly allowed" to "mostly denied" is gradual.
- Some throughput for everyone. Instead of fully blocking clients that happen to cross the threshold, all clients see proportionally reduced throughput.
- Observability. The suppression factor is a continuous signal (0.0 to 1.0) that tells you exactly how close a key is to its limit. This is far more useful for monitoring than a binary "allowed/rejected" state.
This approach is inspired by Ably's distributed rate limiting at scale.
The three operating regimes
For a given key with rate_limit (requests per second) and window_size_seconds, the suppressed strategy defines two thresholds:
soft_limit = window_size_seconds * rate_limit
hard_limit = window_size_seconds * rate_limit * hard_limit_factor
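The two threshold definitions above can be sketched as a small helper (the function name is illustrative, not the crate's API):

```rust
// Sketch: derive the soft and hard limits from the key's configuration.
fn thresholds(rate_limit: f64, window_size_seconds: f64, hard_limit_factor: f64) -> (f64, f64) {
    let soft_limit = window_size_seconds * rate_limit;
    let hard_limit = soft_limit * hard_limit_factor;
    (soft_limit, hard_limit)
}

fn main() {
    // 10 req/s over a 60 s window with a 1.5x hard-limit factor.
    let (soft, hard) = thresholds(10.0, 60.0, 1.5);
    assert_eq!(soft, 600.0);
    assert_eq!(hard, 900.0);
}
```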
The strategy operates in three regimes:
1. Below capacity (no suppression)
Condition: accepted_usage < soft_limit
- Suppression factor: 0.0
- Returns: RateLimitDecision::Allowed
- All requests admitted. No probabilistic logic involved. This is normal operation.
2. At or near capacity (probabilistic suppression)
Condition: Accepted usage is at/above the soft limit, but observed usage has not reached the hard limit.
- Suppression factor: between 0.0 and 1.0 (computed and cached)
- Returns: RateLimitDecision::Suppressed { is_allowed, suppression_factor }
- Each request is probabilistically admitted with probability 1.0 - suppression_factor.
3. Over the hard limit (full suppression)
Condition: observed_usage >= hard_limit
- Suppression factor: forced to 1.0
- Returns: RateLimitDecision::Suppressed { is_allowed: false, suppression_factor: 1.0 }
- All requests denied until usage falls back under the hard limit.

Note: the strategy never returns Rejected. Even over the hard limit, it returns Suppressed { is_allowed: false, suppression_factor: 1.0 }. Always gate your request on is_allowed.

The suppression factor formula
The suppression factor determines what fraction of requests should be denied. It is computed from the key's current traffic pattern:
average_rate_in_window = total_observed / window_size_seconds
rate_in_last_1000ms = sum of bucket counts in the last 1000ms
perceived_rate = max(average_rate_in_window, rate_in_last_1000ms)
suppression_factor = 1.0 - (rate_limit / perceived_rate)
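The formula above can be sketched as a standalone function (a simplification of the real code path, which also checks the regime conditions; names mirror the text, not the crate's internals):

```rust
// Minimal sketch of the suppression factor formula: perceived_rate takes the
// max of the window average and the last-second rate, and the result is
// clamped to [0.0, 1.0].
fn suppression_factor(
    rate_limit: f64,
    total_observed: f64,
    window_size_seconds: f64,
    rate_in_last_1000ms: f64,
) -> f64 {
    let average_rate_in_window = total_observed / window_size_seconds;
    let perceived_rate = average_rate_in_window.max(rate_in_last_1000ms);
    // At or below the target rate the raw formula would go negative;
    // clamping keeps the factor at 0.0 (no suppression).
    (1.0 - rate_limit / perceived_rate).clamp(0.0, 1.0)
}

fn main() {
    // 840 observed over 60 s = 14 req/s average (the worked example's numbers).
    let sf = suppression_factor(10.0, 840.0, 60.0, 14.0);
    assert!((sf - (1.0 - 10.0 / 14.0)).abs() < 1e-9);
    // Below the target rate: clamped to zero.
    assert_eq!(suppression_factor(10.0, 300.0, 60.0, 5.0), 0.0);
}
```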
Why perceived_rate takes the max of both terms
The formula uses two rate estimates and takes the higher one:
- average_rate_in_window: The average rate across the entire sliding window. This gives a stable, smoothed view of the key's traffic. It prevents suppression from collapsing to zero too quickly after a spike -- the spike's counts remain in the window average for window_size_seconds.
- rate_in_last_1000ms: The rate observed in the most recent 1 second. This makes suppression react fast to short spikes. If a burst arrives, the 1-second term will spike before the window average catches up, causing suppression to engage immediately.
By taking the max, the algorithm is both responsive to sudden spikes (via the 1-second term) and stable during sustained overload (via the window average).
How this maps to probability
The suppression factor directly controls admission probability:
admission_probability = 1.0 - suppression_factor
| suppression_factor | Admission probability | Effect |
|---|---|---|
| 0.0 | 100% | All requests pass (below capacity) |
| 0.2 | 80% | ~80% admitted, ~20% denied |
| 0.3 | 70% | ~70% admitted, ~30% denied |
| 0.5 | 50% | ~50% admitted, ~50% denied |
| 0.7 | 30% | ~30% admitted, ~70% denied |
| 1.0 | 0% | All requests denied (full suppression) |
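The mapping above reduces to a one-line admission test: draw a uniform sample in [0, 1) and admit when it falls in the admission region. A sketch (the helper name is illustrative):

```rust
// Admit with probability (1 - suppression_factor): a sample drawn uniformly
// from [0, 1) passes when it lands outside the suppressed fraction.
fn is_admitted(suppression_factor: f64, uniform_sample: f64) -> bool {
    uniform_sample >= suppression_factor
}

fn main() {
    // With factor 0.3, exactly 70 of 100 evenly spaced samples are admitted.
    let admitted = (0..100)
        .filter(|i| is_admitted(0.3, *i as f64 / 100.0))
        .count();
    assert_eq!(admitted, 70);
}
```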
Short worked example
Suppose rate_limit = 10 req/s and perceived_rate = 14 req/s:
suppression_factor = 1.0 - (10.0 / 14.0) = 1.0 - 0.714 = 0.286
About 71% of requests will be admitted, and about 29% denied. The admitted rate will be approximately 14 * 0.714 ~ 10 req/s, which naturally converges toward the target rate.
The hard_limit_factor parameter
hard_limit_factor controls the gap between the "soft limit" (where suppression begins) and the "hard limit" (where full suppression kicks in).
Expressed as rates:

soft limit (as a rate) = rate_limit (suppression begins here)
hard limit (as a rate) = rate_limit * hard_limit_factor (full suppression here)
It is a validated newtype. HardLimitFactor::try_from(value) fails if value < 1.0. Default is 1.0.
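An illustrative re-implementation of that validated newtype (the real crate's error type and internals may differ):

```rust
// Newtype wrapper that rejects factors below 1.0 at construction time.
#[derive(Debug, Clone, Copy, PartialEq)]
struct HardLimitFactor(f64);

impl TryFrom<f64> for HardLimitFactor {
    type Error = String;
    fn try_from(value: f64) -> Result<Self, Self::Error> {
        // `!(value >= 1.0)` also rejects NaN.
        if !(value >= 1.0) {
            Err(format!("hard_limit_factor must be >= 1.0, got {value}"))
        } else {
            Ok(HardLimitFactor(value))
        }
    }
}

impl Default for HardLimitFactor {
    fn default() -> Self {
        HardLimitFactor(1.0)
    }
}

fn main() {
    assert!(HardLimitFactor::try_from(1.5).is_ok());
    assert!(HardLimitFactor::try_from(0.5).is_err());
    assert_eq!(HardLimitFactor::default(), HardLimitFactor(1.0));
}
```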
How different values behave
hard_limit_factor = 1.0 (default):
- Soft limit and hard limit are the same point.
- Suppression starts and reaches 1.0 at the same threshold.
- Produces hard cutoff behaviour similar to the absolute strategy. There is no gradual ramp.
- Use this if you want suppressed-style counter tracking without the probabilistic ramp.
hard_limit_factor = 1.5 (recommended):
- Hard limit is 50% above the soft limit.
- There is a headroom range where suppression gradually ramps from 0.0 to 1.0.
- Example: rate_limit = 10 req/s --> suppression begins at 10 req/s, full suppression at 15 req/s.
hard_limit_factor = 2.0:
- Hard limit is 100% above the soft limit. Even more gradual ramp.
- Example: rate_limit = 10 req/s --> suppression begins at 10 req/s, full suppression at 20 req/s.
Counter tracking
The suppressed strategy maintains two counters per key in its sliding window:
- total (observed): The total number of calls seen for the key, regardless of whether they were admitted or denied.
- declined: The number of calls denied by suppression (is_allowed: false).
The counters are updated on every call:

- Every call increments total by count.
- If the call is denied (is_allowed: false), declined is also incremented by count.
- If the call is admitted, only total is incremented.
You can derive accepted usage:
accepted = total - declined
This design means the limiter always has full visibility into both the offered load (total) and the actual throughput (accepted). The total counter drives the hard limit check, while accepted is used to determine which regime the key is in.
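The counter bookkeeping described above can be sketched as follows (field and method names mirror the text, not necessarily the crate's internals):

```rust
// Per-key counters: total offered load and declined calls.
#[derive(Default)]
struct KeyCounters {
    total: u64,    // observed: every call, admitted or not
    declined: u64, // calls denied by suppression
}

impl KeyCounters {
    // Record a call outcome; `count` is the increment size.
    fn record(&mut self, count: u64, is_allowed: bool) {
        self.total += count;
        if !is_allowed {
            self.declined += count;
        }
    }

    // Accepted usage is derived, never stored.
    fn accepted(&self) -> u64 {
        self.total - self.declined
    }
}

fn main() {
    let mut c = KeyCounters::default();
    c.record(3, true);  // 3 admitted calls
    c.record(2, false); // 2 denied calls
    assert_eq!(c.total, 5);
    assert_eq!(c.declined, 2);
    assert_eq!(c.accepted(), 3);
}
```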
Suppression factor caching
Computing the suppression factor requires iterating over recent buckets (to calculate rate_in_last_1000ms) and reading window totals. Under high throughput, doing this on every call would be expensive.
To amortise this cost, the computed factor is cached per key for suppression_factor_cache_ms:
- Local provider: Cached in-memory using Instant timestamps. The factor is recomputed only when the cache expires.
- Redis provider: Cached in Redis as a string with a SET ... PX TTL. Multiple calls within the TTL reuse the cached value. If the cached value is outside [0.0, 1.0], it is treated as stale and recomputed.
- Hybrid provider: Uses the suppression factor from the last Redis read on the local fast-path.
Trade-offs
| Cache duration | Reaction speed | CPU/Redis cost |
|---|---|---|
| Short (10-50ms) | Fast reaction to traffic changes | More frequent recomputation |
| Long (100-1000ms) | Slower reaction to sudden spikes | Less overhead |
The default is 100ms, a good starting point for most workloads.
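The local provider's cache behaviour described above can be sketched with std's Instant (a simplification; the type and method names are illustrative):

```rust
use std::time::{Duration, Instant};

// Cached suppression factor: recomputed only once the entry is older
// than the configured TTL (suppression_factor_cache_ms).
struct CachedFactor {
    value: f64,
    computed_at: Instant,
    ttl: Duration,
}

impl CachedFactor {
    fn get_or_recompute(&mut self, recompute: impl FnOnce() -> f64) -> f64 {
        if self.computed_at.elapsed() >= self.ttl {
            self.value = recompute();
            self.computed_at = Instant::now();
        }
        self.value
    }
}

fn main() {
    let mut cache = CachedFactor {
        value: 0.2,
        computed_at: Instant::now(),
        ttl: Duration::from_millis(100),
    };
    // Within the TTL the cached value is reused; the closure is not called.
    let v = cache.get_or_recompute(|| panic!("should not recompute yet"));
    assert_eq!(v, 0.2);
}
```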
Differences between Local, Redis, and Hybrid
All three providers use the same suppression algorithm. The core logic (three regimes, suppression factor formula, probabilistic admission) is identical. The differences are in where state is stored and how caching works.
Local provider
- State: in-process DashMap + atomic counters.
- Factor cache: in-memory with Instant timestamps.
- Bucket expiry: Instant::elapsed().as_millis() (millisecond granularity, lazy eviction).
- I/O: none (synchronous calls).
- Scope: single-process only.
Redis provider
- State: Redis keys (Lua scripts for atomicity).
- Factor cache: SET ... PX with TTL in Redis.
- Bucket expiry: Redis server time in milliseconds (inside Lua scripts).
- I/O: one Redis round-trip per inc() or get_suppression_factor() call.
- Scope: shared across all processes using the same Redis instance.
Hybrid provider
- State: local in-memory state, periodically flushed to Redis by a background actor (RedisCommitter).
- Factor cache: suppression factor from the last Redis read, used for probabilistic admission on the local fast-path.
- I/O: none on the fast-path. Background Redis sync every sync_interval_ms.
- Scope: distributed (with up to sync_interval_ms of lag).
- The Suppressing state uses probabilistic suppression based on the cached factor (not necessarily full suppression).
Concrete worked example
Consider this configuration:
- rate_limit = 10 req/s
- window_size_seconds = 60
- hard_limit_factor = 1.5
This gives us:
- Soft limit (window capacity): 60 * 10 = 600 requests in the window
- Hard limit: 60 * 10 * 1.5 = 900 requests in the window
Scenario: traffic ramps from 0 to 900+
Phase 1: 0 to 600 requests in the window (below capacity)
All requests return Allowed. Suppression factor is 0.0. Everything is normal.
Phase 2: 600 to ~700 requests (suppression begins)
Suppose the window now has 700 total observed requests, distributed evenly:
average_rate_in_window = 700 / 60 = 11.67 req/s
rate_in_last_1000ms = 12 req/s (example: slight spike)
perceived_rate = max(11.67, 12.0) = 12.0
suppression_factor = 1.0 - (10.0 / 12.0) = 0.167
About 83% of requests are admitted, 17% denied. The strategy returns Suppressed { suppression_factor: 0.167, is_allowed: true/false }.
Phase 3: ~800 requests (suppression increasing)
average_rate_in_window = 800 / 60 = 13.33 req/s
rate_in_last_1000ms = 15 req/s (traffic spiking)
perceived_rate = max(13.33, 15.0) = 15.0
suppression_factor = 1.0 - (10.0 / 15.0) = 0.333
About 67% of requests admitted, 33% denied. Suppression is noticeably active.
Phase 4: 900+ requests (hard limit reached)
Once observed usage reaches 900 (the hard limit), the strategy bypasses the factor calculation:
suppression_factor = 1.0 (forced)
All requests denied: Suppressed { is_allowed: false, suppression_factor: 1.0 }.
This continues until enough time passes for old buckets to expire and usage drops below the hard limit.
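The four phases above can be tied together in one sketch of the regime logic (the decide helper and its parameters are illustrative, not the crate's code paths):

```rust
// Returns the suppression factor for the given window state, applying the
// three regimes: below capacity, probabilistic suppression, hard limit.
fn decide(
    accepted: f64,
    observed: f64,
    soft_limit: f64,
    hard_limit: f64,
    rate_limit: f64,
    window_seconds: f64,
    rate_in_last_1000ms: f64,
) -> f64 {
    if observed >= hard_limit {
        return 1.0; // phase 4: full suppression, factor forced to 1.0
    }
    if accepted < soft_limit {
        return 0.0; // phase 1: below capacity, everything allowed
    }
    // phases 2-3: probabilistic suppression from the perceived rate
    let perceived = (observed / window_seconds).max(rate_in_last_1000ms);
    (1.0 - rate_limit / perceived).clamp(0.0, 1.0)
}

fn main() {
    let (soft, hard) = (600.0, 900.0); // 10 req/s * 60 s, factor 1.5
    // Phase 1: 500 requests, well below capacity.
    assert_eq!(decide(500.0, 500.0, soft, hard, 10.0, 60.0, 9.0), 0.0);
    // Phase 2: 700 observed, last-second rate 12 req/s -> ~0.167.
    assert!((decide(650.0, 700.0, soft, hard, 10.0, 60.0, 12.0) - 0.1667).abs() < 1e-3);
    // Phase 3: 800 observed, last-second rate 15 req/s -> ~0.333.
    assert!((decide(700.0, 800.0, soft, hard, 10.0, 60.0, 15.0) - 0.3333).abs() < 1e-3);
    // Phase 4: 900 observed reaches the hard limit -> forced 1.0.
    assert_eq!(decide(800.0, 900.0, soft, hard, 10.0, 60.0, 20.0), 1.0);
}
```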
The get_suppression_factor() API
All three providers expose a read-only method to fetch the current suppression factor:
Local (sync):
let factor = rl.local().suppressed().get_suppression_factor("user_123");

Redis (async):
let factor = rl.redis().suppressed().get_suppression_factor(&key).await?;

Hybrid (async):
let factor = rl.hybrid().suppressed().get_suppression_factor(&key).await?;
Returns a value in [0.0, 1.0]:
- 0.0 -- no suppression (below capacity or key not found)
- 0.0 < sf < 1.0 -- partial suppression
- 1.0 -- full suppression (over hard limit)
This method does not record any increment. Use it for:
- Observability: Export as a metric to monitor how close keys are to their limits.
- Dashboards: Show per-key suppression levels in real-time.
- Debugging: Understand why calls are being suppressed for a specific key.
The method returns the cached value if fresh (within suppression_factor_cache_ms). Otherwise it recomputes the factor from the current sliding window state.

