
How Suppression Works

A deep dive into the probabilistic suppression algorithm — how the suppression factor is computed, what the three operating regimes mean, and how hard_limit_factor controls the headroom range.

This page is a standalone deep-dive into the suppression algorithm. It is aimed at readers who understand rate limiting generally and want to understand precisely what Trypema's suppressed strategy does and why.

For usage examples and API surface, see Suppressed Strategy.

Motivation

Traditional rate limiters use a hard cutoff: below the limit everything passes, above it everything is rejected. This creates a cliff-edge effect. At the exact threshold, a small increase in traffic causes all requests to fail simultaneously, which can trigger thundering-herd retries and cascading failures.

Trypema's suppressed strategy takes a different approach: probabilistic suppression. As traffic approaches and exceeds the target rate, the limiter begins denying an increasing fraction of requests rather than all of them at once.

Why this matters:

  • Smooth degradation. Clients slightly over the limit still get most requests through. As traffic grows, progressively more requests are denied.
  • No cliff edge. There is no single point where everything breaks. The transition from "mostly allowed" to "mostly denied" is gradual.
  • Some throughput for everyone. Instead of fully blocking clients that happen to cross the threshold, all clients see proportionally reduced throughput.
  • Observability. The suppression factor is a continuous signal (0.0 to 1.0) that tells you exactly how close a key is to its limit. This is far more useful for monitoring than a binary "allowed/rejected" state.

This approach is inspired by Ably's distributed rate limiting at scale.

The three operating regimes

For a given key with rate_limit (requests per second) and window_size_seconds, the suppressed strategy defines two thresholds:

soft_limit = window_size_seconds * rate_limit
hard_limit = window_size_seconds * rate_limit * hard_limit_factor

The strategy operates in three regimes:

1. Below capacity (no suppression)

Condition: accepted_usage < soft_limit

  • Suppression factor: 0.0
  • Returns: RateLimitDecision::Allowed
  • All requests admitted. No probabilistic logic involved. This is normal operation.

2. At or near capacity (probabilistic suppression)

Condition: Accepted usage is at/above the soft limit, but observed usage has not reached the hard limit.

  • Suppression factor: between 0.0 and 1.0 (computed and cached)
  • Returns: RateLimitDecision::Suppressed { is_allowed, suppression_factor }
  • Each request is probabilistically admitted with probability 1.0 - suppression_factor

3. Over the hard limit (full suppression)

Condition: observed_usage >= hard_limit

  • Suppression factor: forced to 1.0
  • Returns: RateLimitDecision::Suppressed { is_allowed: false, suppression_factor: 1.0 }
  • All requests denied until usage falls back under the hard limit.

Note: the suppressed strategy never returns Rejected. Even over the hard limit, it returns Suppressed { is_allowed: false, suppression_factor: 1.0 }. Always gate your request on is_allowed.

The suppression factor formula

The suppression factor determines what fraction of requests should be denied. It is computed from the key's current traffic pattern:

average_rate_in_window = total_observed / window_size_seconds
rate_in_last_1000ms    = sum of bucket counts in the last 1000ms
perceived_rate         = max(average_rate_in_window, rate_in_last_1000ms)
suppression_factor     = 1.0 - (rate_limit / perceived_rate)

Why perceived_rate takes the max of both terms

The formula uses two rate estimates and takes the higher one:

  • average_rate_in_window: The average rate across the entire sliding window. This gives a stable, smoothed view of the key's traffic. It prevents suppression from collapsing to zero too quickly after a spike -- the spike's counts remain in the window average for window_size_seconds.
  • rate_in_last_1000ms: The rate observed in the most recent 1 second. This makes suppression react fast to short spikes. If a burst arrives, the 1-second term will spike before the window average catches up, causing suppression to engage immediately.

By taking the max, the algorithm is both responsive to sudden spikes (via the 1-second term) and stable during sustained overload (via the window average).
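The formula above can be sketched as a small function. This is an illustration, not Trypema's internals: the names are made up here, and the clamp is an assumption (the raw value would go negative whenever perceived_rate is below rate_limit, and regime 1 reports 0.0 in that case).

```rust
// Illustrative sketch of the suppression-factor formula; names and the
// clamp are assumptions, not the library's actual internals.
fn suppression_factor(
    total_observed: f64,
    window_size_seconds: f64,
    rate_in_last_1000ms: f64,
    rate_limit: f64,
) -> f64 {
    let average_rate_in_window = total_observed / window_size_seconds;
    // Take the higher of the smoothed and the instantaneous estimate.
    let perceived_rate = average_rate_in_window.max(rate_in_last_1000ms);
    // Clamp: below the target rate the raw value would be negative.
    (1.0 - rate_limit / perceived_rate).clamp(0.0, 1.0)
}

fn main() {
    // 700 requests in a 60s window, 12 req/s in the last second, 10 req/s target
    let sf = suppression_factor(700.0, 60.0, 12.0, 10.0);
    println!("suppression_factor = {sf:.3}"); // ≈ 0.167
}
```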

How this maps to probability

The suppression factor directly controls admission probability:

admission_probability = 1.0 - suppression_factor
suppression_factor   Admission probability   Effect
0.0                  100%                    All requests pass (below capacity)
0.2                  80%                     ~80% admitted, ~20% denied
0.3                  70%                     ~70% admitted, ~30% denied
0.5                  50%                     ~50% admitted, ~50% denied
0.7                  30%                     ~30% admitted, ~70% denied
1.0                  0%                      All requests denied (full suppression)
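The admission decision itself reduces to comparing a uniform random draw against the admission probability. In the sketch below the sample is passed in as a parameter to keep the example deterministic and dependency-free; in practice it would come from an RNG.

```rust
// Sketch of probabilistic admission: admit with probability
// 1.0 - suppression_factor. `uniform_sample` stands in for an RNG draw
// in [0, 1) so the example stays deterministic.
fn is_allowed(suppression_factor: f64, uniform_sample: f64) -> bool {
    uniform_sample < 1.0 - suppression_factor
}

fn main() {
    // At factor 0.3, draws below 0.7 are admitted (~70% of uniform samples).
    assert!(is_allowed(0.3, 0.65));
    assert!(!is_allowed(0.3, 0.75));
    // Edge cases: 0.0 admits everything, 1.0 denies everything.
    assert!(is_allowed(0.0, 0.999));
    assert!(!is_allowed(1.0, 0.0));
}
```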

Short worked example

Suppose rate_limit = 10 req/s and perceived_rate = 14 req/s:

suppression_factor = 1.0 - (10.0 / 14.0) = 1.0 - 0.714 = 0.286

About 71% of requests will be admitted, and about 29% denied. The admitted rate will be approximately 14 * 0.714 ~ 10 req/s, which naturally converges toward the target rate.
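The convergence is not approximate luck: algebraically, perceived_rate * (1 - sf) = perceived_rate * (rate_limit / perceived_rate) = rate_limit exactly. A quick check (hypothetical helper, just restating the arithmetic above):

```rust
// Expected admitted rate when sf = 1 - rate_limit/perceived_rate.
// The algebra cancels, so this returns rate_limit exactly.
fn admitted_rate(perceived_rate: f64, rate_limit: f64) -> f64 {
    let sf = 1.0 - rate_limit / perceived_rate;
    perceived_rate * (1.0 - sf)
}

fn main() {
    // 14 req/s offered against a 10 req/s target
    let admitted = admitted_rate(14.0, 10.0);
    assert!((admitted - 10.0).abs() < 1e-9);
    println!("admitted ≈ {admitted:.1} req/s");
}
```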

The hard_limit_factor parameter

hard_limit_factor controls the gap between the "soft limit" (where suppression begins) and the "hard limit" (where full suppression kicks in).

soft_limit = rate_limit                        (suppression begins here)
hard_limit = rate_limit * hard_limit_factor    (full suppression here)

Both limits are expressed here as rates; multiplying by window_size_seconds gives the window counts used in the formulas above.

hard_limit_factor is a validated newtype: HardLimitFactor::try_from(value) fails if value < 1.0. The default is 1.0.
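One plausible shape for such a validated newtype is sketched below. This is an assumption about the pattern, not Trypema's actual definition; the real type's error type, derives, and trait impls may differ.

```rust
// Hypothetical sketch of a validated newtype like HardLimitFactor:
// construction is only possible through the checked conversion.
#[derive(Debug, Clone, Copy, PartialEq)]
struct HardLimitFactor(f64);

impl TryFrom<f64> for HardLimitFactor {
    type Error = String;

    fn try_from(value: f64) -> Result<Self, Self::Error> {
        if value < 1.0 {
            Err(format!("hard_limit_factor must be >= 1.0, got {value}"))
        } else {
            Ok(HardLimitFactor(value))
        }
    }
}

impl Default for HardLimitFactor {
    fn default() -> Self {
        HardLimitFactor(1.0)
    }
}

fn main() {
    assert!(HardLimitFactor::try_from(1.5).is_ok());
    assert!(HardLimitFactor::try_from(0.9).is_err());
    assert_eq!(HardLimitFactor::default(), HardLimitFactor(1.0));
}
```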

How different values behave

hard_limit_factor = 1.0 (default):

  • Soft limit and hard limit are the same point.
  • Suppression starts and reaches 1.0 at the same threshold.
  • Produces hard cutoff behaviour similar to the absolute strategy. There is no gradual ramp.
  • Use this if you want suppressed-style counter tracking without the probabilistic ramp.

hard_limit_factor = 1.5 (recommended):

  • Hard limit is 50% above the soft limit.
  • There is a headroom range where suppression gradually ramps from 0.0 to 1.0.
  • Example: rate_limit = 10 req/s --> suppression begins at 10 req/s, full suppression at 15 req/s.

hard_limit_factor = 2.0:

  • Hard limit is 100% above the soft limit. Even more gradual ramp.
  • Example: rate_limit = 10 req/s --> suppression begins at 10 req/s, full suppression at 20 req/s.

Counter tracking

The suppressed strategy maintains two counters per key in its sliding window:

  • total (observed): The total number of calls seen for the key, regardless of whether they were admitted or denied.
  • declined: The number of calls denied by suppression (is_allowed: false).

The counters are updated as follows:

  1. Every call increments total by count.
  2. If the call is denied (is_allowed: false), declined is also incremented by count.
  3. If the call is admitted, only total is incremented.

You can derive accepted usage:

accepted = total - declined

This design means the limiter always has full visibility into both the offered load (total) and the actual throughput (accepted). The total counter drives the hard limit check, while accepted is used to determine which regime the key is in.
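The accounting rules above can be sketched as a tiny struct. This models only the counting rules, not the real sliding-window bucket structure; the names are illustrative.

```rust
// Sketch of the per-key accounting: total counts every call,
// declined counts only denials, accepted is derived.
#[derive(Default)]
struct KeyCounters {
    total: u64,    // observed: every call, admitted or not
    declined: u64, // calls denied by suppression
}

impl KeyCounters {
    fn record(&mut self, count: u64, is_allowed: bool) {
        self.total += count; // always incremented
        if !is_allowed {
            self.declined += count; // only on denial
        }
    }

    fn accepted(&self) -> u64 {
        self.total - self.declined
    }
}

fn main() {
    let mut c = KeyCounters::default();
    c.record(3, true);  // 3 admitted calls
    c.record(2, false); // 2 denied calls
    assert_eq!(c.total, 5);
    assert_eq!(c.accepted(), 3);
}
```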

Suppression factor caching

Computing the suppression factor requires iterating over recent buckets (to calculate rate_in_last_1000ms) and reading window totals. Under high throughput, doing this on every call would be expensive.

To amortise this cost, the computed factor is cached per key for suppression_factor_cache_ms:

  • Local provider: Cached in-memory using Instant timestamps. The factor is recomputed only when the cache expires.
  • Redis provider: Cached in Redis as a string with a SET ... PX TTL. Multiple calls within the TTL reuse the cached value. If the cached value is outside [0.0, 1.0], it is treated as stale and recomputed.
  • Hybrid provider: Uses the suppression factor from the last Redis read on the local fast-path.

Trade-offs

Cache duration      Reaction speed                      CPU/Redis cost
Short (10-50ms)     Fast reaction to traffic changes    More frequent recomputation
Long (100-1000ms)   Slower reaction to sudden spikes    Less overhead

The default is 100ms, which is a good starting point for most workloads.
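The local provider's caching behaviour can be sketched with Instant timestamps, as the document describes. This is a minimal illustration under assumed names, not the actual implementation:

```rust
use std::time::{Duration, Instant};

// Sketch of an Instant-based factor cache: the factor is recomputed only
// once the cached value is older than the configured TTL
// (suppression_factor_cache_ms).
struct CachedFactor {
    value: f64,
    computed_at: Instant,
    ttl: Duration,
}

impl CachedFactor {
    fn get_or_recompute(&mut self, recompute: impl FnOnce() -> f64) -> f64 {
        if self.computed_at.elapsed() >= self.ttl {
            self.value = recompute();
            self.computed_at = Instant::now();
        }
        self.value
    }
}

fn main() {
    let mut cache = CachedFactor {
        value: 0.25,
        computed_at: Instant::now(),
        ttl: Duration::from_millis(100),
    };
    // Within the TTL the cached value is returned; the closure is not called.
    assert_eq!(cache.get_or_recompute(|| 0.5), 0.25);
}
```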

Differences between Local, Redis, and Hybrid

All three providers use the same suppression algorithm. The core logic (three regimes, suppression factor formula, probabilistic admission) is identical. The differences are in where state is stored and how caching works.

Local provider

  • State: in-process DashMap + atomic counters.
  • Factor cache: in-memory with Instant timestamps.
  • Bucket expiry: Instant::elapsed().as_millis() (millisecond granularity, lazy eviction).
  • I/O: none (synchronous calls).
  • Scope: single-process only.

Redis provider

  • State: Redis keys (Lua scripts for atomicity).
  • Factor cache: SET ... PX with TTL in Redis.
  • Bucket expiry: Redis server time in milliseconds (inside Lua scripts).
  • I/O: one Redis round-trip per inc() or get_suppression_factor() call.
  • Scope: shared across all processes using the same Redis instance.

Hybrid provider

  • State: local in-memory state, periodically flushed to Redis by a background actor (RedisCommitter).
  • Factor cache: suppression factor from the last Redis read, used for probabilistic admission on the local fast-path.
  • I/O: none on the fast-path. Background Redis sync every sync_interval_ms.
  • Scope: distributed (with up to sync_interval_ms of lag).
  • The Suppressing state uses probabilistic suppression based on the cached factor (not necessarily full suppression).

Concrete worked example

Consider this configuration:

  • rate_limit = 10 req/s
  • window_size_seconds = 60
  • hard_limit_factor = 1.5

This gives us:

  • Soft limit (window capacity): 60 * 10 = 600 requests in the window
  • Hard limit: 60 * 10 * 1.5 = 900 requests in the window

Scenario: traffic ramps from 0 to 900+

Phase 1: 0 to 600 requests in the window (below capacity)

All requests return Allowed. Suppression factor is 0.0. Everything is normal.

Phase 2: 600 to ~700 requests (suppression begins)

Suppose the window now has 700 total observed requests, distributed evenly:

average_rate_in_window = 700 / 60 = 11.67 req/s
rate_in_last_1000ms    = 12 req/s  (example: slight spike)
perceived_rate         = max(11.67, 12.0) = 12.0
suppression_factor     = 1.0 - (10.0 / 12.0) = 0.167

About 83% of requests are admitted, 17% denied. The strategy returns Suppressed { suppression_factor: 0.167, is_allowed: true/false }.

Phase 3: ~800 requests (suppression increasing)

average_rate_in_window = 800 / 60 = 13.33 req/s
rate_in_last_1000ms    = 15 req/s  (traffic spiking)
perceived_rate         = max(13.33, 15.0) = 15.0
suppression_factor     = 1.0 - (10.0 / 15.0) = 0.333

About 67% of requests admitted, 33% denied. Suppression is noticeably active.

Phase 4: 900+ requests (hard limit reached)

Once observed usage reaches 900 (the hard limit), the strategy bypasses the factor calculation:

suppression_factor = 1.0 (forced)

All requests denied: Suppressed { is_allowed: false, suppression_factor: 1.0 }.

This continues until enough time passes for old buckets to expire and usage drops below the hard limit.
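The phase transitions in this scenario can be summarised as a regime check, using accepted usage against the soft limit and observed usage against the hard limit as described earlier. This is a hypothetical helper for illustration; the real strategy returns RateLimitDecision values carrying the suppression factor.

```rust
// Regime selection per the three-regime description: the hard limit is
// checked against observed usage, the soft limit against accepted usage.
fn regime(accepted: u64, observed: u64, soft_limit: u64, hard_limit: u64) -> &'static str {
    if observed >= hard_limit {
        "full suppression" // factor forced to 1.0
    } else if accepted >= soft_limit {
        "probabilistic suppression" // factor in (0.0, 1.0)
    } else {
        "below capacity" // factor 0.0, Allowed
    }
}

fn main() {
    // 60s window, 10 req/s, hard_limit_factor 1.5 → soft 600, hard 900
    let (soft, hard) = (600, 900);
    assert_eq!(regime(400, 450, soft, hard), "below capacity");
    assert_eq!(regime(620, 700, soft, hard), "probabilistic suppression");
    assert_eq!(regime(700, 950, soft, hard), "full suppression");
}
```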

The get_suppression_factor() API

All three providers expose a read-only method to fetch the current suppression factor:

// Local (sync)
let factor = rl.local().suppressed().get_suppression_factor("user_123");

// Redis (async)
let factor = rl.redis().suppressed().get_suppression_factor(&key).await?;

// Hybrid (async)
let factor = rl.hybrid().suppressed().get_suppression_factor(&key).await?;

Returns a value in [0.0, 1.0]:

  • 0.0 -- no suppression (below capacity or key not found)
  • 0.0 < sf < 1.0 -- partial suppression
  • 1.0 -- full suppression (over hard limit)

This method does not record any increment. Use it for:

  • Observability: Export as a metric to monitor how close keys are to their limits.
  • Dashboards: Show per-key suppression levels in real-time.
  • Debugging: Understand why calls are being suppressed for a specific key.

The method returns the cached value if fresh (within suppression_factor_cache_ms). Otherwise it recomputes the factor from the current sliding window state.