How Suppression Works
This page is a standalone deep-dive into the suppression algorithm. It is aimed at readers who understand rate limiting generally and want to understand precisely what Trypema's suppressed strategy does and why.
For usage examples and API surface, see Suppressed Strategy.
Motivation
Traditional rate limiters use a hard cutoff: below the limit everything passes, above it everything is rejected. This creates a cliff-edge effect. At the exact threshold, a small increase in traffic causes all requests to fail simultaneously, which can trigger thundering-herd retries and cascading failures.
Trypema's suppressed strategy takes a different approach: probabilistic suppression. As traffic approaches and exceeds the target rate, the limiter begins denying an increasing fraction of requests rather than all of them at once.
Why this matters:
- Smooth degradation. Clients slightly over the limit still get most requests through. As traffic grows, progressively more requests are denied.
- No cliff edge. There is no single point where everything breaks. The transition from "mostly allowed" to "mostly denied" is gradual.
- Some throughput for everyone. Instead of fully blocking clients that happen to cross the threshold, all clients see proportionally reduced throughput.
- Observability. The suppression factor is a continuous signal (0.0 to 1.0) that tells you exactly how close a key is to its limit. This is far more useful for monitoring than a binary "allowed/rejected" state.
This approach is inspired by Ably's distributed rate limiting at scale.
The three operating regimes
For a given key with rate_limit (requests per second) and window_size_seconds, the suppressed strategy defines two thresholds:
soft_limit = window_size_seconds * rate_limit
hard_limit = window_size_seconds * rate_limit * hard_limit_factor
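The two threshold definitions above can be sketched as a small helper (the function name is illustrative, not the crate's API):

```rust
// Sketch: derive the soft and hard limits from the key's configuration.
fn thresholds(rate_limit: f64, window_size_seconds: f64, hard_limit_factor: f64) -> (f64, f64) {
    let soft_limit = window_size_seconds * rate_limit;
    let hard_limit = soft_limit * hard_limit_factor;
    (soft_limit, hard_limit)
}

fn main() {
    // 10 req/s over a 60 s window with a 1.5x hard-limit factor.
    let (soft, hard) = thresholds(10.0, 60.0, 1.5);
    assert_eq!(soft, 600.0);
    assert_eq!(hard, 900.0);
}
```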
The strategy operates in three regimes:
1. Below capacity (no suppression)
Condition: accepted_usage < soft_limit
- Suppression factor: 0.0
- Returns: RateLimitDecision::Allowed
- All requests admitted. No probabilistic logic involved. This is normal operation.
2. At or near capacity (probabilistic suppression)
Condition: Accepted usage is at/above the soft limit, but observed usage has not reached the hard limit.
- Suppression factor: between 0.0 and 1.0 (computed and cached)
- Returns: RateLimitDecision::Suppressed { is_allowed, suppression_factor }
- Each request is probabilistically admitted with probability 1.0 - suppression_factor.
3. Over the hard limit (full suppression)
Condition: observed_usage >= hard_limit
- Suppression factor: forced to 1.0
- Returns: RateLimitDecision::Suppressed { is_allowed: false, suppression_factor: 1.0 }
- All requests denied until usage falls back under the hard limit.

Note: the strategy never returns Rejected. Even over the hard limit, it returns Suppressed { is_allowed: false, suppression_factor: 1.0 }. Always gate your request on is_allowed.

The suppression factor formula
The suppression factor determines what fraction of requests should be denied. It is computed from the key's current traffic pattern:
average_rate_in_window = total_observed / window_size_seconds
rate_in_last_1000ms = sum of bucket counts in the last 1000ms
perceived_rate = max(average_rate_in_window, rate_in_last_1000ms)
suppression_factor = 1.0 - (rate_limit / perceived_rate)
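The formula above can be sketched as a standalone function (a simplification of the real code path, which also checks the regime conditions; names mirror the text, not the crate's internals):

```rust
// Minimal sketch of the suppression factor formula: perceived_rate takes the
// max of the window average and the last-second rate, and the result is
// clamped to [0.0, 1.0].
fn suppression_factor(
    rate_limit: f64,
    total_observed: f64,
    window_size_seconds: f64,
    rate_in_last_1000ms: f64,
) -> f64 {
    let average_rate_in_window = total_observed / window_size_seconds;
    let perceived_rate = average_rate_in_window.max(rate_in_last_1000ms);
    // At or below the target rate the raw formula would go negative;
    // clamping keeps the factor at 0.0 (no suppression).
    (1.0 - rate_limit / perceived_rate).clamp(0.0, 1.0)
}

fn main() {
    // 840 observed over 60 s = 14 req/s average (the worked example's numbers).
    let sf = suppression_factor(10.0, 840.0, 60.0, 14.0);
    assert!((sf - (1.0 - 10.0 / 14.0)).abs() < 1e-9);
    // Below the target rate: clamped to zero.
    assert_eq!(suppression_factor(10.0, 300.0, 60.0, 5.0), 0.0);
}
```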
Why perceived_rate takes the max of both terms
The formula uses two rate estimates and takes the higher one:
- average_rate_in_window: The average rate across the entire sliding window. This gives a stable, smoothed view of the key's traffic. It prevents suppression from collapsing to zero too quickly after a spike -- the spike's counts remain in the window average for window_size_seconds.
- rate_in_last_1000ms: The rate observed in the most recent 1 second. This makes suppression react fast to short spikes. If a burst arrives, the 1-second term will spike before the window average catches up, causing suppression to engage immediately.
By taking the max, the algorithm is both responsive to sudden spikes (via the 1-second term) and stable during sustained overload (via the window average).
How this maps to probability
The suppression factor directly controls admission probability:
admission_probability = 1.0 - suppression_factor
| suppression_factor | Admission probability | Effect |
|---|---|---|
| 0.0 | 100% | All requests pass (below capacity) |
| 0.2 | 80% | ~80% admitted, ~20% denied |
| 0.3 | 70% | ~70% admitted, ~30% denied |
| 0.5 | 50% | ~50% admitted, ~50% denied |
| 0.7 | 30% | ~30% admitted, ~70% denied |
| 1.0 | 0% | All requests denied (full suppression) |
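The mapping above reduces to a one-line admission test: draw a uniform sample in [0, 1) and admit when it falls in the admission region. A sketch (the helper name is illustrative):

```rust
// Admit with probability (1 - suppression_factor): a sample drawn uniformly
// from [0, 1) passes when it lands outside the suppressed fraction.
fn is_admitted(suppression_factor: f64, uniform_sample: f64) -> bool {
    uniform_sample >= suppression_factor
}

fn main() {
    // With factor 0.3, exactly 70 of 100 evenly spaced samples are admitted.
    let admitted = (0..100)
        .filter(|i| is_admitted(0.3, *i as f64 / 100.0))
        .count();
    assert_eq!(admitted, 70);
}
```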
Short worked example
Suppose rate_limit = 10 req/s and perceived_rate = 14 req/s:
suppression_factor = 1.0 - (10.0 / 14.0) = 1.0 - 0.714 = 0.286
About 71% of requests will be admitted, and about 29% denied. The admitted rate will be approximately 14 * 0.714 ~ 10 req/s, which naturally converges toward the target rate.
The hard_limit_factor parameter
hard_limit_factor controls the gap between the "soft limit" (where suppression begins) and the "hard limit" (where full suppression kicks in).
Expressed as rates:

soft limit (as a rate) = rate_limit (suppression begins here)
hard limit (as a rate) = rate_limit * hard_limit_factor (full suppression here)
It is a validated newtype. HardLimitFactor::try_from(value) fails if value < 1.0. Default is 1.0.
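An illustrative re-implementation of that validated newtype (the real crate's error type and internals may differ):

```rust
// Newtype wrapper that rejects factors below 1.0 at construction time.
#[derive(Debug, Clone, Copy, PartialEq)]
struct HardLimitFactor(f64);

impl TryFrom<f64> for HardLimitFactor {
    type Error = String;
    fn try_from(value: f64) -> Result<Self, Self::Error> {
        // `!(value >= 1.0)` also rejects NaN.
        if !(value >= 1.0) {
            Err(format!("hard_limit_factor must be >= 1.0, got {value}"))
        } else {
            Ok(HardLimitFactor(value))
        }
    }
}

impl Default for HardLimitFactor {
    fn default() -> Self {
        HardLimitFactor(1.0)
    }
}

fn main() {
    assert!(HardLimitFactor::try_from(1.5).is_ok());
    assert!(HardLimitFactor::try_from(0.5).is_err());
    assert_eq!(HardLimitFactor::default(), HardLimitFactor(1.0));
}
```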
How different values behave
hard_limit_factor = 1.0 (default):
- Soft limit and hard limit are the same point.
- Suppression starts and reaches 1.0 at the same threshold.
- Produces hard cutoff behaviour similar to the absolute strategy. There is no gradual ramp.
- Use this if you want suppressed-style counter tracking without the probabilistic ramp.
hard_limit_factor = 1.5 (recommended):
- Hard limit is 50% above the soft limit.
- There is a headroom range where suppression gradually ramps from 0.0 to 1.0.
- Example: rate_limit = 10 req/s --> suppression begins at 10 req/s, full suppression at 15 req/s.
hard_limit_factor = 2.0:
- Hard limit is 100% above the soft limit. Even more gradual ramp.
- Example: rate_limit = 10 req/s --> suppression begins at 10 req/s, full suppression at 20 req/s.
Counter tracking
The suppressed strategy maintains two counters per key in its sliding window:
- total (observed): The total number of calls seen for the key, regardless of whether they were admitted or denied.
- declined: The number of calls denied by suppression (is_allowed: false).
The counters are updated on every call:

- Every call increments total by count.
- If the call is denied (is_allowed: false), declined is also incremented by count.
- If the call is admitted, only total is incremented.
You can derive accepted usage:
accepted = total - declined
This design means the limiter always has full visibility into both the offered load (total) and the actual throughput (accepted). The total counter drives the hard limit check, while accepted is used to determine which regime the key is in.
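The counter bookkeeping described above can be sketched as follows (field and method names mirror the text, not necessarily the crate's internals):

```rust
// Per-key counters: total offered load and declined calls.
#[derive(Default)]
struct KeyCounters {
    total: u64,    // observed: every call, admitted or not
    declined: u64, // calls denied by suppression
}

impl KeyCounters {
    // Record a call outcome; `count` is the increment size.
    fn record(&mut self, count: u64, is_allowed: bool) {
        self.total += count;
        if !is_allowed {
            self.declined += count;
        }
    }

    // Accepted usage is derived, never stored.
    fn accepted(&self) -> u64 {
        self.total - self.declined
    }
}

fn main() {
    let mut c = KeyCounters::default();
    c.record(3, true);  // 3 admitted calls
    c.record(2, false); // 2 denied calls
    assert_eq!(c.total, 5);
    assert_eq!(c.declined, 2);
    assert_eq!(c.accepted(), 3);
}
```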
Suppression factor caching
Computing the suppression factor requires iterating over recent buckets (to calculate rate_in_last_1000ms) and reading window totals. Under high throughput, doing this on every call would be expensive.
To amortise this cost, the computed factor is cached per key for suppression_factor_cache_ms:
- Local provider: Cached in-memory using Instant timestamps. The factor is recomputed only when the cache expires.
- Redis provider: Cached in Redis as a string with a SET ... PX TTL. Multiple calls within the TTL reuse the cached value. If the cached value is outside [0.0, 1.0], it is treated as stale and recomputed.
- Hybrid provider: Uses the suppression factor from the last Redis read on the local fast-path.
Trade-offs
| Cache duration | Reaction speed | CPU/Redis cost |
|---|---|---|
| Short (10-50ms) | Fast reaction to traffic changes | More frequent recomputation |
| Long (100-1000ms) | Slower reaction to sudden spikes | Less overhead |
The default is 100ms, a good starting point for most workloads.
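The local provider's cache behaviour described above can be sketched with std's Instant (a simplification; the type and method names are illustrative):

```rust
use std::time::{Duration, Instant};

// Cached suppression factor: recomputed only once the entry is older
// than the configured TTL (suppression_factor_cache_ms).
struct CachedFactor {
    value: f64,
    computed_at: Instant,
    ttl: Duration,
}

impl CachedFactor {
    fn get_or_recompute(&mut self, recompute: impl FnOnce() -> f64) -> f64 {
        if self.computed_at.elapsed() >= self.ttl {
            self.value = recompute();
            self.computed_at = Instant::now();
        }
        self.value
    }
}

fn main() {
    let mut cache = CachedFactor {
        value: 0.2,
        computed_at: Instant::now(),
        ttl: Duration::from_millis(100),
    };
    // Within the TTL the cached value is reused; the closure is not called.
    let v = cache.get_or_recompute(|| panic!("should not recompute yet"));
    assert_eq!(v, 0.2);
}
```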
Differences between Local, Redis, and Hybrid
All three providers use the same suppression algorithm. The core logic (three regimes, suppression factor formula, probabilistic admission) is identical. The differences are in where state is stored and how caching works.
Local provider
- State: in-process DashMap + atomic counters.
- Factor cache: in-memory with Instant timestamps.
- Bucket expiry: Instant::elapsed().as_millis() (millisecond granularity, lazy eviction).
- I/O: none (synchronous calls).
- Scope: single-process only.
Redis provider
- State: Redis keys (Lua scripts for atomicity).
- Factor cache: SET ... PX with TTL in Redis.
- Bucket expiry: Redis server time in milliseconds (inside Lua scripts).
- I/O: one Redis round-trip per inc() or get_suppression_factor() call.
- Scope: shared across all processes using the same Redis instance.
Hybrid provider
- State: local in-memory state, periodically flushed to Redis by a background actor (RedisCommitter).
- Factor cache: suppression factor from the last Redis read, used for probabilistic admission on the local fast-path.
- I/O: none on the fast-path. Background Redis sync every sync_interval_ms.
- Scope: distributed (with up to sync_interval_ms of lag).
- The Suppressing state uses probabilistic suppression based on the cached factor (not necessarily full suppression).
Concrete worked example
Consider this configuration:
- rate_limit = 10 req/s
- window_size_seconds = 60
- hard_limit_factor = 1.5
This gives us:
- Soft limit (window capacity): 60 * 10 = 600 requests in the window
- Hard limit: 60 * 10 * 1.5 = 900 requests in the window
Scenario: traffic ramps from 0 to 900+
Phase 1: 0 to 600 requests in the window (below capacity)
All requests return Allowed. Suppression factor is 0.0. Everything is normal.
Phase 2: 600 to ~700 requests (suppression begins)
Suppose the window now has 700 total observed requests, distributed evenly:
average_rate_in_window = 700 / 60 = 11.67 req/s
rate_in_last_1000ms = 12 req/s (example: slight spike)
perceived_rate = max(11.67, 12.0) = 12.0
suppression_factor = 1.0 - (10.0 / 12.0) = 0.167
About 83% of requests are admitted, 17% denied. The strategy returns Suppressed { suppression_factor: 0.167, is_allowed: true/false }.
Phase 3: ~800 requests (suppression increasing)
average_rate_in_window = 800 / 60 = 13.33 req/s
rate_in_last_1000ms = 15 req/s (traffic spiking)
perceived_rate = max(13.33, 15.0) = 15.0
suppression_factor = 1.0 - (10.0 / 15.0) = 0.333
About 67% of requests admitted, 33% denied. Suppression is noticeably active.
Phase 4: 900+ requests (hard limit reached)
Once observed usage reaches 900 (the hard limit), the strategy bypasses the factor calculation:
suppression_factor = 1.0 (forced)
All requests denied: Suppressed { is_allowed: false, suppression_factor: 1.0 }.
This continues until enough time passes for old buckets to expire and usage drops below the hard limit.
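The four phases above can be tied together in one sketch of the regime logic (the decide helper and its parameters are illustrative, not the crate's code paths):

```rust
// Returns the suppression factor for the given window state, applying the
// three regimes: below capacity, probabilistic suppression, hard limit.
fn decide(
    accepted: f64,
    observed: f64,
    soft_limit: f64,
    hard_limit: f64,
    rate_limit: f64,
    window_seconds: f64,
    rate_in_last_1000ms: f64,
) -> f64 {
    if observed >= hard_limit {
        return 1.0; // phase 4: full suppression, factor forced to 1.0
    }
    if accepted < soft_limit {
        return 0.0; // phase 1: below capacity, everything allowed
    }
    // phases 2-3: probabilistic suppression from the perceived rate
    let perceived = (observed / window_seconds).max(rate_in_last_1000ms);
    (1.0 - rate_limit / perceived).clamp(0.0, 1.0)
}

fn main() {
    let (soft, hard) = (600.0, 900.0); // 10 req/s * 60 s, factor 1.5
    // Phase 1: 500 requests, well below capacity.
    assert_eq!(decide(500.0, 500.0, soft, hard, 10.0, 60.0, 9.0), 0.0);
    // Phase 2: 700 observed, last-second rate 12 req/s -> ~0.167.
    assert!((decide(650.0, 700.0, soft, hard, 10.0, 60.0, 12.0) - 0.1667).abs() < 1e-3);
    // Phase 3: 800 observed, last-second rate 15 req/s -> ~0.333.
    assert!((decide(700.0, 800.0, soft, hard, 10.0, 60.0, 15.0) - 0.3333).abs() < 1e-3);
    // Phase 4: 900 observed reaches the hard limit -> forced 1.0.
    assert_eq!(decide(800.0, 900.0, soft, hard, 10.0, 60.0, 20.0), 1.0);
}
```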
The get_suppression_factor() API
All three providers expose a read-only method to fetch the current suppression factor:
Local (sync):
let factor = rl.local().suppressed().get_suppression_factor("user_123");

Redis (async):
let factor = rl.redis().suppressed().get_suppression_factor(&key).await?;

Hybrid (async):
let factor = rl.hybrid().suppressed().get_suppression_factor(&key).await?;
Returns a value in [0.0, 1.0]:
- 0.0 -- no suppression (below capacity or key not found)
- 0.0 < sf < 1.0 -- partial suppression
- 1.0 -- full suppression (over hard limit)
This method does not record any increment. Use it for:
- Observability: Export as a metric to monitor how close keys are to their limits.
- Dashboards: Show per-key suppression levels in real-time.
- Debugging: Understand why calls are being suppressed for a specific key.
The method returns the cached value if fresh (within suppression_factor_cache_ms). Otherwise it recomputes the factor from the current sliding window state.

