Add Query Resource Based Eviction #7488

Open
eeldaly wants to merge 6 commits into cortexproject:master from eeldaly:query-eviction

@eeldaly
Contributor

@eeldaly eeldaly commented May 7, 2026

What this PR does:
This PR builds on the existing resource-based throttling infrastructure (#6674) to allow evicting currently running queries. Eviction is currently implemented only on querier pods, but it can be extended to other pods in the same way as resource-based throttling.


Flags
All flags are prefixed with -querier.query-protection.eviction.

threshold.cpu-utilization: Max CPU utilization (0–1) before evicting the heaviest query (default 0)
threshold.heap-utilization: Max heap utilization (0–1) before evicting the heaviest query (default 0)
check-interval: How frequently the evictor checks resource utilization (default 1s)
cooldown-period: Number of check intervals to wait after an eviction before evicting again (default 3)
eviction-metric: Metric used to determine the heaviest query (fetched_samples, fetched_series, fetched_chunks, fetched_chunk_bytes) (default fetched_samples)
min-query-age: Minimum time a query must be running before it becomes eligible for eviction (default 10s)

If both the CPU and heap utilization thresholds are disabled (set to 0), the evictor is disabled and does not run its periodic check-interval checks.
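
For reference, here is a minimal sketch of how the flags above could be registered and how the "both thresholds are 0" rule could be expressed. It uses only the standard library `flag` package and the field names from the `EvictionConfig` struct quoted in the review; the `RegisterFlags`, `Enabled`, and `parse` helpers are assumptions for illustration and are not taken from the PR.

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// EvictionConfig uses the field names quoted in the review thread.
type EvictionConfig struct {
	CPUUtilization  float64
	HeapUtilization float64
	CheckInterval   time.Duration
	CooldownPeriod  int
	EvictionMetric  string
	MinQueryAge     time.Duration
}

const prefix = "querier.query-protection.eviction."

// RegisterFlags is a hypothetical helper; defaults follow the PR description.
func (c *EvictionConfig) RegisterFlags(f *flag.FlagSet) {
	f.Float64Var(&c.CPUUtilization, prefix+"threshold.cpu-utilization", 0, "Max CPU utilization (0-1) before evicting the heaviest query.")
	f.Float64Var(&c.HeapUtilization, prefix+"threshold.heap-utilization", 0, "Max heap utilization (0-1) before evicting the heaviest query.")
	f.DurationVar(&c.CheckInterval, prefix+"check-interval", time.Second, "How frequently the evictor checks resource utilization.")
	f.IntVar(&c.CooldownPeriod, prefix+"cooldown-period", 3, "Check intervals to wait after an eviction before evicting again.")
	f.StringVar(&c.EvictionMetric, prefix+"eviction-metric", "fetched_samples", "Metric used to determine the heaviest query.")
	f.DurationVar(&c.MinQueryAge, prefix+"min-query-age", 10*time.Second, "Minimum running time before a query is eligible for eviction.")
}

// Enabled reports whether at least one threshold is set (greater than 0).
func (c *EvictionConfig) Enabled() bool {
	return c.CPUUtilization > 0 || c.HeapUtilization > 0
}

// parse builds a config from command-line style arguments.
func parse(args []string) (EvictionConfig, error) {
	var cfg EvictionConfig
	fs := flag.NewFlagSet("example", flag.ContinueOnError)
	cfg.RegisterFlags(fs)
	err := fs.Parse(args)
	return cfg, err
}

func main() {
	cfg, err := parse([]string{"-" + prefix + "threshold.heap-utilization=0.8"})
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(cfg.Enabled(), cfg.CheckInterval, cfg.CooldownPeriod, cfg.EvictionMetric)
}
```

With no flags passed, `Enabled()` returns false, which corresponds to the case where the evictor never starts.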


How it works
This feature is completely disabled, and the registry is not created, if cpu-utilization and heap-utilization are both set to 0. If either is greater than 0:

  • Picked up queries will be registered to a registry to track all current queries in a querier
  • The evictor will check for utilization every check-interval
  • Once a threshold is breached, all currently running queries that have been running for longer than min-query-age are ranked by eviction-metric, and the heaviest query is evicted
  • The evictor then waits check-interval before checking again whether a threshold is breached.
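
The steps above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the `registry`, `runningQuery`, `evictHeaviest`, and `checkOnce` names are hypothetical, utilization is passed in as a plain number instead of being read from a resource monitor, "breached" is assumed to mean strictly greater than the threshold, and the cooldown is not modeled.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// runningQuery is a hypothetical registry entry; field names are illustrative.
type runningQuery struct {
	id      string
	started time.Time
	weight  uint64             // value of the chosen eviction metric, e.g. fetched samples
	cancel  context.CancelFunc // cancels the query's context on eviction
}

// registry tracks the queries currently running in one querier.
type registry struct {
	mu      sync.Mutex
	queries map[string]*runningQuery
}

func newRegistry() *registry {
	return &registry{queries: make(map[string]*runningQuery)}
}

func (r *registry) register(q *runningQuery) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.queries[q.id] = q
}

// evictHeaviest cancels and removes the heaviest query that has been running
// for at least minAge. It returns the evicted id, or "" if none is eligible.
func (r *registry) evictHeaviest(now time.Time, minAge time.Duration) string {
	r.mu.Lock()
	defer r.mu.Unlock()
	var heaviest *runningQuery
	for _, q := range r.queries {
		if now.Sub(q.started) < minAge {
			continue // too young to be eligible
		}
		if heaviest == nil || q.weight > heaviest.weight {
			heaviest = q
		}
	}
	if heaviest == nil {
		return ""
	}
	delete(r.queries, heaviest.id)
	heaviest.cancel()
	return heaviest.id
}

// checkOnce models a single evictor tick: evict only if the threshold is
// enabled (> 0) and the observed utilization exceeds it.
func checkOnce(r *registry, utilization, threshold float64, now time.Time, minAge time.Duration) string {
	if threshold <= 0 || utilization <= threshold {
		return ""
	}
	return r.evictHeaviest(now, minAge)
}

// demo registers three queries and runs one tick below and one above the threshold.
func demo() (evicted string, cancelled bool, belowThreshold string) {
	now := time.Now()
	r := newRegistry()
	ctx, cancel := context.WithCancel(context.Background())
	r.register(&runningQuery{id: "old-heavy", started: now.Add(-30 * time.Second), weight: 100, cancel: cancel})
	r.register(&runningQuery{id: "old-light", started: now.Add(-30 * time.Second), weight: 10, cancel: func() {}})
	r.register(&runningQuery{id: "young-heaviest", started: now.Add(-time.Second), weight: 1000, cancel: func() {}})

	belowThreshold = checkOnce(r, 0.5, 0.8, now, 10*time.Second)
	evicted = checkOnce(r, 0.9, 0.8, now, 10*time.Second)
	return evicted, ctx.Err() != nil, belowThreshold
}

func main() {
	evicted, cancelled, below := demo()
	fmt.Println(evicted, cancelled, below == "")
}
```

In this sketch the query with the largest weight is skipped because it is younger than min-query-age, so the heaviest eligible query is evicted instead.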

Why the current metrics?
The current metrics are not ideal for detecting the root cause of high heap usage; however, they are readily available and can serve as a proxy until work in Prometheus/Thanos makes better metrics available. peak_samples is currently only available in query_stats after the query has completed, and we have no way of tracking a query's current heap usage. New metrics can easily be added to this structure later.


Metrics
cortex_query_evictions_total{resource="cpu|heap", component="querier"}: Counter increments by one for every eviction that occurs. A single query may lead to multiple increments if it retries and ends up evicted again.


Note: make doc added the config to the store-gateway.md file even though eviction is not currently implemented there, because it is built on top of resource-based throttling.


Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
  • docs/configuration/v1-guarantees.md updated if this PR introduces experimental flags

Signed-off-by: Essam Eldaly <eeldaly@amazon.com>
eeldaly added 5 commits May 7, 2026 15:09
Signed-off-by: Essam Eldaly <eeldaly@amazon.com>
Signed-off-by: Essam Eldaly <eeldaly@amazon.com>
Signed-off-by: Essam Eldaly <eeldaly@amazon.com>
Signed-off-by: Essam Eldaly <eeldaly@amazon.com>
Signed-off-by: Essam Eldaly <eeldaly@amazon.com>
Member

@friedrichg friedrichg left a comment

Thanks for doing this!

Comment on lines +17 to +24
type EvictionConfig struct {
	CPUUtilization  float64       `yaml:"cpu_utilization"`
	HeapUtilization float64       `yaml:"heap_utilization"`
	CheckInterval   time.Duration `yaml:"check_interval"`
	CooldownPeriod  int           `yaml:"cooldown_period"`
	EvictionMetric  string        `yaml:"eviction_metric"`
	MinQueryAge     time.Duration `yaml:"min_query_age"`
}
Member

Maybe delete the duplicate EvictionConfig from pkg/util/queryeviction/evictor.go and have NewQueryEvictor accept configs.EvictionConfig instead.

}

// Find the heaviest running query.
heaviest := e.registry.FindHeaviest(e.cfg.MinQueryAge)
Member

If I understand correctly, this evicts one query per iteration. Maybe a follow-up PR should evict more than one per iteration, with a cooldown.

Comment thread pkg/querier/querier.go

metricFunc, err := queryeviction.ResolveMetricFunc(evCfg.EvictionMetric)
if err != nil {
	level.Error(logger).Log("msg", "invalid eviction metric", "err", err)
Member

We should fail to start if there is a misconfiguration, right? I feel we should exit New here after an error.
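
To make the suggestion concrete, here is a rough sketch of a constructor that propagates the error instead of only logging it. The `queryStats` type, `resolveMetricFunc`, and `newQuerier` are hypothetical stand-ins written for this comment; they are not the signatures of `queryeviction.ResolveMetricFunc` or `querier.New`. The metric names are the ones listed in the PR description.

```go
package main

import "fmt"

// queryStats is a hypothetical snapshot of per-query counters.
type queryStats struct {
	FetchedSamples    uint64
	FetchedSeries     uint64
	FetchedChunks     uint64
	FetchedChunkBytes uint64
}

// metricFunc extracts the value used to rank queries by weight.
type metricFunc func(queryStats) uint64

// resolveMetricFunc maps a configured metric name to an extractor and
// returns an error for any name that is not supported.
func resolveMetricFunc(name string) (metricFunc, error) {
	switch name {
	case "fetched_samples":
		return func(s queryStats) uint64 { return s.FetchedSamples }, nil
	case "fetched_series":
		return func(s queryStats) uint64 { return s.FetchedSeries }, nil
	case "fetched_chunks":
		return func(s queryStats) uint64 { return s.FetchedChunks }, nil
	case "fetched_chunk_bytes":
		return func(s queryStats) uint64 { return s.FetchedChunkBytes }, nil
	default:
		return nil, fmt.Errorf("unknown eviction metric %q", name)
	}
}

// newQuerier stands in for a constructor: it returns the error to the caller
// so that startup can fail, rather than logging and continuing.
func newQuerier(evictionMetric string) (metricFunc, error) {
	fn, err := resolveMetricFunc(evictionMetric)
	if err != nil {
		return nil, fmt.Errorf("invalid eviction configuration: %w", err)
	}
	return fn, nil
}

func main() {
	if _, err := newQuerier("peak_samples"); err != nil {
		fmt.Println("startup would fail:", err)
	}
	fn, _ := newQuerier("fetched_series")
	fmt.Println(fn(queryStats{FetchedSeries: 7}))
}
```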

// Detect eviction: child context cancelled but parent context is still active.
if result.Err != nil && q.ctx.Err() != nil && ctx.Err() == nil {
	return &promql.Result{
		Err: promql.ErrStorage{Err: &ErrQueryEvicted{}},
Member

This returns 500s, right? What happens when the query-frontend gets that: does it retry? I think it does. I am not sure this is what we need. wdyt? @eeldaly
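
For readers following the thread, the detection condition in the snippet can be illustrated in isolation with the standard `context` package. This is only a sketch: `errQueryEvicted`, `classify`, `simulate`, and `isEvicted` are hypothetical names, and nothing here models how the error is mapped to an HTTP status or whether the query-frontend retries it.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// errQueryEvicted is a hypothetical sentinel standing in for ErrQueryEvicted.
var errQueryEvicted = errors.New("query evicted due to resource pressure")

// classify applies the same condition as the snippet: the query failed, the
// child context is cancelled, and the parent (request) context is still active.
func classify(parent, child context.Context, execErr error) error {
	if execErr != nil && child.Err() != nil && parent.Err() == nil {
		return errQueryEvicted
	}
	return execErr
}

// simulate builds a parent/child context pair, cancels one of them, and
// returns the classified error.
func simulate(evict, cancelParent bool) error {
	parent, stopParent := context.WithCancel(context.Background())
	defer stopParent()
	child, stopChild := context.WithCancel(parent)
	defer stopChild()

	if evict {
		stopChild() // the evictor cancels only the query's own context
	}
	if cancelParent {
		stopParent() // the client went away; cancellation propagates to the child
	}
	return classify(parent, child, child.Err())
}

// isEvicted reports whether err is the eviction sentinel.
func isEvicted(err error) bool {
	return errors.Is(err, errQueryEvicted)
}

func main() {
	fmt.Println(isEvicted(simulate(true, false)))  // eviction
	fmt.Println(isEvicted(simulate(false, true)))  // client cancellation
	fmt.Println(simulate(false, false) == nil)     // no cancellation at all
}
```

The sketch shows why the parent check matters: a client-side cancellation also cancels the child context, and without the `parent.Err() == nil` condition it would be reported as an eviction.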

Comment thread pkg/cortex/modules.go
} else {
	// TODO: Consider wrapping logger to differentiate from querier module logger
-	queryable, _, queryEngine = querier.New(t.Cfg.Querier, t.OverridesConfig, t.Distributor, t.StoreQueryables, rulerRegisterer, util_log.Logger, t.OverridesConfig.RulesPartialData, nil)
+	queryable, _, queryEngine, _ = querier.New(t.Cfg.Querier, t.OverridesConfig, t.Distributor, t.StoreQueryables, rulerRegisterer, util_log.Logger, t.OverridesConfig.RulesPartialData, nil)
Member

No support for rulers, I see. It can be added in a follow-up PR.

2 participants