
adding ingestion delay metric #7443

Open
Shvejan wants to merge 2 commits into cortexproject:master from Shvejan:ingestion-delay-metric

@Shvejan
Contributor

@Shvejan Shvejan commented Apr 21, 2026

What this PR does:

Adds a new per-tenant native histogram metric, cortex_ingester_ingestion_delay_seconds, that tracks the
delay between a sample's original timestamp and the time the sample arrives at the ingester. This metric
helps operators:

  • Set appropriate rule_query_offset values based on actual ingestion delays
  • Troubleshoot discrepancies between recording rules and underlying metrics
  • Monitor and alert on ingestion lag per tenant

Which issue(s) this PR fixes:
Fixes #6748

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE],
    [ENHANCEMENT], [BUGFIX]
  • docs/configuration/v1-guarantees.md updated if this PR introduces experimental flags

Signed-off-by: Shvejan Mutheboyina <shvejan@amazon.com>
@dosubot (bot) added the component/ingester and type/observability labels Apr 21, 2026
Signed-off-by: Shvejan Mutheboyina <shvejan@amazon.com>
Member

@friedrichg friedrichg left a comment


@Shvejan Let me know what you think

Comment thread pkg/ingester/ingester.go
  i.metrics.ingestedHistogramBuckets.WithLabelValues(userID).Observe(float64(hp.BucketCount()))
  // Observe ingestion delay
  if delayMs := time.Now().UnixMilli() - hp.TimestampMs; delayMs >= 0 {
      i.metrics.ingestionDelaySeconds.WithLabelValues(userID).Observe(float64(delayMs) / 1000.0)
  }
Member


I think this line adds about 15% overhead, measured with:

go test -tags "netgo slicelabels" -run='^$' -bench=BenchmarkIngesterPush -benchtime=5s -count=3 ./pkg/ingester/...

It could be brought down to 6-10% overhead by resolving the per-tenant observer once instead of on every sample, with something like:

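  // Resolve the per-tenant child once, outside the per-sample loop.
  delayObserver := i.metrics.ingestionDelaySeconds.WithLabelValues(userID)
...
  // Called per sample with that sample's timestamp; samples with
  // future timestamps (negative delay) are skipped.
  observeDelay := func(timestampMs int64) {
      if delayMs := time.Now().UnixMilli() - timestampMs; delayMs >= 0 {
          delayObserver.Observe(float64(delayMs) / 1000.0)
      }
  }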
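To illustrate why hoisting the lookup helps, here is a stdlib-only Go sketch. The types `labeledVec`, `counter` and `observer` are hypothetical stand-ins; the cost of `WithLabelValues` is modelled as a mutex plus a map lookup, which is an assumption for illustration and not client_golang's actual implementation. Both push functions record the same observations; only the number of label lookups differs.

```go
package main

import (
	"fmt"
	"sync"
)

// observer is the minimal interface the hot loop needs; in the ingester this
// role is played by the per-tenant child returned by WithLabelValues.
type observer interface {
	Observe(float64)
}

// counter is a toy observer that only counts observations.
type counter struct {
	mu sync.Mutex
	n  int
}

func (c *counter) Observe(float64) {
	c.mu.Lock()
	c.n++
	c.mu.Unlock()
}

// labeledVec is a hypothetical stand-in for a metric vector: every
// WithLabelValues call takes a lock and does a map lookup (assumed cost model).
type labeledVec struct {
	mu       sync.Mutex
	children map[string]*counter
	lookups  int
}

func newLabeledVec() *labeledVec {
	return &labeledVec{children: map[string]*counter{}}
}

func (v *labeledVec) WithLabelValues(userID string) observer {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.lookups++
	c, ok := v.children[userID]
	if !ok {
		c = &counter{}
		v.children[userID] = c
	}
	return c
}

// pushPerSample resolves the child on every sample (the pattern in the diff).
func pushPerSample(v *labeledVec, userID string, samples []float64) {
	for _, s := range samples {
		v.WithLabelValues(userID).Observe(s)
	}
}

// pushHoisted resolves the child once per request (the pattern suggested in
// review) and reuses it for every sample.
func pushHoisted(v *labeledVec, userID string, samples []float64) {
	obs := v.WithLabelValues(userID)
	for _, s := range samples {
		obs.Observe(s)
	}
}

func main() {
	samples := make([]float64, 1000)
	a, b := newLabeledVec(), newLabeledVec()
	pushPerSample(a, "tenant-1", samples)
	pushHoisted(b, "tenant-1", samples)
	fmt.Println(a.lookups, b.lookups) // 1000 1
}
```

With 1,000 samples in one request, the per-sample version performs 1,000 lookups and the hoisted version performs one, while each tenant's observer still receives all 1,000 observations.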

The existing ingestedHistogramBuckets metric has the same per-sample WithLabelValues pattern and would benefit from the same treatment, but let's leave that for another PR.


Labels

component/ingester, size/L, type/observability


Development

Successfully merging this pull request may close these issues.

Add a metric to track delay in ingestion

2 participants