Answered
Help Please, After Creating My Data Drift Monitoring Dashboard Using Clearml Serving And Grafana, How Can I Configure My Alerts To Be Notified When The Distribution Of My Metrics (Variables) Changes On My Heatmaps?

Help please, after creating my data drift monitoring dashboard using ClearML Serving and Grafana, how can I configure my alerts to be notified when the distribution of my metrics (variables) changes on my heatmaps?

  
  
Posted 9 months ago

Answers 17


Hi @RelievedDuck3

how can I configure my alerts to be notified when the distribution of my metrics (variables) changes on my heatmaps?

This can be done inside Grafana, here is a simple example:
Specifically, you need to create a new metric that is the distance of the current distribution (i.e. the heatmap) from the previous window, and then create an alert on that distance metric.
Basically, instead of the heatmap value, you take the current bucket value and subtract it from the same bucket X minutes ago.

  1. Create an average over time per bucket, for example: histogram_avg(buckets_seconds[1d])
  2. Then subtract the two histograms and sum the absolute differences:
     sum(abs(buckets_value - histogram_avg(buckets_value[1d])))
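
The arithmetic behind these two steps can be sketched in plain Python, outside of Grafana. This is only an illustration under assumed values: the bucket counts, the function names, and the threshold are hypothetical and are not part of ClearML Serving, Prometheus, or Grafana.

```python
from typing import List, Sequence


def average_histogram(windows: Sequence[Sequence[float]]) -> List[float]:
    """Average each bucket over a series of past windows (the baseline)."""
    n = len(windows)
    return [sum(bucket) / n for bucket in zip(*windows)]


def drift_distance(current: Sequence[float], baseline: Sequence[float]) -> float:
    """Sum of absolute per-bucket differences between two histograms."""
    if len(current) != len(baseline):
        raise ValueError("histograms must have the same number of buckets")
    return sum(abs(c - b) for c, b in zip(current, baseline))


# Hypothetical bucket counts: three past windows and the current window.
past_windows = [[10, 30, 60], [12, 28, 60], [8, 32, 60]]
baseline = average_histogram(past_windows)
current = [40, 30, 30]

distance = drift_distance(current, baseline)

THRESHOLD = 25.0  # assumed value; it has to be tuned on real data
should_alert = distance > THRESHOLD
```

With these made-up numbers the baseline is [10.0, 30.0, 60.0] and the distance is 60.0, so the alert condition holds. In Grafana, the same idea amounts to an alert rule with a threshold on the distance query.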
  
  
Posted 9 months ago

Thank you very much for your help. I will test it.

  
  
Posted 9 months ago

I ran the test, but there was no result. I need to calculate the per-minute variation of this average: avg(100*increase(test12_model_custom:Glucose_bucket[1m])/increase(test12_model_custom:Glucose_sum[1m])). I tried using delta, but encountered an error.

  
  
Posted 8 months ago

It runs correctly.

  
  
Posted 8 months ago

I set up the alert rule on this metric by defining a threshold to trigger the alert. Did I understand correctly?

Yes exactly!

Or the new metric should...

basically combining the two, yes looks good.

  
  
Posted 9 months ago

I used this PromQL query: 100 * increase(test12_model_custom:Glucose_bucket[1m]) / increase(test12_model_custom:Glucose_sum[1m]) to visualize the distribution of the variable (in my case called Glucose). So according to your explanation, I should calculate a new metric: sum(abs(test12_model_custom:Glucose_bucket - histogram_avg(test12_model_custom:Glucose_bucket[1m]))). I set up the alert rule on this metric by defining a threshold to trigger the alert. Did I understand correctly?

  
  
Posted 9 months ago

I ran the test, but there was no result.

what do you mean by no result, no data after the new query?

  
  
Posted 8 months ago

When I calculated the average, I got this result. Now, with this new metric, I need to calculate the variation per minute. I tried increase, rate, delta, but no result, just an error: bad_data: 1:110: parse error: ranges only allowed for vector selectors: delta(avg(100*increase(test12_model_custom:Glucose_bucket[1m])/increase(test12_model_custom:Glucose_sum[1m]))[1m])

  
  
Posted 8 months ago

Or the new metric should be: sum(abs((100 * increase(test12_model_custom:Glucose_bucket[1m]) / increase(test12_model_custom:Glucose_sum[1m])) - histogram_avg((100 * increase(test12_model_custom:Glucose_bucket[1m]) / increase(test12_model_custom:Glucose_sum[1m]))[1m])))?

  
  
Posted 9 months ago

try to break it into parts and understand what produces the error
for example:
increase(test12_model_custom:Glucose_bucket[1m])
increase(test12_model_custom:Glucose_sum[1m])
increase(test12_model_custom:Glucose_bucket[1m])/increase(test12_model_custom:Glucose_sum[1m])
and so on

  
  
Posted 8 months ago

To check the data drift, I need to calculate the avg of the last query by time bucket and calculate the variation by minute of the new metric
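
As a rough illustration of what "variation by minute" of an averaged metric means, here is a minimal Python sketch. The per-minute averages, the function name, and the threshold are made-up assumptions for the example, not values taken from the dashboard.

```python
from typing import List, Sequence


def per_minute_variation(values: Sequence[float]) -> List[float]:
    """Difference between each per-minute value and the one before it."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Hypothetical averages of the query, sampled once per minute.
avg_per_minute = [4.0, 4.5, 4.25, 7.0]
variation = per_minute_variation(avg_per_minute)

THRESHOLD = 2.0  # assumed value; it has to be tuned on real data
# Indices (into `variation`) of the minutes whose change exceeds the threshold.
drifting_minutes = [i for i, v in enumerate(variation) if abs(v) > THRESHOLD]
```

Here the variation is [0.5, -0.25, 2.75], and only the last step exceeds the assumed threshold.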

  
  
Posted 8 months ago

All these run correctly

  
  
Posted 8 months ago

I feel like to do this, I need to create a recording rule from the metric avg(...) at the Prometheus level and then query increase(). However, this approach requires me to interact directly with Prometheus.

  
  
Posted 8 months ago

Now, when I add delta to calculate the variation of this: error: bad_data: 1:110: parse error: ranges only allowed for vector selectors

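This error means that a range such as [1m] can only be applied directly to a metric selector, not to the result of an expression like avg(...). Your avg(...) is already a single aggregated value, which means you can (as you said) base the alert on that.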

  
  
Posted 8 months ago

and this?
avg(100*increase(test12_model_custom:Glucose_bucket[1m])/increase(test12_model_custom:Glucose_sum[1m]))

  
  
Posted 8 months ago

Now, when I add delta to calculate the variation of this: error: bad_data: 1:110: parse error: ranges only allowed for vector selectors

  
  
Posted 8 months ago

Alternatively, can I define my alert directly on avg(...)?

  
  
Posted 8 months ago