Help Please, After Creating My Data Drift Monitoring Dashboard Using Clearml Serving And Grafana, How Can I Configure My Alerts To Be Notified When The Distribution Of My Metrics (Variables) Changes On My Heatmaps?

Answered

Help please, after creating my data drift monitoring dashboard using ClearML Serving and Grafana, how can I configure my alerts to be notified when the distribution of my metrics (variables) changes on my heatmaps?

  				
Posted 9 months ago by RelievedDuck3

Answers 17

Hi @RelievedDuck3

how can I configure my alerts to be notified when the distribution of my metrics (variables) changes on my heatmaps?

This can be done inside Grafana. Specifically, you need to create a new metric that is the distance of the current distribution (i.e. the heatmap) from the previous window, then create an alert on that distance metric.
Basically, instead of the heatmap value, you take the current bucket value and subtract from it the value of the same bucket X minutes ago:

1. Create an average over time per bucket, for example: histogram_avg(buckets_seconds[1d])
2. Then subtract the two histograms and sum the absolute differences:
sum(abs(buckets_value - histogram_avg(buckets_value[1d])))
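A minimal sketch of the distance metric described above, written with standard PromQL functions. Note that in current Prometheus, histogram_avg takes an instant vector of native histograms, while avg_over_time is the function that averages over a time window, so the sketch uses avg_over_time with a subquery. The metric name is the one used elsewhere in this thread; the 1m and 1d windows are illustrative assumptions, and increase(...[1m]) only returns data if at least two samples fall inside each 1m window.

```promql
# Distance between the current per-bucket increase and its baseline.
# Baseline: the same expression averaged over the past day,
# re-evaluated every minute via the [1d:1m] subquery.
sum(
  abs(
    increase(test12_model_custom:Glucose_bucket[1m])
    -
    avg_over_time(increase(test12_model_custom:Glucose_bucket[1m])[1d:1m])
  )
)
```

The result is a single series that grows as the bucket counts move away from their daily average, which is the value an alert threshold would be set on.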

  				
Posted 9 months ago by AgitatedDove14

Thank you very much for your help. I will test it.

  				
Posted 9 months ago by RelievedDuck3

I ran the test, but there was no result. I need to calculate the variation of an average: avg(100*increase(test12_model_custom:Glucose_bucket[1m])/increase(test12_model_custom:Glucose_sum[1m])), i.e. its variation per minute. I tried using delta, but encountered an error.

  				
Posted 8 months ago by RelievedDuck3

It runs correctly.

  				
Posted 8 months ago by RelievedDuck3

I set up the alert rule on this metric by defining a threshold to trigger the alert. Did I understand correctly?

Yes exactly!

Or the new metric should...

Basically combining the two; yes, that looks good.

  				
Posted 9 months ago by AgitatedDove14

I used this PromQL query: 100 * increase(test12_model_custom:Glucose_bucket[1m]) / increase(test12_model_custom:Glucose_sum[1m]) to visualize the distribution of the variable (in my case called Glucose). So according to your explanation, I should calculate a new metric: sum(abs(test12_model_custom:Glucose_bucket - histogram_avg(test12_model_custom:Glucose_bucket[1m]))). I set up the alert rule on this metric by defining a threshold to trigger the alert. Did I understand correctly?

  				
Posted 9 months ago by RelievedDuck3

I ran the test, but there was no result.

What do you mean by "no result"? Does the new query return no data?

  				
Posted 8 months ago by AgitatedDove14

When I calculated the average, I got this result. Now, with this new metric, I need to calculate the variation per minute. I tried increase, rate, delta, but no result, just an error: bad_data: 1:110: parse error: ranges only allowed for vector selectors: delta(avg(100*increase(test12_model_custom:Glucose_bucket[1m])/increase(test12_model_custom:Glucose_sum[1m]))[1m])

  				
Posted 8 months ago by RelievedDuck3

Or the new metric should be: sum(abs((100 * increase(test12_model_custom:Glucose_bucket[1m]) / increase(test12_model_custom:Glucose_sum[1m])) - histogram_avg((100 * increase(test12_model_custom:Glucose_bucket[1m]) / increase(test12_model_custom:Glucose_sum[1m]))[1m])))?

  				
Posted 9 months ago by RelievedDuck3

Try to break it into parts to understand which one produces the error, for example:
increase(test12_model_custom:Glucose_bucket[1m])
increase(test12_model_custom:Glucose_sum[1m])
increase(test12_model_custom:Glucose_bucket[1m])/increase(test12_model_custom:Glucose_sum[1m])
and so on.

  				
Posted 8 months ago by AgitatedDove14

To check for data drift, I need to calculate the avg of the last query per time bucket, then calculate the per-minute variation of that new metric.

  				
Posted 8 months ago by RelievedDuck3

All of these run correctly.

  				
Posted 8 months ago by RelievedDuck3

I feel like to do this, I need to create a recording rule from the metric avg(...) at the Prometheus level and then query increase(). However, this approach requires me to interact directly with Prometheus.

  				
Posted 8 months ago by RelievedDuck3

Now, when I add delta to calculate the variation of this: error: bad_data: 1:110: parse error: ranges only allowed for vector selectors

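This error means the range selector [1m] was applied to the result of avg(...), which is an expression rather than a plain metric selector. Your avg(...) already aggregates everything into a single series, which means you can (as you said) base the alert directly on that.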
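If the per-minute variation is still wanted, one option is PromQL's subquery syntax, which turns an expression into the range vector that delta() expects. The following is a sketch, assuming a Prometheus version that supports subqueries; the 5m range and 1m resolution are illustrative values, not something taken from this thread.

```promql
# Change of the averaged expression over the last 5 minutes.
# The [5m:1m] subquery re-evaluates the inner expression every minute,
# producing the range vector that delta() requires.
delta(
  (
    avg(
      100 * increase(test12_model_custom:Glucose_bucket[1m])
          / increase(test12_model_custom:Glucose_sum[1m])
    )
  )[5m:1m]
)
```

Writing [1m] directly after avg(...) fails because a plain range selector is only valid on a metric selector; the colon in [5m:1m] is what marks it as a subquery.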

  				
Posted 8 months ago by AgitatedDove14

and this?
avg(100*increase(test12_model_custom:Glucose_bucket[1m])/increase(test12_model_custom:Glucose_sum[1m]))

  				
Posted 8 months ago by AgitatedDove14

Now, when I add delta to calculate the variation of this: error: bad_data: 1:110: parse error: ranges only allowed for vector selectors

  				
Posted 8 months ago by RelievedDuck3

Alternatively, can I define my alert directly on avg(...)?
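For reference, an alert condition placed directly on the averaged expression could look like the sketch below. The threshold of 30 is a hypothetical placeholder and would need to be chosen from the values the dashboard normally shows.

```promql
# Returns a series only while the average exceeds the (hypothetical) threshold,
# so an alert rule built on this query fires whenever it returns data.
avg(
  100 * increase(test12_model_custom:Glucose_bucket[1m])
      / increase(test12_model_custom:Glucose_sum[1m])
) > 30
```

In Grafana, the same effect can be had by keeping the query without the comparison and adding the threshold as a separate condition in the alert rule.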

  				
Posted 8 months ago by RelievedDuck3

586 Views
