
Answered

Hey, my name is Ido, and I am a new ClearML user.
My goal is to monitor the accuracy of my LLM outputs in production. I understand that I can log each iteration with a binary output (0 for incorrect and 1 for correct), but this approach makes the visual graph less readable.
Is there a way to aggregate the results, such as defining an iteration as the accuracy of 100 samples, to improve the readability of the visual graph?

In general, what are the best practices for monitoring LLMs using ClearML?
Thanks!

  				
Posted one year ago by GloriousKoala29


Answers 5

> I prefer serving my models in-house and only performing the monitoring via ClearML.

clearml-serving is an infrastructure for you to run models 🙂
To clarify, clearml-serving runs on your end (meaning this is not SaaS where a third party runs the model).

> By the way, I saw there is a project dashboard app which might support the visualization I am looking for. Is it suitable for such a use case?

Hmm, interesting. Actually it might: it does collect metrics over time and averages them.

  				
Posted one year ago by AgitatedDove14

So first, yes, I totally agree. This is why clearml-serving has a dedicated statistics module that creates histograms over time; we push them into Prometheus and connect Grafana to it for dashboards and alerts.
To be honest, I would just use it instead of reporting manually. What do you think?
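If you report manually rather than through clearml-serving, the "histograms over time" idea can be approximated in plain Python. The sketch below is a hypothetical helper, not the clearml-serving statistics module: it groups `(timestamp, score)` events into fixed time windows and counts the scores per bin, so each window becomes one histogram instead of hundreds of 0/1 points. The window length, bin edges, and function names are assumptions.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def window_histogram(scores: Iterable[float], bin_edges: List[float]) -> List[int]:
    """Count scores into bins [e0, e1), [e1, e2), ..., with the last bin closed on the right."""
    counts = [0] * (len(bin_edges) - 1)
    for s in scores:
        if s < bin_edges[0] or s > bin_edges[-1]:
            continue  # out-of-range scores are dropped
        # bisect_right returns the index of the first edge strictly greater than s;
        # min() puts a score equal to the last edge into the final bin
        i = min(bisect.bisect_right(bin_edges, s) - 1, len(counts) - 1)
        counts[i] += 1
    return counts


def histograms_over_time(
    events: Iterable[Tuple[float, float]],
    window_seconds: int,
    bin_edges: List[float],
) -> Dict[int, List[int]]:
    """Group (unix_timestamp, score) events into fixed windows; one histogram per window."""
    per_window: Dict[int, List[float]] = defaultdict(list)
    for ts, score in events:
        window_start = int(ts // window_seconds) * window_seconds
        per_window[window_start].append(score)
    # Sorted by window start so the histograms come out in time order
    return {start: window_histogram(vals, bin_edges) for start, vals in sorted(per_window.items())}
```

Each window's counts could then be sent to ClearML with `Logger.report_histogram(title, series, values, iteration)`, using the window index as the iteration; the title and series names are up to you.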

  				
Posted one year ago by AgitatedDove14

@AgitatedDove14 Thanks! The only thing is that I prefer serving my models in-house and only performing the monitoring via ClearML. By the way, I saw there is a project dashboard app which might support the visualization I am looking for. Is it suitable for such a use case?

  				
Posted one year ago by GloriousKoala29

Hi @AgitatedDove14,
I guess I could log the input-output pairs and report the average accuracy as a scalar. However, I'm not sure this is the right way to monitor my data. Iterations obviously make sense when training a model and tracking the loss, but in production I'm not sure this dashboard is meant for that purpose.
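A minimal sketch of the windowed approach from the question (one reported point = accuracy of N samples), assuming the correctness flag for each request is already known. `WindowedAccuracy` and its `report` callback are hypothetical names; the callback keeps the helper independent of ClearML so it can be tested without a server.

```python
from typing import Callable, List


class WindowedAccuracy:
    """Accumulate 0/1 correctness flags and emit one accuracy value per window (hypothetical helper)."""

    def __init__(self, window_size: int, report: Callable[[float, int], None]):
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self.window_size = window_size
        self.report = report  # called as report(accuracy, iteration)
        self._buffer: List[int] = []
        self._iteration = 0

    def add(self, correct: bool) -> None:
        """Record one sample; report automatically once the window is full."""
        self._buffer.append(1 if correct else 0)
        if len(self._buffer) == self.window_size:
            self.flush()

    def flush(self) -> None:
        """Report the current (possibly partial) window and start a new one."""
        if not self._buffer:
            return
        accuracy = sum(self._buffer) / len(self._buffer)
        self.report(accuracy, self._iteration)
        self._iteration += 1
        self._buffer.clear()


if __name__ == "__main__":
    reported = []
    acc = WindowedAccuracy(window_size=4, report=lambda a, i: reported.append((i, a)))
    for flag in [1, 1, 0, 1, 0, 0, 1, 1, 1]:
        acc.add(bool(flag))
    acc.flush()  # reports the partial last window
    print(reported)  # [(0, 0.75), (1, 0.5), (2, 1.0)]
```

To plot it in ClearML, the callback could wrap `logger.report_scalar(title="accuracy", series="window_100", value=a, iteration=i)` with `window_size=100`, where `logger` comes from `Task.init(...).get_logger()`; the title and series names are placeholders. Calling `flush()` on shutdown keeps a partial last window from being lost.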

  				
Posted one year ago by GloriousKoala29

Hi @GloriousKoala29

> Is there a way to aggregate the results, such as defining an iteration as the accuracy of 100 samples

Hmm, I'm assuming what you actually want is to store each sample with its actual input/output and a score. Is that correct?
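If the goal is to keep each sample's input/output next to its score, one possible sketch (an assumption about what you want to store, not a ClearML schema) is a small record type serialized to CSV, with the batch mean kept as the scalar for the plot. `ScoredSample`, `samples_to_csv`, and `mean_score` are hypothetical names.

```python
import csv
import io
import math
from dataclasses import asdict, dataclass, fields
from typing import List


@dataclass
class ScoredSample:
    """One production request with its evaluation (hypothetical schema)."""
    request_id: str
    prompt: str
    output: str
    score: float  # e.g. 1.0 = correct, 0.0 = incorrect


def samples_to_csv(samples: List[ScoredSample]) -> str:
    """Serialize samples to CSV text; the csv module quotes commas/newlines in prompts."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(ScoredSample)])
    writer.writeheader()
    for sample in samples:
        writer.writerow(asdict(sample))
    return buf.getvalue()


def mean_score(samples: List[ScoredSample]) -> float:
    """Average score of a batch; NaN for an empty batch."""
    if not samples:
        return math.nan
    return sum(s.score for s in samples) / len(samples)
```

The CSV could be written to a file and attached to the task with `task.upload_artifact(name="scored_samples", artifact_object="scored_samples.csv")`, while `mean_score(...)` goes to `report_scalar`. That way the plot stays readable and the individual failures remain inspectable.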

  				
Posted one year ago by AgitatedDove14
