Hi all. I am wondering how people tend to use ClearML with cross-validation. Do you tend to create separate experiments for each fold? And if so, would you then create another experiment for the aggregated results?

Answered

Hi all. I am wondering how people tend to use ClearML with cross-validation. Do you tend to create separate experiments for each fold? And if so, would you then create another experiment for the aggregated results?

  				
Posted 2 years ago by RattyBat71


Answers 3

"One thought is to initialise a new ClearML task in each fold to capture the iteration-level metrics, and then create another task/experiment at the end to capture the aggregated metrics across folds."

That is probably the easiest and the most scalable option.
BTW: with the new reporting feature, you can integrate the comparison of the CV folds directly into your final report 🙂
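For illustration, here is a minimal sketch of the one-task-per-fold approach with a final summary task. It assumes the `clearml` package is installed and configured, that the folds run one after another in a single process, and that `Logger.report_single_value` is available in your ClearML version. The project name `cv-demo`, the task names, the `cv-fold` tag, and both helper functions are hypothetical names made up for this example.

```python
from statistics import mean, stdev


def summarize_folds(fold_scores):
    """Return the mean and sample standard deviation of per-fold scores."""
    return {
        "mean": mean(fold_scores),
        "std": stdev(fold_scores) if len(fold_scores) > 1 else 0.0,
    }


def log_cv_to_clearml(per_fold_history, project_name="cv-demo"):
    """Log each fold to its own task, then log aggregates to a summary task.

    per_fold_history: one list per fold holding the validation score
    recorded at each training iteration (illustrative input format).
    """
    # Imported lazily so summarize_folds() can be used without ClearML.
    from clearml import Task

    final_scores = []
    for fold, history in enumerate(per_fold_history):
        task = Task.init(
            project_name=project_name,
            task_name=f"fold-{fold}",
            reuse_last_task_id=False,  # always create a fresh task
        )
        task.add_tags(["cv-fold"])
        logger = task.get_logger()
        for iteration, score in enumerate(history):
            logger.report_scalar(
                title="validation", series="score",
                value=score, iteration=iteration,
            )
        final_scores.append(history[-1])
        task.close()  # close this task before the next Task.init call

    summary_task = Task.init(
        project_name=project_name,
        task_name="cv-summary",
        reuse_last_task_id=False,
    )
    summary_logger = summary_task.get_logger()
    for name, value in summarize_folds(final_scores).items():
        summary_logger.report_single_value(name=f"score_{name}", value=value)
    summary_task.close()
    return final_scores
```

With this layout each fold shows up as its own row in the experiment table, so folds can be compared or sorted by metric, and the summary task holds only the aggregated numbers.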

  				
Posted 2 years ago by AgitatedDove14

Thanks AgitatedDove14. I am not using ClearML for scheduling/execution at this stage. I am evaluating ClearML for adding reporting to our current workflow. We have existing (parallelised) code for cross-validating models and I am playing with how best to log training/testing to ClearML. One thought is to initialise a new ClearML task in each fold to capture the iteration-level metrics, and then create another task/experiment at the end to capture the aggregated metrics across folds. Alternatively, I could simply dump all fold and aggregated metrics into a single experiment. I don't have a good feel yet for the pros and cons and was wondering if anyone had any advice.
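To make the single-experiment alternative concrete, here is a rough sketch that logs every fold as its own series in one scalar plot and adds one aggregated value at the end. It assumes `clearml` is installed and configured and that all fold results are available in one process. The project name, task name, metric titles, and helper functions are hypothetical.

```python
def fold_scalar_records(per_fold_history):
    """Flatten per-fold score histories into (series, iteration, value) records."""
    records = []
    for fold, history in enumerate(per_fold_history):
        for iteration, value in enumerate(history):
            records.append((f"fold {fold}", iteration, value))
    return records


def log_cv_in_one_task(per_fold_history, project_name="cv-demo"):
    """Report all folds and the aggregated score inside a single ClearML task."""
    # Imported lazily so fold_scalar_records() works without ClearML.
    from clearml import Task

    task = Task.init(project_name=project_name, task_name="cv-all-folds")
    logger = task.get_logger()

    # One plot titled "validation score" with one curve (series) per fold.
    for series, iteration, value in fold_scalar_records(per_fold_history):
        logger.report_scalar(
            title="validation score", series=series,
            value=value, iteration=iteration,
        )

    # Aggregate the last recorded score of each fold into one summary value.
    finals = [history[-1] for history in per_fold_history]
    logger.report_single_value(
        name="mean final score", value=sum(finals) / len(finals)
    )
    task.close()
```

The trade-off as far as I can tell: a single task overlays the fold curves in one plot with little setup, but individual folds cannot be sorted or compared as separate rows in the experiment table, and it is less natural when the folds run in separate parallel processes.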

  				
Posted 2 years ago by RattyBat71

Hi RattyBat71

"Do you tend to create separate experiments for each fold?"

If you really want to parallelize the workload, then splitting it into multiple executions (i.e. passing the index of the CV fold as an argument) makes sense; you can then compare / sort the results based on a specific metric. That said, if speed is not important, just having a single script that runs all the CV folds might be easier to implement?!
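A minimal sketch of the "one execution per fold" idea, where the fold index is passed on the command line. It assumes `clearml` is installed and configured. The `--fold` and `--n-folds` flags, the strided split, the project name, and the placeholder score are all hypothetical and only there to show the shape of the script.

```python
import argparse


def parse_args(argv=None):
    """Parse the fold index so each execution handles exactly one fold."""
    parser = argparse.ArgumentParser(description="Run one cross-validation fold")
    parser.add_argument("--fold", type=int, required=True,
                        help="index of the fold to run")
    parser.add_argument("--n-folds", type=int, default=5,
                        help="total number of folds")
    return parser.parse_args(argv)


def validation_indices(n_samples, n_folds, fold):
    """Indices held out for `fold` under a simple strided split."""
    return list(range(fold, n_samples, n_folds))


def main(argv=None):
    # Imported lazily so the helpers above can be used without ClearML.
    from clearml import Task

    args = parse_args(argv)
    task = Task.init(
        project_name="cv-demo",
        task_name=f"fold-{args.fold}",
        reuse_last_task_id=False,
    )
    task.add_tags(["cv-fold"])

    held_out = validation_indices(
        n_samples=100, n_folds=args.n_folds, fold=args.fold
    )
    # Placeholder: train on the remaining samples and evaluate on `held_out`.
    score = 0.0
    task.get_logger().report_single_value(name="val_score", value=score)
    task.close()

# When running as a script, call main() under an
# `if __name__ == "__main__":` guard.
```

Each fold can then be launched as its own process, for example `python train_fold.py --fold 2` if the file were saved under that (made-up) name. ClearML normally records argparse arguments on the task, so the fold index should be visible next to the reported metric when comparing runs.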

  				
Posted 2 years ago by AgitatedDove14
