Answered

Hi all. I am wondering how people tend to use ClearML with cross-validation. Do you tend to create separate experiments for each fold? And if so, would you then create another experiment for the aggregated results?

  
  
Posted one year ago

Answers 3


One thought is to initialise a new ClearML task in each fold to capture the iteration-level metrics, and then create another task/experiment at the end to capture the aggregated metrics across folds.

That is probably the easiest, and the most scalable.
BTW: with the new reporting feature, you can integrate the comparison of the CV directly into your final report 🙂
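
A minimal sketch of the task-per-fold layout described above, assuming a configured ClearML setup. The project name "cv-demo", the metric name "val_accuracy", and the helper names are made up for illustration; `fold_histories` stands in for whatever per-iteration metrics your existing CV code already produces. This version runs the folds one after another in a single process, so each task is closed before the next one is initialised.

```python
from statistics import mean, stdev


def summarize_folds(fold_scores):
    """Return the mean and sample standard deviation of per-fold scores."""
    spread = stdev(fold_scores) if len(fold_scores) > 1 else 0.0
    return {"mean": mean(fold_scores), "std": spread}


def log_cv_as_separate_tasks(fold_histories, project="cv-demo"):
    """Log one ClearML task per fold, then one task for the aggregate.

    fold_histories: one list of per-iteration metric values per fold
    (hypothetical input format).
    """
    # Imported lazily so summarize_folds() works without ClearML installed.
    from clearml import Task

    final_scores = []
    for fold, history in enumerate(fold_histories):
        task = Task.init(project_name=project, task_name=f"fold-{fold}")
        logger = task.get_logger()
        for iteration, value in enumerate(history):
            logger.report_scalar(
                title="val_accuracy", series="val", value=value, iteration=iteration
            )
        final_scores.append(history[-1])
        task.close()  # close before initialising the next task in this process

    # Separate task holding only the cross-fold summary.
    summary = summarize_folds(final_scores)
    task = Task.init(project_name=project, task_name="cv-aggregate")
    logger = task.get_logger()
    for name, value in summary.items():
        logger.report_scalar(title="cv_summary", series=name, value=value, iteration=0)
    task.close()
    return summary
```

If the folds run in parallel worker processes, each worker would presumably call `Task.init` itself and the aggregate task would be created once the workers have returned their final scores.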

  
  
Posted one year ago

Thanks AgitatedDove14. I am not using ClearML for scheduling/execution at this stage. I am evaluating ClearML for adding reporting to our current workflow. We have existing (parallelised) code for cross-validating models and I am playing with how best to log training/testing to ClearML. One thought is to initialise a new ClearML task in each fold to capture the iteration-level metrics, and then create another task/experiment at the end to capture the aggregated metrics across folds. Alternatively, I could simply dump all fold and aggregated metrics into a single experiment. I don't have a good feel yet as to the pros and cons and was wondering if anyone had any advice.
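
For comparison, the single-experiment alternative mentioned above could look roughly like this. It is only a sketch: the project and task names, the "val_accuracy" title, and the `fold_histories` input format are assumptions. Each fold is reported as its own series under one shared title, so the folds end up overlaid in the same scalar plot.

```python
def fold_series(fold_histories):
    """Flatten per-fold histories into (series, iteration, value) rows."""
    rows = []
    for fold, history in enumerate(fold_histories):
        for iteration, value in enumerate(history):
            rows.append((f"fold_{fold}", iteration, value))
    return rows


def log_cv_in_one_task(fold_histories, project="cv-demo"):
    """Report every fold as a separate series inside a single ClearML task."""
    # Imported lazily so fold_series() works without ClearML installed.
    from clearml import Task

    task = Task.init(project_name=project, task_name="cv-all-folds")
    logger = task.get_logger()
    for series, iteration, value in fold_series(fold_histories):
        logger.report_scalar(
            title="val_accuracy", series=series, value=value, iteration=iteration
        )
    task.close()
```

The trade-off, roughly: one task keeps everything in one place, while separate tasks let each fold be sorted and compared like any other experiment.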

  
  
Posted one year ago

Hi RattyBat71

Do you tend to create separate experiments for each fold?

If you really want to parallelize the workload, then splitting it into multiple executions (i.e. passing the index of the CV fold as an argument) makes sense; you can then compare / sort the results based on a specific metric. That said, if speed is not important, just having a single script with multiple CVs might be easier to implement?!
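
As a rough illustration of passing the fold index as an argument, here is a sketch of a script that runs a single fold per execution. The flag names, the interleaved split, the project name, and the `train_and_score` callable are all hypothetical placeholders for your own code.

```python
import argparse


def parse_args(argv=None):
    """Parse the fold index this execution is responsible for."""
    parser = argparse.ArgumentParser(description="Run one cross-validation fold")
    parser.add_argument("--fold", type=int, required=True)
    parser.add_argument("--n-folds", type=int, default=5)
    return parser.parse_args(argv)


def validation_indices(n_samples, n_folds, fold):
    """Indices held out for `fold` under a simple interleaved split."""
    return [i for i in range(n_samples) if i % n_folds == fold]


def run_fold(train_and_score, n_samples, argv=None):
    """Train and score one fold, logging it as its own ClearML task.

    train_and_score: hypothetical callable taking the held-out indices
    and returning a single validation score.
    """
    args = parse_args(argv)
    # Imported lazily so the helpers above work without ClearML installed.
    from clearml import Task

    task = Task.init(project_name="cv-demo", task_name=f"fold-{args.fold}")
    # Record the fold settings explicitly as task parameters.
    task.connect({"fold": args.fold, "n_folds": args.n_folds})
    score = train_and_score(validation_indices(n_samples, args.n_folds, args.fold))
    task.get_logger().report_scalar(
        title="cv", series="val_score", value=score, iteration=0
    )
    task.close()
    return score
```

Calling `run_fold` from the script's entry point and launching it once per fold (for example with `--fold 0`, `--fold 1`, and so on) gives one task per fold, which can then be sorted by the reported metric.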

  
  
Posted one year ago
465 Views