Unanswered
Hi, we are seeing a growing number of cases where it takes quite a while before actual training (GPU utilisation) starts. After observing a few runs, these are the steps and bottlenecks we found.

  • Job submitted to ClearML
  • ClearML spawns a K8s pod via k8sGlue (within 30 secs)
  • Pod sets up and runs the script (takes up to 5 mins)
  • Script uses clearml-data to pull the versioned dataset (took 30 mins due to the size of the dataset). The backend is S3, but is it suitable/compatible with clearml-data's data pulling strategies?
  • Batch/Preprocess/Train

Questions I am asking now are:
  • Are there other best practices for the data pulling part, especially for experiments that use the exact same dataset? (E.g. a cache?)
  • What kind of storage should we use with ClearML? (Today it is S3.)
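
To make the cache idea concrete, here is a minimal sketch of the pattern being asked about. It assumes a directory shared between pods (for example a persistent volume) and a hypothetical `fetch` callable that stands in for whatever actually pulls the dataset. The names `get_cached_copy`, `cache_root` and `fetch` are made up for illustration and are not part of ClearML's API.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def get_cached_copy(
    dataset_id: str,
    cache_root: Path,
    fetch: Callable[[Path], None],
) -> Path:
    """Return a local folder for dataset_id, downloading only on a cache miss.

    fetch is a hypothetical callable that writes the dataset files into the
    directory it is given (e.g. a wrapper around the real download step).
    """
    target = cache_root / dataset_id
    if target.is_dir():
        # Cache hit: an earlier job already pulled this exact version.
        return target

    cache_root.mkdir(parents=True, exist_ok=True)
    # Download into a private staging folder so that other jobs never
    # see a half-written dataset under the final name.
    staging = Path(tempfile.mkdtemp(prefix=f"{dataset_id}.", dir=cache_root))
    fetch(staging)
    try:
        # Publish the finished download under its final name.
        os.rename(staging, target)
    except OSError:
        # Another job published the same dataset first; discard our copy.
        shutil.rmtree(staging, ignore_errors=True)
    return target
```

With something like this, only the first experiment on a given dataset version would pay the 30 minute pull, provided the cache directory outlives the pod. Whether clearml-data already offers an equivalent mechanism is exactly what the question above is asking.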
  
  
Posted one year ago