Answered

hey hey! my team and I are currently testing ClearML agents for running experiments; so far it has been great and we really love the whole ClearML ecosystem!! However, there is something I don't quite understand. Basically we have two clusters, A and B, and in each of them we spin up agents using the Helm chart, and both serve a "gpu" queue. The issue is that when someone enqueues an experiment in this "gpu" queue, it often happens that if agent A picks up the job and its cluster doesn't have resources available, the experiment (pod) stays in a pending state, even though B has free resources. Is there any way to check which agent has resources available and run the experiment there??

  
  
Posted 9 months ago

Answers 4


The Helm chart is definitely the recommended way, and it also fits k8s better 🙂

  
  
Posted 8 months ago

thanks!

  
  
Posted 8 months ago

hey @<1523701087100473344:profile|SuccessfulKoala55> that seems to work, thanks! One thing that's still not clear to me: what would be the recommended way of running agents in Kubernetes? As I understand it, there are two options: the ClearML Agent Helm Chart, which uses the k8s glue code, or running a clearml-agent daemon inside a pod (that already has the GPUs assigned to it). Which one is preferred? I see issues with both approaches, and personally I believe the Helm Chart is the correct way, but I could be wrong.

  
  
Posted 9 months ago

Hi @<1673863788857659392:profile|HomelyRabbit25> , in the k8s agent setup, an agent will pick up a task and create a pod for it as soon as it's able to. To limit this behavior, you can cap the number of pods the agent is allowed to spawn. For example, if each experiment uses a single GPU and cluster A has 8 GPUs, it would make sense to limit the number of pods (using the maxPods setting) for agent A to 8...
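
For reference, here is a minimal sketch of what such an override might look like in the values file passed to the clearml-agent Helm chart. The exact key path is an assumption on my part (I believe maxPods sits under the agentk8sglue section, but check the chart's values.yaml to confirm):

    # hypothetical values override for the clearml-agent Helm chart on cluster A
    agentk8sglue:
      queue: gpu      # the queue this agent serves
      maxPods: 8      # assumed key path; cap task pods at one per GPU on an 8-GPU cluster

With a value like this set on each cluster (matching that cluster's GPU count), an agent that is already running at capacity won't pull more tasks off the queue, so the other cluster's agent can pick them up instead.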

  
  
Posted 9 months ago