hey hey! my team and I are currently testing ClearML agents for running experiments. So far it has been great and we really love the whole ClearML ecosystem!! However, there is something I don't quite understand. Basically, we have two clusters, A and B; in each of them we spin up agents using the Helm chart, and both agents serve a "gpu" queue. The issue is that when someone enqueues an experiment in this "gpu" queue, agent A often picks up the job even though it has no resources available, and the experiment (pod) then stays in a pending state even if B has free resources. Is there any way to check which agent has resources available and run the experiment there??

Posted one year ago

Answers 4

The Helm chart is definitely the recommended way, and it also fits k8s better 🙂

Posted 11 months ago


Hi HomelyRabbit25, in the k8s agent setup, an agent will pick up a task and create a pod for it as soon as it's able to. To limit this behavior, you can cap the number of pods the agent is allowed to apply. For example, if each experiment uses a single GPU and cluster A has 8 GPUs, it would make sense to limit agent A to 8 pods (using the maxPods setting)...
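To make the effect of a pod cap concrete, here is a toy Python simulation. It is not ClearML code: `GlueAgent`, `dispatch`, and `simulate` are hypothetical names, and `max_pods` only mirrors the idea behind the maxPods setting. It assumes one GPU per task, 8 GPUs per cluster, and a worst case where whichever agent polls the shared queue first keeps pulling tasks for as long as it has room.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GlueAgent:
    """Toy model of one agent serving a shared queue (hypothetical, not ClearML code)."""
    name: str
    gpus: int                       # GPUs in this agent's cluster; assumes 1 GPU per task
    max_pods: Optional[int] = None  # hypothetical pod cap; None means "no cap"
    pods: list = field(default_factory=list)

    def has_room(self) -> bool:
        # Without a cap the agent keeps pulling, whether or not the cluster has free GPUs.
        return self.max_pods is None or len(self.pods) < self.max_pods

    def running(self) -> list:
        return self.pods[: self.gpus]

    def pending(self) -> list:
        # Pods beyond the GPU count cannot be scheduled and stay Pending.
        return self.pods[self.gpus :]


def dispatch(tasks, agents):
    """Worst case: agents poll in list order and each pulls tasks while it has room."""
    queue = deque(tasks)
    for agent in agents:
        while queue and agent.has_room():
            agent.pods.append(queue.popleft())
    return list(queue)  # tasks no agent could take stay in the queue


def simulate(max_pods, num_tasks):
    agents = [GlueAgent("A", gpus=8, max_pods=max_pods),
              GlueAgent("B", gpus=8, max_pods=max_pods)]
    still_queued = dispatch([f"task-{i}" for i in range(num_tasks)], agents)
    # Per agent: (running pods, pending pods); plus how many tasks remain queued.
    summary = {a.name: (len(a.running()), len(a.pending())) for a in agents}
    return summary, len(still_queued)


print(simulate(None, 12))  # ({'A': (8, 4), 'B': (0, 0)}, 0)
print(simulate(8, 12))     # ({'A': (8, 0), 'B': (4, 0)}, 0)
print(simulate(8, 20))     # ({'A': (8, 0), 'B': (8, 0)}, 4)
```

Without a cap, agent A grabs all 12 tasks and 4 pods sit Pending while B is idle. With a cap of 8 matching A's GPU count, the overflow goes to B, and once both agents are full the extra tasks wait in the queue rather than as Pending pods.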

Posted one year ago

hey SuccessfulKoala55, that seems to work, thanks! One thing that's not yet clear to me is: what is the recommended way of running agents in Kubernetes? As I understand it, there are two options: the ClearML Agent Helm chart, which uses the k8s glue code, or running a clearml-agent daemon inside a pod (that already has the GPUs assigned to it). Which one is preferred? I see issues with both approaches. Personally I believe the Helm chart is the correct way, but I could be wrong.

Posted one year ago