I Got An Interesting Question From My Devs. If They Wish To Do Distributed Training, Is Clearml K8S Glue Suitable For It? Local Multiple Gpu: Just A Matter Of Assigning More Than One Gpu In The Yaml File Sent To The K8S Glue. Question Is How To Make This

I got an interesting question from my devs. If they wish to do distributed training, is the ClearML k8s glue suitable for it?
Local multi-GPU: this is just a matter of assigning more than one GPU in the YAML file sent to the k8s glue. The question is how to make this configurable from the dev end (in code).
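
As a rough illustration of the single-node case, the GPU count in a pod template can be a parameter instead of a hard-coded value. This is only a sketch under assumptions, not the glue's own code: `build_pod_template` is a hypothetical helper, the container name and image are placeholders, and `nvidia.com/gpu` is the resource name exposed by the NVIDIA device plugin for Kubernetes. The output is JSON, which is also valid YAML.

```python
import json


def build_pod_template(gpus: int, image: str = "example/training:latest") -> dict:
    """Return a minimal pod template requesting `gpus` GPUs (hypothetical helper)."""
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "spec": {
            "containers": [
                {
                    "name": "training",  # placeholder container name
                    "image": image,  # placeholder image, not from the thread
                    # GPU limit for this single pod
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
            "restartPolicy": "Never",
        },
    }


if __name__ == "__main__":
    # Render a two-GPU template for inspection
    print(json.dumps(build_pod_template(gpus=2), indent=2))
```

Whether a template like this is picked up unchanged depends on how the glue is deployed, so treat the field layout as a starting point to check against your own setup.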

Network-based distributed training like Horovod: I think ClearML cannot support this.

Posted 2 years ago

Answers 3

It can also work by running on multiple known nodes.

Horovod sits on top of Open MPI, which needs SSH to launch processes on multiple nodes. I'm not sure how one would connect it without passing the SSH keys from one node to the other and making sure they can communicate directly. (Not saying it is not possible, just that there are a few things to configure before it works. The Enterprise edition removes the need for the direct SSH connection between the nodes.)
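
To make the "multiple known nodes" case concrete, here is a small sketch that assembles a `horovodrun` command line from a fixed host list. The `-np` (total processes) and `-H` (`host:slots` pairs) flags are Horovod's documented launcher options; the helper name `horovodrun_command`, the host names, and `train.py` are made up for illustration. It only builds the command. Running it still requires the SSH connectivity between nodes described above.

```python
import shlex
from typing import Dict, List


def horovodrun_command(hosts: Dict[str, int], script: str) -> List[str]:
    """Build a horovodrun argv from known hosts mapped to slots (GPUs) per host."""
    if not hosts:
        raise ValueError("at least one host is required")
    total_slots = sum(hosts.values())  # one process per slot
    host_spec = ",".join(f"{name}:{slots}" for name, slots in hosts.items())
    return ["horovodrun", "-np", str(total_slots), "-H", host_spec, "python", script]


if __name__ == "__main__":
    # Hypothetical two-node cluster with 4 GPUs each
    cmd = horovodrun_command({"node-a": 4, "node-b": 4}, "train.py")
    print(shlex.join(cmd))
```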

How would I add a glue for multinode?

Basically spin up another glue service (i.e. run it in parallel to the current one), have the new glue pull Tasks from a new queue (let's say for X nodes), and make sure the YAML it uses spins up X pods (i.e. k8s creates the 4 pods; the glue will take care of the pod definition itself, as they are replicas of one another). Make sense?

Posted 2 years ago

Hi SubstantialElk6
Yes, you are correct: the glue only needs a change to the YAML and it will work.
When you say "dev end", what do you mean? I was thinking of adding an additional glue for multi-node and just adding queues, for example add a 4nodes queue and attach a glue to it. WDYT?
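
One way to picture the queue-per-configuration idea from the developer's side is a small lookup from the requested node count to a queue name. This is a sketch under assumptions: only the `4nodes` queue name comes from this thread, the other names and the `queue_for` helper are hypothetical, and each queue is assumed to already have a glue instance attached.

```python
from typing import Dict

# Hypothetical queue layout: one queue (and one glue instance) per node count.
QUEUES: Dict[int, str] = {1: "1node", 2: "2nodes", 4: "4nodes"}


def queue_for(nodes: int) -> str:
    """Return the queue name serving the requested node count."""
    try:
        return QUEUES[nodes]
    except KeyError:
        supported = ", ".join(str(n) for n in sorted(QUEUES))
        raise ValueError(f"no queue for {nodes} nodes (supported: {supported})") from None


if __name__ == "__main__":
    print(queue_for(4))
```

The resulting name could then be used when enqueuing from code, for instance as the `queue_name` argument of the ClearML SDK's `Task.execute_remotely`, so developers pick a resource configuration by picking a queue.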
Regarding Horovod: Horovod spins up its own nodes, so integration with k8s is not trivial (regardless of ClearML). That said, I know there is support for Horovod in the Enterprise edition, but I'm not sure of the details.

Posted 2 years ago

Sorry, by "dev end" I was referring to my developers.

I didn't think Horovod needs to be as complicated as you described. It can also work by running on multiple known nodes. How would I add a glue for multinode?

Horovod also works with other products similar to yours (e.g. Polyaxon).

Posted 2 years ago