I've been reading the documentation for a while and I'm not getting the following very well.
Given some open-source code, say Hugging Face, I want to run some training and track my experiments with ClearML. The obvious choice would be to use Explicit Reporting in ClearML. But the part about submitting my training job and letting ClearML orchestrate it is vague. I'd appreciate being pointed to the right documentation on this.
I think the most up-to-date documentation for that is currently on the GitHub repo, right SuccessfulKoala55?
https://github.com/allegroai/clearml-server-helm
SuccessfulKoala55 Will do. Thanks for the heads up.
HelpfulDeer76 I think you meant to post this in the channel? This is a thread and probably nobody monitors it...
Thanks. This appears to be solely for the web UI and API. What if I want to orchestrate on K8s?
I'm all for more technical tutorials for doing that... all of this fits the clearml methodology
Indeed it is. The K8s repository name was changed as well to https://github.com/allegroai/clearml-server-k8s and the helm repo is https://github.com/allegroai/clearml-server-helm
BTW if anyone from the future is reading this, try the docs again 😉
Hi guys,
Thanks for the previous discussion on ML-Ops with ClearML agent.
I'm still not sure how to monitor a training job on k8s (one that wasn't scheduled by ClearML). My ClearML server is deployed and works for tracking non-k8s jobs, but for a k8s job I'm still unsuccessful.
Here is what I tried so far:
1. Added my clearml.conf to the docker image
2. Tried to run clearml-init --file ~/clearml.conf
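One alternative to baking clearml.conf into the image is to pass the server addresses and credentials to the pod as environment variables, which the ClearML SDK reads. Below is a rough sketch of the `env` section for a container spec, not a tested deployment. The host URLs, the Secret name `clearml-credentials`, its keys, and the helper `container_env` are all assumptions for illustration.

```python
import json

# Environment variables read by the ClearML SDK.
# The URLs are placeholders; use your own server addresses.
CLEARML_ENV = {
    "CLEARML_API_HOST": "http://clearml-apiserver.example:8008",
    "CLEARML_WEB_HOST": "http://clearml-webserver.example:8080",
    "CLEARML_FILES_HOST": "http://clearml-fileserver.example:8081",
}

# Credential variables mapped to keys in a Kubernetes Secret.
# The key names are hypothetical.
SECRET_KEYS = {
    "CLEARML_API_ACCESS_KEY": "access_key",
    "CLEARML_API_SECRET_KEY": "secret_key",
}


def container_env(secret_name="clearml-credentials"):
    """Build the `env` list for a container in a pod or Job spec."""
    env = [{"name": name, "value": value} for name, value in CLEARML_ENV.items()]
    for var, key in SECRET_KEYS.items():
        env.append(
            {
                "name": var,
                "valueFrom": {"secretKeyRef": {"name": secret_name, "key": key}},
            }
        )
    return env


# Print the fragment so it can be pasted into a manifest (JSON is valid YAML).
print(json.dumps(container_env(), indent=2))
```

With these set, the training script inside the pod should not need a clearml.conf file, as long as it can reach the server addresses from inside the cluster.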
SubstantialElk6 this is a three-parter:
1. getting workers on your cluster; again, because of the rebrand I would go to the repo itself for the doc: https://github.com/allegroai/clearml-agent#kubernetes-integration-optional
2. integrating any code with clearml (2 lines of code)
3. executing that from the web ui
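A minimal sketch of steps 2 and 3, assuming the `clearml` package is installed and credentials are configured. `Task.init` and `Task.execute_remotely` are ClearML SDK calls; the helper `start_tracking` and the project, task, and queue names are hypothetical. Enqueuing from code is shown here as an alternative to cloning and enqueuing the task from the web UI.

```python
try:
    from clearml import Task  # pip install clearml
except ImportError:
    Task = None  # lets the sketch load without the package installed


def start_tracking(project_name, task_name, queue_name=None):
    """Register the current script as a ClearML task; optionally enqueue it."""
    if Task is None:
        raise RuntimeError("clearml is not installed")
    # The "2 lines of code": the import above plus this call.
    task = Task.init(project_name=project_name, task_name=task_name)
    if queue_name:
        # Stops the local run and enqueues the task for an agent
        # listening on `queue_name`.
        task.execute_remotely(queue_name=queue_name)
    return task


# Example call at the top of a training script (placeholder names):
# task = start_tracking("examples", "hf-finetune", queue_name="default")
```

Leaving `queue_name` unset only tracks the run locally; the task can then be cloned and enqueued from the web UI.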
If you need any help with the three, the community is here for you 😉
thanks GrumpyPenguin23, I'll look deeper into that. This kinda fits what I'm looking for, but it's for TRAINS and there's no technical how-to.
https://clear.ml/blog/stop-using-kubernetes-for-ml-ops/