SuccessfulKoala55 Thanks for the clarification. What if I'm not using a git repo, and the script is only stored on a remote server? Is there a way to upload a "snapshot" of it?
SuccessfulKoala55 Will do. Thanks for the heads-up.
Hi guys,
Thanks for the previous discussion on MLOps with the ClearML agent.
I'm still not sure how to monitor a training job on k8s that wasn't scheduled by ClearML. My ClearML server is deployed and works for tracking non-k8s jobs, but I haven't managed to get it working for a k8s job.
Here is what I tried so far:
Adding my clearml.conf to the Docker image
Running clearml-init --file ~/clearml.conf
AgitatedDove14 Thanks, this resolved the issue.
SubstantialElk6 I haven't deployed the clearml-agent yet. I'm trying to run a training script on a k8s pod and have it monitored on my self-hosted ClearML server. I added the clearml.conf via the Dockerfile of the pod image, but that didn't seem to work out.
AgitatedDove14, by unsuccessful I mean that the task was being monitored on the demo ClearML server created by Allegro, rather than the one I created and host on our servers. That means (I think) that the config file is not being taken into account by ClearML.
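For anyone hitting the same symptom, here is a minimal diagnostic sketch that could be run inside the pod before the training script starts. It only uses the standard library. It assumes the SDK looks for the config at ~/clearml.conf unless the CLEARML_CONFIG_FILE environment variable points elsewhere, and that server settings can also be supplied through the CLEARML_API_HOST, CLEARML_WEB_HOST, CLEARML_FILES_HOST, CLEARML_API_ACCESS_KEY and CLEARML_API_SECRET_KEY environment variables. The function name describe_clearml_config is hypothetical and not part of ClearML.

```python
import os
from pathlib import Path

# Environment variables assumed to be read by the ClearML SDK for server settings.
SERVER_VARS = ("CLEARML_API_HOST", "CLEARML_WEB_HOST", "CLEARML_FILES_HOST")
CRED_VARS = ("CLEARML_API_ACCESS_KEY", "CLEARML_API_SECRET_KEY")


def describe_clearml_config(env=None, home=None):
    """Report where a config file is expected and which env vars are unset.

    Hypothetical helper for debugging only; it does not import or call clearml.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    # CLEARML_CONFIG_FILE, when set, overrides the default ~/clearml.conf location.
    config_path = Path(env.get("CLEARML_CONFIG_FILE", home / "clearml.conf")).expanduser()
    return {
        "config_path": str(config_path),
        "config_exists": config_path.is_file(),
        "missing_vars": [v for v in SERVER_VARS + CRED_VARS if not env.get(v)],
    }


if __name__ == "__main__":
    report = describe_clearml_config()
    print("Expected config file:", report["config_path"])
    print("Config file exists:  ", report["config_exists"])
    print("Unset env variables: ", ", ".join(report["missing_vars"]) or "none")
```

If the file is reported as missing, one possible cause is that the Dockerfile copied clearml.conf into a different user's home directory than the one the pod actually runs as. In that case, setting CLEARML_CONFIG_FILE to the absolute path of the file, or passing the server settings as environment variables in the pod spec, may be worth trying.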