SubstantialElk6 I haven't deployed the clearml-agent yet. I'm trying to run a training script on a k8s pod and have it monitored on my self-hosted ClearML server. I added the clearml.conf via the Dockerfile of the pod image, but that didn't seem to work out.
AgitatedDove14, by unsuccessful I mean that the task was being monitored on the demo ClearML server created by Allegro, rather than the one I created and host on our servers. Which means (I think) that the config file is not taken into account by ClearML.
AgitatedDove14 Thanks, this resolved the issue
Hi HelpfulDeer76, I'm facing similar issues. Would you mind describing in detail how you deploy clearml-agent? Is it running as a pod on k8s?
That wasn't scheduled by ClearML).
This means that from ClearML's perspective they are "manual", i.e. the job itself (by calling Task.init) creates the experiment in the system and fills in all the fields.
But for a k8s job, I'm still unsuccessful.
HelpfulDeer76 When you say "unsuccessful", what exactly do you mean?
Could it be that they are reported to the ClearML demo server (the default server if no configuration is found)?
Hi HelpfulDeer76
I mean that the task was being monitored on the demo ClearML server created by Allegro
Yes that is consistent with what I would expect to have happened
Basically, if you are running it as a k8s job, you can just configure the following environment variables:
CLEARML_WEB_HOST:
CLEARML_API_HOST:
CLEARML_FILES_HOST:
CLEARML_API_ACCESS_KEY: <clearml access>
CLEARML_API_SECRET_KEY: <clearml secret>
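A rough sketch of how the variables above could be wired up and sanity-checked, in case it helps. The variable names come from the message above; the helper names (`missing_clearml_vars`, `to_k8s_env`) are made up for illustration and are not part of ClearML or Kubernetes. The idea is to fail loudly when a setting is missing, instead of silently ending up on the demo server.

```python
import os

# Variable names taken from the message above. Any values passed in are
# supplied by the caller; nothing here is a real endpoint or credential.
CLEARML_VARS = (
    "CLEARML_WEB_HOST",
    "CLEARML_API_HOST",
    "CLEARML_FILES_HOST",
    "CLEARML_API_ACCESS_KEY",
    "CLEARML_API_SECRET_KEY",
)


def missing_clearml_vars(environ=None):
    """Return the names from CLEARML_VARS that are unset or empty.

    Hypothetical helper: call it at the top of the training script,
    before Task.init, and abort if the returned list is not empty.
    """
    environ = os.environ if environ is None else environ
    return [name for name in CLEARML_VARS if not environ.get(name)]


def to_k8s_env(settings):
    """Convert a {name: value} mapping into the list-of-dicts shape used
    by the `env` field of a Kubernetes container spec.

    Hypothetical helper; raises KeyError if a variable is absent.
    """
    return [{"name": name, "value": settings[name]} for name in CLEARML_VARS]
```

Inside the pod, something like `if missing_clearml_vars(): raise SystemExit(...)` at the start of the script would make a misconfigured job stop early. For the two keys, a Kubernetes Secret is probably a better fit than plain `value` entries, but that depends on your setup.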