Answered
Hi guys, thanks for the previous discussion on ML-Ops with ClearML agent. I'm still not sure how to monitor a training job on k8s (that wasn't scheduled by ClearML). My ClearML server is deployed and functional for tracking non-k8s jobs. But for a k8s job

Hi guys,
Thanks for the previous discussion on ML-Ops with ClearML agent.
I'm still not sure how to monitor a training job on k8s (that wasn't scheduled by ClearML). My ClearML server is deployed and functional for tracking non-k8s jobs. But for a k8s job, I'm still unsuccessful.
Here is what I tried so far:
Adding my clearml.conf to the docker image
Also tried to run clearml-init --file ~/clearml.conf

  
  
Posted 3 years ago

Answers 6


Hi HelpfulDeer76, I'm facing similar issues. Would you mind describing in detail how you deploy clearml-agent? Is it running as a pod on k8s?

  
  
Posted 3 years ago

Hi HelpfulDeer76

I mean that the task was being monitored on the demo ClearML server created by Allegro

Yes, that is consistent with what I would expect to have happened.
Basically, if you are running it as a k8s job, you can just configure the following environment variables:
CLEARML_WEB_HOST:
CLEARML_API_HOST:
CLEARML_FILES_HOST:
CLEARML_API_ACCESS_KEY: <clearml access>
CLEARML_API_SECRET_KEY: <clearml secret>
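One way to catch a misconfigured pod early is a small pre-flight check at the top of the training script. The sketch below only assumes the five variable names listed above; the helper name `missing_clearml_vars` and the constant `REQUIRED_VARS` are hypothetical and not part of ClearML.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = (
    "CLEARML_WEB_HOST",
    "CLEARML_API_HOST",
    "CLEARML_FILES_HOST",
    "CLEARML_API_ACCESS_KEY",
    "CLEARML_API_SECRET_KEY",
)


def missing_clearml_vars(env=None):
    """Return the required variables that are unset or empty in `env`.

    `env` defaults to the process environment; passing a plain dict
    makes the check easy to exercise outside the pod.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_clearml_vars()` before the training code starts and aborting when the list is non-empty should make a missing setting visible in the pod logs, rather than leaving the job to report somewhere unexpected.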

  
  
Posted 3 years ago

AgitatedDove14 Thanks, this resolved the issue

  
  
Posted 3 years ago

SubstantialElk6 I haven't deployed the clearml-agent yet. I'm trying to run a training script on a k8s pod and have it monitored on my self-hosted ClearML server. I added the clearml.conf via the Dockerfile of the pod image, but that didn't seem to work out.

  
  
Posted 3 years ago

AgitatedDove14, by unsuccessful I mean that the task was being monitored on the demo ClearML server created by Allegro, rather than the one created by me and hosted on our servers. Which means (I think) that the config file is not taken into account by clearml.

  
  
Posted 3 years ago

That wasn't scheduled by ClearML).

This means that from ClearML's perspective they are "manual", i.e. the job itself (by calling Task.init) creates the experiment in the system and fills in all the fields.
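As a rough illustration of what "manual" means here, the training script itself makes the Task.init call. This is only a minimal sketch: the wrapper name `start_manual_task` is hypothetical, and the project and task names are whatever you choose to pass in.

```python
def start_manual_task(project_name, task_name):
    """Create the experiment from inside the training script itself."""
    # Imported lazily so this module can be loaded even where the
    # clearml package is not installed.
    from clearml import Task

    # Task.init registers the run with whichever server the active
    # configuration (clearml.conf or environment variables) points at.
    return Task.init(project_name=project_name, task_name=task_name)
```

For example, `task = start_manual_task("k8s-demo", "train-run")` near the top of the script, where both names are placeholders.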

But for a k8s job, I'm still unsuccessful.

HelpfulDeer76 When you say "unsuccessful", what exactly do you mean?
Could it be that they are reported to the ClearML demo server (the default server if no configuration is found)?

  
  
Posted 3 years ago