I pull from Docker Hub and then run `docker run allegroai/clearml-agent-k8s:aws-latest-1.21.2`
I think JuicyFox94 is maintaining the clearml helm charts. Can we specify service mode for the helm chart? thank you
Oh, and when I create a pipeline, it looks like it creates a task for the pipeline itself, and each step is run as a task. I can also put the pipeline and each step on different queues independently, right? Looks like it's possible CostlyOstrich36
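In case it's useful for anyone reading this later, this is the kind of routing I mean; a rough sketch with placeholder queue names, assuming the `execution_queue` and `pipeline_execution_queue` arguments I saw in the docs:
```python
from clearml.automation.controller import PipelineDecorator


# Each step becomes its own task and can target its own queue
@PipelineDecorator.component(execution_queue="cpu-queue", return_values=["data"])
def step_one():
    data = list(range(10))
    return data


@PipelineDecorator.component(execution_queue="gpu-queue", return_values=["total"])
def step_two(data):
    total = sum(data)
    return total


# The pipeline controller itself also runs as a task, on its own queue
@PipelineDecorator.pipeline(
    name="demo pipeline",
    project="demo",
    version="0.1",
    pipeline_execution_queue="services",
)
def pipeline_logic():
    data = step_one()
    print(step_two(data))


if __name__ == "__main__":
    pipeline_logic()
```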
AgitatedDove14 thank you. I'll try that.
I deployed the agents using helm
Hi AgitatedDove14, do you mean I insert code in the clearml package itself?
The long story is I tried to create a task scheduler, and my ClearML agent runs on k8s, so the scheduler runs as a pod. But I found out the pod can't run for very long; it may have been killed or evicted or something after a day or 2.
So I'm thinking I might need to create my ...
Hi Martin, I'm using version 1.6.0, so probably not the newer one. Thank you for the confirmation though 😄
Thank you, John and Jake. As I understand it, the deployment only deploys the ClearML server and not the agents. So I'm a bit unclear on why k8s is said to be more scalable. Does the ClearML server need to scale up and down? Or do you mean a k8s deployment will have an easier time spinning up agent instances to run the tasks? SuccessfulKoala55
AgitatedDove14 oh, when I deploy the agents on k8s (using Helm), I see them run reliably (not killed). Are they running in service mode? Do you have a link on how to set up a task scheduler to run in service mode in k8s? Is it similar to the ClearML agent? (from my understanding, the agent also listens to a queue and spins up a new pod to handle incoming tasks on the ...
So it seems the decorator is simply the superior option? In which case would we use the add_task() option?
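For the record, my current guess (happy to be corrected): the non-decorator `PipelineController` route looks most useful when the steps are existing tasks already logged on the server, something like this sketch with made-up project/task names:
```python
from clearml import PipelineController

pipe = PipelineController(name="monthly pipeline", project="demo", version="0.1")

# Clone tasks that already exist on the server and chain them as steps
pipe.add_step(
    name="preprocess",
    base_task_project="demo",
    base_task_name="preprocess task",
)
pipe.add_step(
    name="train",
    parents=["preprocess"],
    base_task_project="demo",
    base_task_name="train task",
)

# The controller itself is enqueued like any other task
pipe.start(queue="services")
```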
CostlyOstrich36 I'm using ClearML server 1.6. I don't get any error, that's the thing. Even the upload_artifact() function returns true. But the artifact file simply does not exist on the server. The function works fine if the artifact is 4GB, but when it is 12GB, the file simply doesn't get uploaded to the server.
I'm using ClearML server on EC2.
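In case someone else hits this, a sketch of what I'm about to test: block on the upload so I can see whether it actually finishes, and point artifact storage at S3 instead of the built-in fileserver (the bucket path is a placeholder):
```python
from clearml import Task

# Placeholder S3 path -- store artifacts there instead of the fileserver
task = Task.init(
    project_name="demo",
    task_name="big artifact test",
    output_uri="s3://my-bucket/clearml",
)

# wait_on_upload=True blocks until the upload completes, so the process
# can't exit while a 12GB upload is still in flight in the background
ok = task.upload_artifact(
    name="dataset",
    artifact_object="/data/dataset.bin",
    wait_on_upload=True,
)
print("upload registered:", ok)
```
My suspicion is the process exits before the background upload of the 12GB file completes, so blocking should at least make the failure visible.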
JuicyFox94 I'll need to check with my infra team on that. When the pod gets killed, I can't access any logs on my Rancher end. On the ClearML server, it simply shows the pod stopped communicating with the server. No error.
SuccessfulKoala55 yes, I’m using a subdomain of my main domain for this
TimelyPenguin76 yes, I followed that guide. I got it up and running with my domain. But I'm not sure how to install an SSL certificate for this
Hi CostlyOstrich36 basically, I'm using this code to create a pipeline https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py
After I run the code. I can go to the ClearML server and click on "Pipelines" menu and see a new pipeline running, with a DAG graph showing all the steps.
Now I want to schedule it to run once a month. So in the main function, instead of calling execute_pipeline(), I create a scheduler:
`from clearml.automation import ...`
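Roughly the sketch I have in mind (the task ID is a placeholder; `month=1, day=1` is the once-a-month pattern from the scheduler docstring, and `start_remotely` hands the scheduler to an agent listening on the services queue so I don't have to keep a pod alive myself):
```python
from clearml.automation import TaskScheduler

scheduler = TaskScheduler()

# <PIPELINE_TASK_ID> is a placeholder -- the pipeline controller is itself
# a task, so its ID goes here
scheduler.add_task(
    schedule_task_id="<PIPELINE_TASK_ID>",
    queue="default",   # queue the scheduled run is enqueued on
    month=1,           # every 1 month...
    day=1,             # ...on the 1st
)

# Run the scheduler as a long-lived task on the services queue instead of
# scheduler.start(), so an agent keeps it alive
scheduler.start_remotely(queue="services")
```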
SuccessfulKoala55 Thanks Jake. Is that an option in the Helm values, or do I set it in the pipeline decorator?
CostlyOstrich36 Hi John, right now I define a single queue. So you mean I can deploy multiple agents, each with different resource requests and listening on a different queue, right? I think that could work
CostlyOstrich36 Thanks John! Let me try that
Thanks, I'll try it out. I did not know if a pipeline ID and a task ID are the same kind of ID, or if the scheduler would know the difference.
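(What I'm going to check first, since the pipeline controller shows up as a regular task: fetch it like any other task and use its ID; project/task names below are placeholders.)
```python
from clearml import Task

# The pipeline controller is itself a task, so it can be looked up
# like any other task (names here are placeholders)
pipeline_task = Task.get_task(project_name="demo", task_name="demo pipeline")
print(pipeline_task.id)  # this ID should work as schedule_task_id
```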
CostlyOstrich36 I just followed the example code https://github.com/allegroai/clearml/blob/master/examples/scheduler/cron_example.py
and I installed the google package locally, so it's weird that it didn't detect it
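The workaround I'm going to try, assuming it's really the `google` import that goes undetected, is forcing it into the requirements before `Task.init`:
```python
from clearml import Task

# Force a package the auto-detection missed into the task requirements.
# Must be called before Task.init()
Task.add_requirements("google")

task = Task.init(project_name="demo", task_name="scheduled job")
```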
Did you check in the "models" tab of the experiment? I see mine there.
Although I have a different problem. The MODEL URL is only the local path of the model file from when it was built. ClearML does not automatically upload that file, so I don't know how to download the model file. CostlyOstrich36 do you have any suggestion?
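What I'm going to test next, based on the docs: passing `output_uri` to `Task.init` so the weights file actually gets uploaded instead of only its local path being recorded:
```python
from clearml import Task

# output_uri=True uploads model snapshots to the ClearML fileserver;
# an explicit URI (e.g. "s3://...") would upload them there instead
task = Task.init(
    project_name="demo",
    task_name="train model",
    output_uri=True,
)
```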