hi AgitatedDove14, do you mean I should insert code into the clearml package itself?
The long story is that I tried to create a task scheduler with my clearml agent running on k8s, so the scheduler runs as a pod. But I found out the pod can't run for very long. It may have been killed or evicted or something after a day or 2.
So I'm thinking I might need to create my own scheduler. So I would like to know what it sends to the server to create the task/pipeline, so I can just replicate the HTTP API request instead of pulling all the code, installing the package, and running the Python code.
Do you have a link on how to set up a task scheduler to run in service mode in k8s?
basically spin up the agent pod and add an argument to the agent itself (this is the --services-mode flag)
https://clear.ml/docs/latest/docs/clearml_agent#services-mode
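For reference, the pod's entrypoint would then be something like this (the queue name is just an example):
```
# run the agent in services mode, pulling from a dedicated "services" queue
# --create-queue creates the queue if it does not exist yet
clearml-agent daemon --services-mode --queue services --create-queue --cpu-only
```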
JuicyFox94 I'll need to check with my infra team on that. When the pod gets killed, I can't access any logs on my Rancher end. On the clearml server, it simply shows the pod stopped communicating with the server. No error.
I see them run reliably (not killed). Are they running in service mode?
How do you deploy agents, with the clearml k8s glue?
AgitatedDove14 thank you. I'll try that.
I deployed the agents using helm
ok, but if you describe the pod you should at least see the termination cause
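e.g., assuming you have access to the cluster (namespace/pod names are placeholders):
```
# show pod events and the container's last state (Reason: OOMKilled, Evicted, ...)
kubectl -n <namespace> describe pod <scheduler-pod>
kubectl -n <namespace> get pod <scheduler-pod> \
  -o jsonpath='{.status.containerStatuses[0].lastState}'
```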
It may have been killed or evicted or something after a day or 2.
Actually the ideal setup is to have a single "services" pod running all these services, with clearml-agent --services-mode. This Pod should always be on and pull jobs from a dedicated queue.
Maybe a nice way to do that is to have the single Task serialize itself, then have a Pod run the Task every X hours and spin it down.
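To illustrate the services-queue setup, a rough sketch of a scheduler that would run on that pod (the task id and queue names are placeholders, and I'm going from memory on the TaskScheduler API):
```python
from clearml.automation import TaskScheduler

# re-launch an existing Task every 6 hours; ids/queues below are placeholders
scheduler = TaskScheduler()
scheduler.add_task(
    schedule_task_id="<template-task-id>",  # the Task to re-run
    queue="default",                        # queue the scheduled runs go to
    hour=6,                                 # launch every 6 hours
)
# run the scheduler itself as a job on the "services" queue
scheduler.start_remotely(queue="services")
```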
So I would like to know what it sends to the server to create the task/pipeline, so I can just replicate the HTTP API request instead of pulling all the code, installing the package, and running the Python code.
Oh, I would just use the pythonic interface to do that, instead of the raw REST API
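A minimal sketch of that (project/task/queue names are placeholders):
```python
from clearml import Task

# clone an existing "template" task and enqueue it for execution,
# instead of replicating the raw REST calls; names are placeholders
template = Task.get_task(project_name="my_project", task_name="my template task")
cloned = Task.clone(source_task=template, name="scheduled run")
Task.enqueue(cloned, queue_name="default")
```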
AgitatedDove14 oh, when I deploy the agents on k8s (using helm), I see them run reliably (not killed). Are they running in service mode? Do you have a link on how to set up a task scheduler to run in service mode in k8s? Is it similar to the clearml agent? (From my understanding, the agent also listens to a queue and spins up a new pod to handle incoming tasks on the queue.)
I'm not sure how the helm chart is built, but do we have a "services queue" in the helm chart?
especially if it’s evicted, it should be due to increasing resource usage
With Helm we are not running in services mode. If the pod gets evicted or killed we should investigate the reason behind that; are there any logs on the killed pod that can help us understand the situation better?
Hi GrittyCormorant73
In the end everything goes through session.send, so you could add a print there.
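Something like this quick (unofficial) monkey-patch should do it; the to_dict() call on the request object is an assumption on my side:
```python
from clearml.backend_api.session import Session

# debugging aid only: wrap Session.send so every outgoing API request is printed
_orig_send = Session.send

def _logged_send(self, req, *args, **kwargs):
    print(f"API request: {type(req).__name__} {req.to_dict()}")  # to_dict() assumed
    return _orig_send(self, req, *args, **kwargs)

Session.send = _logged_send
```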
btw: why would you print all the requests? what are we debugging here?
I think JuicyFox94 is maintaining the clearml helm charts. Can we specify service mode for the helm chart? thank you