
If there's a post-task script, I can add a way to zip and upload the pip cache etc. to S3, i.e. do any caching I want without first-class support in ClearML.
Would this be a good use case to have?
As in, if there are queued jobs, the first level of scaling is new pods, and the second level is new nodes in the cluster.
SageMaker will make that easy, especially if I have SageMaker as the long-tail choice. Granted, at a higher cost.
Running multiple k8s_daemon, right? k8s_daemon("1xGPU")
and k8s_daemon('cpu')
right?
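The two-queue setup above could look something like this as an ops fragment; a sketch only, assuming the `k8s_glue_example.py` script from the clearml-agent repository, and the queue names are the illustrative ones from this thread:

```shell
# One glue daemon per queue; each maps a ClearML queue
# to its own pod template (GPU vs. CPU).
python k8s_glue_example.py --queue 1xGPU &
python k8s_glue_example.py --queue cpu &
```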
Yeah mostly. With k8s glue going, want to finally look at clearml-session and how people are using it.
Got it. Never ran a GPU workload in EKS before. Do you have any experience or things to watch out for?
Would adding support for some sort of post task script help? Is something already there?
I am on 1.0.3
Would be good to have frequent-ish releases if possible 🙂
Is there a published package version for these?
Any updates on trigger and schedule docs 🙂
It might be better suited than execute_remotely for your specific workflow
Exactly
Any chance you can open a github issue on it?
Will do!
do not limit the clone on execute_remotely,
Yes
Ok, couldn’t see it in the docs - https://clear.ml/docs/latest/docs/references/sdk/task
Planning to exec into the container and run it in a loop and see what happens
Nope, that doesn’t seem to be it. Will debug a bit more.
Was able to use ScriptRequirements
and get what I need. thanks!
This is for building my model package for inference
AgitatedDove14 - added it in bucket_config.py and sdk.conf, but somehow the value is not being picked up
Pushed the changes, not sure if it’s fully right - do let me know. But the functionality is working
It completed after the max_job limit (10)
Ah ok, there’s only optimizer.stop in the example