
This hasn’t worked for me either, I use multiple queues instead. Another reason I also use multiple queues is that I need to specify different resource requirements for the pods launched by each queue (CPU-only vs GPU).
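For the CPU vs GPU split, here’s a minimal sketch of how components can be routed to different queues (the queue names are placeholders; each queue is served by an agent with the matching pod resources):
from clearml.automation.controller import PipelineDecorator

# "cpu_queue" / "gpu_queue" are placeholder queue names
@PipelineDecorator.component(execution_queue="cpu_queue", return_values=["data"])
def preprocess():
    data = list(range(10))
    return data

@PipelineDecorator.component(execution_queue="gpu_queue", return_values=["result"])
def train(data):
    result = sum(data)
    return result

@PipelineDecorator.pipeline(name="cpu-gpu-example", project="examples", version="0.0.1")
def run_pipeline():
    data = preprocess()
    print(train(data))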
@<1523701205467926528:profile|AgitatedDove14> I managed to fix the issue FYI. I replaced from clearml import PipelineDecorator
with from clearml.automation.controller import PipelineDecorator
and it suddenly works. What a weird issue.
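In case it helps anyone, a minimal sketch of a pipeline using the import that worked for me (pipeline/project names are placeholders):
from clearml.automation.controller import PipelineDecorator  # this import works for me; `from clearml import PipelineDecorator` did not

@PipelineDecorator.pipeline(name="example-pipeline", project="examples", version="0.0.1")
def pipeline_logic():
    pass

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # run the steps in the local process for quick testing
    pipeline_logic()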
Huh, I see. Thanks for your answers. How difficult would it be to implement some way to automatically infer repository information for components, or to have a flag like repo_inherit
(or similar) when defining a component (which would inherit the repository information from the controller)? My workflow is based around executing code that lives in the same repository, so it’s cumbersome having to specify repository information all over the place and to change the commit hash as I add new code.
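For reference, this is roughly what I have to repeat on every component today (repo URL, branch and commit are placeholders):
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(
    return_values=["result"],
    repo="https://github.com/example/my-repo.git",  # placeholder repo URL
    repo_branch="main",
    repo_commit="0123abc",  # placeholder commit hash that I have to bump whenever I add code
)
def my_step():
    result = 42
    return result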
no worries @<1523701205467926528:profile|AgitatedDove14>
@<1523701205467926528:profile|AgitatedDove14> for me it hasn’t worked when I specified agentk8sglue.queue: "queue1,queue2"
in the Helm chart options, which should be possible according to the documentation. What also hasn’t worked is the flag for creating a queue if it doesn’t exist ( agentk8sglue.createQueueIfNotExists
). Both failed to parse at runtime, so I’d say those are 2 bugs.
I think so, but I haven’t investigated what the problem is exactly; I’ll report it though.
Yes, that seems like an option as well. I also found this (in case someone looks for it in the future):
# get the pipeline controller from inside a running pipeline
p = PipelineDecorator.get_current_pipeline()
# list the pipeline steps that are currently running
p.get_running_nodes()
Thanks @<1806497735218565120:profile|BrightJellyfish46>
Here’s how I do it using the clearml.conf config for my agent:
sdk {
  aws {
    s3 {
      ...
    }
  }
  development {
    default_output_uri: "..."
  }
}
the components start hanging indefinitely right after printing Starting Task Execution
when I add repo="." to the definition of all my component decorators it works (but not the pipeline decorator), but it doesn’t work without that part… the problem I’m having now is that my components hang when executed in the cluster… I have 2 agents deployed (default and services queues)
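The workaround, roughly (this is the shape of each component decorator now):
from clearml.automation.controller import PipelineDecorator

# adding repo="." to every component decorator is what stopped the hang for me
@PipelineDecorator.component(repo=".", return_values=["value"])
def some_step():
    value = 1
    return value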
The way I understand it:
- if you’re executing tasks locally (e.g. on your laptop), then you need this setting because the clearml package needs to know where to upload artifacts (artifacts aren’t proxied through the clearml-server; they are uploaded directly to the storage of your choice) — see the sketch after this list
- if you’re executing code using a ClearML agent, then you can configure the agent the way I wrote earlier, and it will use your MinIO instance for uploading artifacts for all of the tasks it executes
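For the local case, a minimal sketch of setting the same thing per task from code instead of clearml.conf (the MinIO host and bucket are placeholders):
from clearml import Task

task = Task.init(
    project_name="examples",
    task_name="minio-output-example",
    # output_uri plays the same role as development.default_output_uri in clearml.conf;
    # host/port/bucket are placeholders for your MinIO instance
    output_uri="s3://my-minio.example.com:9000/clearml-artifacts",
)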
I don’t use datasets so I don’t know, sorry, maybe @<1523701087100473344:profile|SuccessfulKoala55> can help
I know I can configure the “pod template”, but I’m looking for a solution where users can set their own variables without modifying Kubernetes secrets.
Any ideas @<1523701087100473344:profile|SuccessfulKoala55> ?