You should try trains-agent daemon --gpus device=0,1 --queue dual_gpu --docker --foreground and if it doesn't work, try quoting: trains-agent daemon --gpus '"device=0,1"' --queue dual_gpu --docker --foreground
which permissions should it have? I would like to avoid full EC2 access if possible, and only choose the necessary permissions
This is what I meant should be documented - the permissions...
I re-executed the experiment, nothing changed
we are running the agent on the same machine AgitatedDove14, it worked before upgrading clearml... we never set these credentials
Okay, looks interesting but actually there is no final task, this is the pipeline layout
BTW, is the if not cached_file: return cached_file legit or a bug?
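For clarity, a stripped-down illustration of the pattern I mean; this is not the library's code, just the shape of it:

```python
# Not the library's code, just the shape of the pattern in question.
def _get_cached(cache: dict, key: str):
    cached_file = cache.get(key, "")
    if not cached_file:
        # Early return of a falsy value: fine if callers treat ""/None as a
        # cache miss, a bug if they expect a real path or an exception here.
        return cached_file
    # ... normal path: cached_file is a valid path and gets used ...
    return cached_file
```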
AgitatedDove14
So nope, this doesn't solve my case, I'll explain the full use case from the beginning.
I have a pipeline controller task, which launches 30 tasks. Semantically there are 10 applications, and I run 3 tasks for each (those 3 are sequential, so in the UI it looks like 10 lines of 3 tasks).
In one of those 3 tasks that run for every app, I save a dataframe under the name "my_dataframe".
What I want to achieve is once all tasks are over, to collect all those "my_dataframe" arti...
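Roughly what I have in mind, as a sketch only: the project name and controller id are placeholders, and I'm assuming the child tasks record the controller as their parent so Task.get_tasks can filter on it.

```python
# Sketch only: placeholder names/ids, and the 'parent' filter is an assumption.
import pandas as pd
from clearml import Task

controller = Task.get_task(task_id="<controller_task_id>")  # placeholder id

# Assuming each launched task has the controller set as its parent
children = Task.get_tasks(
    project_name="my_project",              # placeholder
    task_filter={"parent": controller.id},  # assumed to be a valid filter field
)

frames = []
for child in children:
    artifact = child.artifacts.get("my_dataframe")
    if artifact is None:
        continue  # only one of the 3 tasks per app stores the dataframe
    frames.append(artifact.get())  # deserializes the stored pandas DataFrame

combined = pd.concat(frames, ignore_index=True)
```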
AgitatedDove14 since this is a powerful feature, I think this should be documented. I'm at a point where I want to use the AWS autoscaler and I'm not sure how.
I see in the docs that I need to supply the access+secret keys, which are associated with an IAM, but nowhere does it say what permissions this IAM needs in order to execute.
Also, using the name "AWS Autoscaler" immediately suggests that behind the scenes, trains uses the https://docs.aws.amazon.com/autoscaling/ec2/userguide/wha...
If the credentials don't have access to the autoscale service, obviously it won't work
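To make the ask concrete, this is the kind of scoped policy I would expect to start from; the action list is my guess (spot instances would presumably need more), which is exactly the part I'd like to see documented:

```python
# My guess at a minimal policy for spinning EC2 instances up and down;
# the exact actions (e.g. for spot instances) are not documented, so this
# is an assumption, not a verified list.
import json

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstances",
                "ec2:RunInstances",
                "ec2:StartInstances",
                "ec2:StopInstances",
                "ec2:TerminateInstances",
            ],
            "Resource": "*",  # ideally tightened with tags/conditions
        }
    ],
}
print(json.dumps(policy, indent=2))
```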
I'm using pipe.start_locally so I imagine I don't have to .wait() right?
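For reference, this is roughly the usage (placeholder names); whether an explicit wait is needed after start_locally is exactly what I'm asking:

```python
# Rough usage sketch (placeholder project/step names); whether a .wait()
# is still needed after start_locally() is the open question.
from clearml import PipelineController

pipe = PipelineController(name="my_pipeline", project="my_project", version="1.0")
pipe.add_step(
    name="step_one",
    base_task_project="my_project",
    base_task_name="step one template",
)

# Runs the controller logic in this process; the steps themselves are still
# enqueued for agents unless run_pipeline_steps_locally=True is passed.
pipe.start_locally(run_pipeline_steps_locally=False)
```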
AgitatedDove14, please help
When spinning up the AMI I just went for the trains recommended settings
Thanks very much
Now something else is failing, but I'm pretty sure it's on my side now... So have a good day and see you in the next question
The working directory is the same relative to the script, but the absolute path is different, as the GitHub Action creates a new environment for it and deletes it afterwards
Actually I was thinking about models that weren't trained using clearml, like pretrained models etc.
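For example, something along these lines is what I'd want to register, assuming InputModel.import_model takes a plain weights URL; the URL, name and framework here are made up:

```python
# Sketch with made-up URL/name/framework; assumes InputModel.import_model
# accepts a plain weights URL for models not trained with clearml.
from clearml import InputModel

model = InputModel.import_model(
    weights_url="https://example.com/models/resnet50.pth",  # placeholder
    name="pretrained resnet50",
    framework="PyTorch",
)
print(model.id)  # the model should now show up in the model registry
```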
I want to get the instances of the tasks executed by this controller task
