I'm really confused. I'm not sure what is wrong, or what the relationship is between the templates, the agent, and all of those things.
For the time being, I'm giving up on the pipeline and I'll write a bash script to orchestrate the execution, because I need to deliver and I don't feel this is going anywhere.
On a final note, I'd love for this to work as expected, but I'm not sure what you need from me. A fully reproducible example will be hard because obviously this is proprietary code. What ...
and in the UI configuration I didn't understand where permission management comes into play
You should try: trains-agent daemon --gpus device=0,1 --queue dual_gpu --docker --foreground
If that doesn't work, try quoting the device list: trains-agent daemon --gpus '"device=0,1"' --queue dual_gpu --docker --foreground
Which permissions should it have? I would like to avoid full EC2 access if possible, and grant only the necessary permissions.
This is what I meant should be documented - the permissions...
I re-executed the experiment, nothing changed
Okay, looks interesting but actually there is no final task, this is the pipeline layout
AgitatedDove14
So nope, this doesn't solve my case, I'll explain the full use case from the beginning.
I have a pipeline controller task, which launches 30 tasks. Semantically there are 10 applications, and I run 3 tasks for each (those 3 are sequential, so in the UI it looks like 10 lines of 3 tasks).
In one of those 3 tasks that run for every app, I save a dataframe under the name "my_dataframe".
What I want to achieve is once all tasks are over, to collect all those "my_dataframe" arti...
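A rough sketch of the collection step described above, in case it helps frame the question. It assumes each finished task exposes its artifacts through task.artifacts and that calling .get() on an artifact returns the stored object; the helper names collect_artifacts and fetch_project_tasks and the project name argument are hypothetical, not something from the pipeline itself.

```python
from typing import Any, Dict, Iterable


def collect_artifacts(tasks: Iterable[Any], artifact_name: str) -> Dict[str, Any]:
    """Return {task id: artifact object} for every task that saved the artifact."""
    collected = {}
    for task in tasks:
        # task.artifacts is assumed to behave like a dict keyed by artifact name
        artifact = task.artifacts.get(artifact_name)
        if artifact is None:
            continue  # this step did not save the artifact, skip it
        collected[task.id] = artifact.get()
    return collected


def fetch_project_tasks(project_name: str):
    """Hypothetical helper: list the tasks of one project.

    Needs the clearml package and a configured server, so the import is
    kept local and nothing here runs at import time.
    """
    from clearml import Task

    return Task.get_tasks(project_name=project_name)
```

With pandas installed, the collected values could then be merged with something like pd.concat(collected.values()) once all 30 tasks are done.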
AgitatedDove14 since this is a powerful feature, I think this should be documented. I'm at a point where I want to use the AWS autoscaler and I'm not sure how.
I see in the docs that I need to supply the access and secret keys, which are associated with an IAM user, but nowhere does it say which permissions this IAM user needs in order to execute.
Also, using the name "AWS Autoscaler" immediately suggests that behind the scenes, trains uses the https://docs.aws.amazon.com/autoscaling/ec2/userguide/wha...
If the credentials don't have access to the autoscale service, obviously it won't work
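To make the permissions question concrete, here is a sketch of what a narrowly scoped policy could look like instead of full EC2 access. The action list is an assumption on my part (launching, listing, tagging, and terminating instances), not something confirmed by the docs, and build_autoscaler_policy is a hypothetical helper name.

```python
import json

# ASSUMPTION: the autoscaler only needs to launch, list, tag and terminate
# EC2 instances. This list is a guess and should be checked against the docs.
ASSUMED_ACTIONS = [
    "ec2:RunInstances",
    "ec2:DescribeInstances",
    "ec2:TerminateInstances",
    "ec2:CreateTags",
]


def build_autoscaler_policy(actions):
    """Build an IAM policy document that allows only the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(actions),
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Print the JSON so it can be pasted into the IAM console for review
    print(json.dumps(build_autoscaler_policy(ASSUMED_ACTIONS), indent=2))
```

If an action turns out to be missing, the failing call should show up as an authorization error, and the list can be extended from there.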
I'm using pipe.start_locally, so I imagine I don't have to .wait(), right?
AgitatedDove14 please help
When spinning up the AMI I just went for the trains recommended settings
Thanks very much
Now something else is failing, but I'm pretty sure it's on my side now... So have a good day and see you in the next question
The working directory is the same relative to the script, but the absolute path is different, as the GitHub Action creates a new environment for it and deletes it afterwards
Actually, I was thinking about models that weren't trained using ClearML, like pretrained models, etc.
Very nice, thanks. I'm going to try the SA server + agents setup this week, let's see how it goes
