my current version of the images used:
I'd go for
```
from trains.utilities.pyhocon import ConfigFactory
config = ConfigFactory.parse_file(CONF_FILE_PATH)
```
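And then reading values back out of the parsed ConfigTree is straightforward; a small sketch (the key names below are placeholders, just for illustration):

```python
from trains.utilities.pyhocon import ConfigFactory

config = ConfigFactory.parse_file(CONF_FILE_PATH)  # CONF_FILE_PATH as above

# pyhocon's ConfigTree supports dotted-path lookups; the keys here are placeholders
web_server = config.get("api.web_server", None)    # returns the default if the key is missing
verbose = config.get_bool("sdk.verbose", False)     # typed getter, also with a default
```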
I'm quite confused... The package is not missing, it is in my environment, and executing tasks normally (`python my_script.py ...`) works
Any news on this? This is kind of worrying: it's something so basic that I can't trust my prediction pipeline, because sometimes it fails randomly for no reason
Does that mean that the AWS autoscaler in trains manages EC2 scaling directly, without using the AWS built-in EC2 auto scaling service?
TimelyPenguin76, this can safely be set to `s3://`, right?
AgitatedDove14 clearml version on the Cleanup Service is 0.17.0
TimelyPenguin76 this fixed it, setting `detect_with_pip_freeze` to true solves the issue
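For anyone who hits this later, this is roughly what I set in my trains.conf (section names from memory, so double-check against your own config file):

```
# trains.conf (sketch) -- switch package detection to pip freeze
sdk {
  development {
    detect_with_pip_freeze: true
  }
}
```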
So the scale will also appear?
I'm really confused, I'm not sure what is wrong, or what the relationship is between the templates, the agent, and all of those things
In the meantime I'm giving up on the pipeline thing and I'll write a bash script to orchestrate the execution, because I need to deliver and I don't feel this is going anywhere
On a final note, I'd love for this to work as expected, but I'm not sure what you need from me. A fully reproducible example will be hard because obviously this is proprietary code. What ...
and in the UI configuration I didn't understand where permission management comes into play
You should try `trains-agent daemon --gpus device=0,1 --queue dual_gpu --docker --foreground`, and if it doesn't work, try quoting the device list: `trains-agent daemon --gpus '"device=0,1"' --queue dual_gpu --docker --foreground`
which permissions should it have? I would like to avoid full EC2 access if possible, and only choose the necessary permissions
This is what I meant should be documented - the permissions...
I re-executed the experiment, nothing changed
Okay, looks interesting but actually there is no final task, this is the pipeline layout
BTW, is the `if not cached_file: return cached_file` legit, or a bug?
AgitatedDove14
So nope, this doesn't solve my case, I'll explain the full use case from the beginning.
I have a pipeline controller task, which launches 30 tasks. Semantically there are 10 applications, and I run 3 tasks for each (those 3 are sequential, so in the UI it looks like 10 lines of 3 tasks).
In one of those 3 tasks that run for every app, I save a dataframe under the name "my_dataframe".
What I want to achieve is once all tasks are over, to collect all those "my_dataframe" arti...
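Something like this rough sketch is what I'm trying to end up with (the project name and the status filter are placeholders, and I'm assuming the child tasks can be queried this way):

```python
from clearml import Task   # or `from trains import Task` on older versions
import pandas as pd

# Collect the completed child tasks launched by the pipeline (filter values are placeholders)
tasks = Task.get_tasks(
    project_name="my_project",
    task_filter={"status": ["completed"]},
)

frames = []
for t in tasks:
    artifact = t.artifacts.get("my_dataframe")
    if artifact is not None:
        # .get() downloads and deserializes the stored object (a DataFrame here)
        frames.append(artifact.get())

combined = pd.concat(frames) if frames else pd.DataFrame()
```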
AgitatedDove14 since this is a powerful feature, I think this should be documented. I'm at a point where I want to use the AWS autoscaler and I'm not sure how.
I see in the docs that I need to supply the access+secret keys, which are associated with an IAM user, but nowhere does it say what permissions this IAM user needs in order to execute.
Also, using the name "AWS Autoscaler" immediately suggests that behind the scenes, trains uses the https://docs.aws.amazon.com/autoscaling/ec2/userguide/wha...
If the credentials don't have access to the autoscaling service, obviously it won't work
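To illustrate the distinction I'm asking about, here's a rough boto3 sketch of what "managing EC2 directly" would look like (the region, AMI and instance type are placeholders, and this is my guess at the style of calls involved, not the actual trains implementation):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is a placeholder

# Spin up a worker directly, no Auto Scaling group involved
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="g4dn.xlarge",        # placeholder instance type
    MinCount=1,
    MaxCount=1,
)
instance_id = response["Instances"][0]["InstanceId"]

# ...later, when the queue is empty, tear the worker down again
ec2.terminate_instances(InstanceIds=[instance_id])
```

If that's really how it works, then presumably the IAM user only needs EC2-level permissions (run/describe/terminate instances) rather than access to the AWS Auto Scaling service, but that's exactly the part I'd like to see documented.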
I'm using `pipe.start_locally()` so I imagine I don't have to `.wait()`, right?
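For context, my controller looks roughly like this (the step names are placeholders, I'm on the newer API so the constructor arguments may differ on older trains versions, and I'm assuming start_locally blocks until the pipeline finishes, which is exactly what I'm asking about):

```python
from clearml import PipelineController

pipe = PipelineController(name="my_pipeline", project="my_project", version="1.0")

# Placeholder step, cloned from an existing template task
pipe.add_step(
    name="step_one",
    base_task_project="my_project",
    base_task_name="template_task",
)

# If start_locally() is blocking, the explicit wait below would be redundant
pipe.start_locally(run_pipeline_steps_locally=False)
# pipe.wait()
```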
AgitatedDove14 ⬆ please help 🙏
When spinning up the AMI I just went for the trains recommended settings
Thanks very much
Now something else is failing, but I'm pretty sure it's on my side now... So have a good day and see you in the next question 😄
