yeah, but I see it gets enqueued to the
default queue, and I don't know what that is connected to
If I execute this task using
python .....py will it run on the machine I executed it on?
Do I need to copy this AWS scaler task to every project I want to have auto scaling on? What does it mean to enqueue the AWS scaler?
I don't fully get it - it says it has to be enqueued
I mean I don't get how all the pieces add up
The Trains docs never mention what I should do on the AWS side... So I'm not sure at what point I should encounter this wizard
I'm going to play with it a bit and see if I can figure out how to make it work
What about permissions for the machines that are being spun up? For example, if I want the instances to have specific permissions to read/write to S3, how do I manage those?
Especially from the standpoint of a team leader or other kind of supervisor (or anyone who wants to view the experiment and is not the code author), when looking at an experiment you want to see the actual code
I mean the code in whatever form it is - I'm working with git specifically, but if I have diffs I'd like to see the code with the diffs applied
Eventually I think it should display the contents of the executed script in the most straightforward manner, regardless of version control
As a part of a repo
Yep, the trains server is basically a docker-compose based service.
All you have to do is change the ports in the
If you followed the instructions in the docs you should find that file in
/opt/trains/docker-compose.yml and then you will see that there are multiple services (
redis etc.) and in each there might be a section called
ports which then states the mapping of the ports.
The number on the left, is ...
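To make the port mapping concrete, here is a minimal sketch. It assumes the usual docker-compose short syntax "HOST:CONTAINER", where the left number is the port exposed on the host machine. The port values and the helper names `parse_port_mapping` and `remap_host_port` are hypothetical and not taken from the actual Trains compose file.

```python
def parse_port_mapping(entry: str) -> dict:
    """Split a two-part "HOST:CONTAINER" port entry into its numbers."""
    host, container = entry.split(":")
    return {"host": int(host), "container": int(container)}


def remap_host_port(entry: str, new_host_port: int) -> str:
    """Return the entry with only the host (left) side changed."""
    _, container = entry.split(":")
    return f"{new_host_port}:{container}"


example = "8080:80"  # hypothetical mapping, for illustration only
print(parse_port_mapping(example))
print(remap_host_port(example, 9090))
```

Only the left side changes when you move a service to a different port on your machine; the container side stays as the service expects it.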
the level of configurability in this thing is one of the best I've seen
Oh... from the docs I understood that I don't have to run the script, that I can configure it either in the UI or with the script (wizard), so I ignored it up until now
why does it deplete so fast?
what should I paste here to diagnose it?
Now I see the watermarks are 2 GB
Increased to 20, let's see how long it will last 🙂
(it works now, with 20 GB)
Cool, now I understand the auto detection better
TimelyPenguin76 , this can safely be set to
Okay, so if my Python script imports some other scripts I've written - must I use git?
You should try
trains-agent daemon --gpus device=0,1 --queue dual_gpu --docker --foreground
and if it doesn't work, try quoting:
trains-agent daemon --gpus '"device=0,1"' --queue dual_gpu --docker --foreground
TimelyPenguin76, in standard docker the quoting you mentioned is wrong, since the whole argument is being passed - hence the tricky double quotation I posted above
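To see what the double quotation changes, here is a small sketch using Python's `shlex`, which approximates POSIX shell word splitting. It only shows what value ends up after `--gpus` once the shell has stripped the outer quotes. What the agent or docker does with that value afterwards is not shown here, and `gpus_value` is a hypothetical helper.

```python
import shlex

plain = "trains-agent daemon --gpus device=0,1 --queue dual_gpu --docker --foreground"
quoted = "trains-agent daemon --gpus '\"device=0,1\"' --queue dual_gpu --docker --foreground"


def gpus_value(command: str) -> str:
    """Return the argument that follows --gpus after shell-style splitting."""
    argv = shlex.split(command)
    return argv[argv.index("--gpus") + 1]


print(gpus_value(plain))   # device=0,1
print(gpus_value(quoted))  # "device=0,1"
```

With the single-plus-double quoting, the inner double quotes survive the shell and reach the program as part of the value.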
ClearML results page: `
Launching step: 2019-09-03_2021-01-25_choose_best
Launching step: 2019-10-23_2021-01-15_choose_best
Launching step: 2019-05-26_2020-12-26_choose_best
Launching step: 2019-07-15_2021-01-05_choose_best