oh wow, I didn't see the delete_artifacts_and_models option
I guess we'll have to manually find old artifacts related to already-deleted tasks (something like the sketch below)
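this rough sketch is roughly what I have in mind, assuming our artifacts live on S3, boto3 is available, and the 32-char task ID appears somewhere in the object key (the bucket name is made up):

import re
import boto3
from clearml import Task

s3 = boto3.client("s3")
id_pattern = re.compile(r"([0-9a-f]{32})")  # ClearML task IDs are 32 hex chars

for page in s3.get_paginator("list_objects_v2").paginate(Bucket="my-clearml-bucket"):
    for obj in page.get("Contents", []):
        match = id_pattern.search(obj["Key"])
        if not match:
            continue
        try:
            Task.get_task(task_id=match.group(1))
        except Exception:
            # the task no longer exists, so the artifact is a deletion candidate
            print("orphaned:", obj["Key"])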
we already have the cleanup service set up and running, so we should be good from now on
two more questions about cleanup if you don't mind:
what if for some old tasks I get WARNING:root:Could not delete Task ID=a0908784a2a942c3812f947ec1f32c9f, 'Task' object has no attribute 'delete'? What's the best way of cleaning them up? And what is the recommended way of providing S3 credentials to the cleanup task?
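for reference, I'm assuming the credentials go into the clearml.conf of the machine running the cleanup task, roughly like this (key, secret, and region are placeholders), but please correct me if that's not what the task picks up:

sdk {
    aws {
        s3 {
            key: "MY_ACCESS_KEY"
            secret: "MY_SECRET_KEY"
            region: "us-east-1"
        }
    }
}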
we're using the latest versions of clearml, clearml-agent, and clearml-server, but we've been using trains/clearml for 2.5 years, so there are some old tasks left, I guess 😃
I guess this could overcomplicate the UI; I don't see a good solution yet.
as a quick hack, we can just use a separate name (e.g. "best_val_roc_auc") for all metric values of the current best checkpoint. then we can just add columns with the last value of this metric
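a minimal sketch of the hack (the project/task names and values are just examples):

from clearml import Task

task = Task.init(project_name="examples", task_name="best-checkpoint-hack")
logger = task.get_logger()

best_roc_auc = 0.0
for epoch, val_roc_auc in enumerate([0.71, 0.78, 0.75, 0.83]):  # dummy values
    if val_roc_auc > best_roc_auc:
        best_roc_auc = val_roc_auc
        # re-report the best-so-far value under a dedicated name, so the
        # "last value" column in the UI always reflects the current best checkpoint
        logger.report_scalar(title="best_val_roc_auc", series="val",
                             value=best_roc_auc, iteration=epoch)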
not necessarily, there are rare cases when the container keeps running after the experiment is stopped or aborted
will do!
that's right, I have 4 GPUs and 4 workers. but what if I want to run two jobs simultaneously on the same GPU?
thanks for the link and the advice, will do
I'll let you know if I manage to achieve my goals with StorageManager
I don't think so, because the max value of each metric is calculated independently of the other metrics
so the max values that I get can be reached at different epochs
on a side note, is there any way to automatically give more meaningful names to the running docker containers?
I guess I could manually explore the different containers and their contents 😃 as far as I remember, I had to update the Elastic records when we moved to the new cloud provider in order to update the model URLs
sorry, my bad, after some fiddling I made it work. I have to manually change HTTP to HTTPS in the config file for the Web and Files (not API) servers after initialization, but besides that it works
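roughly what the edited section of clearml.conf looks like, with made-up hostnames:

api {
    # only web_server and files_server needed the http -> https change
    api_server: http://api.clearml.example.com:8008
    web_server: https://app.clearml.example.com
    files_server: https://files.clearml.example.com
}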
I change the arguments in the Web UI, but it looks like they are not parsed by trains
what if the cleanup service is launched using the ClearML-Agent Services container (part of the ClearML server)? adding clearml.conf to the home directory doesn't help
right now we can pass GitHub secrets to the clearml-agent training containers (CLEARML_AGENT_GIT_PASS) to install private repos
we need a way to pass secrets to access our database with annotations as well
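one idea I'm considering, assuming extra_docker_arguments in the agent section of clearml.conf behaves the way I think it does (the variable name and value are made up):

agent {
    # forward the secret into every training container the agent spins up
    extra_docker_arguments: ["-e", "ANNOTATIONS_DB_PASSWORD=changeme"]
}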
that was tough, but I finally managed to make it work! thanks a lot for your help, I definitely wouldn't have been able to do it without you =)
the only problem that I still encounter is that sometimes there are random errors at the beginning of runs, especially when I enqueue multiple experiments at the same time (I have 4 workers for 4 GPUs).
for example, this
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter()
sometimes randomly leads to FileNotFoundError: [Errno...
that's right
for example, there are tasks A, B, C
we run multiple experiments for A, finetune some of them in separate tasks, then choose one or more best checkpoints, run some experiments for task B, choose the best experiment, and finally run task C
so we get a chain of tasks: A → A-ft → B → C
a ClearML pipeline doesn't quite work here because we would like to analyze the results of each step before starting the next task
but it would be great to see the predecessors of each experiment in the chain (one way I imagine recording that is sketched below)
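a minimal sketch of what I mean, assuming we pass the predecessor's task ID around ourselves (the names and ID here are placeholders):

from clearml import Task

task = Task.init(project_name="examples", task_name="task-B-experiment")
# ID of the chosen best A-ft run, supplied by us (placeholder value)
predecessor_id = "0123456789abcdef0123456789abcdef"
task.set_parent(predecessor_id)  # the predecessor then shows up as the task's parent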
Error 12 : Validation error (value '['13b46b9325954517ab99381d5f45237d', 'bc76c3a7f0f6431b8e064212e9bdd2c0', '5d2a57cd39b94250b8c8f52303ccef92', 'e4731ee5b33e41d992d6d3fdb2913045', '698d9231155e41fbb61f8f3faa605727', '2171b190507f40d1be35e222045c58ea', '55c81a5db0ad40bebf72fdcc1b3be2a4', '94fbdbe26ef242d793e18d955cb3de58', '7d8a6c8f2ae246478b39ae5e87def2ad', '141594c146fe495886d477d9a27c465f', '640f87b02dc94a4098a0aba4d855b8f5']' length is bigger than allowed maximum '10'.)
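a sketch of a possible workaround, assuming the cap applies to the length of the ID list sent per call:

from clearml import Task

def get_tasks_chunked(task_ids, chunk_size=10):
    # query the server in batches small enough for the server-side validator
    tasks = []
    for i in range(0, len(task_ids), chunk_size):
        tasks.extend(Task.get_tasks(task_ids=task_ids[i:i + chunk_size]))
    return tasks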
not sure what you mean. I used to do task.set_initial_iteration(task.get_last_iteration()) in the task-resuming script (roughly the sketch below), but in the training code I explicitly pass global_step=epoch to the TensorBoard writer
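for context, the resuming script does roughly this, slightly simplified (continue_last_task is my reading of how the resume mechanism works; the names are placeholders):

from clearml import Task

# reuse the previous task instead of creating a new one
task = Task.init(project_name="examples", task_name="resumed-training",
                 continue_last_task=True)
# offset the iteration counter so new reports continue where the task left off
task.set_initial_iteration(task.get_last_iteration())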
well okay, it's probably not that weird considering that the worker just runs the container
still no luck, I tried everything =( any updates?