yeah, we've used pipelines in other scenarios. might be a good fit here. thanks!
we have a bare-metal server with ClearML agents, and sometimes there are hanging containers or containers that consume too much RAM. unless I explicitly add a container name in the container arguments, the container gets a random name, which is not very convenient. it would be great if we could set a default container name for each experiment (e.g., the experiment ID)
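in the meantime the only workaround I can think of is pushing an explicit --name through the container arguments, roughly like this (just a sketch, assuming a clearml SDK where set_base_docker accepts docker_image/docker_arguments; the image name is only an example):
```python
from clearml import Task

# project/task names are placeholders
task = Task.init(project_name="my_project", task_name="my_experiment")

# give the container an identifiable name derived from the task id,
# passed through the container arguments (sketch only – assumes the agent
# doesn't set --name on its own)
task.set_base_docker(
    docker_image="nvidia/cuda:11.1.1-runtime-ubuntu20.04",
    docker_arguments="--name clearml_{}".format(task.id),
)
```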
standalone-mode gives me "Could not freeze installed packages"
nice! exactly what I need, thank you!
it prints an empty dict
I’m calling Task.init() in the script; maybe it somehow resets the connected parameters… but it used to work before, weird
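here’s roughly what the script does, stripped down to the relevant part (names and values are placeholders):
```python
from clearml import Task

# simplified version of the script (project/task names are placeholders)
task = Task.init(project_name="my_project", task_name="debug_connect")

params = {"lr": 0.001, "batch_size": 32}
params = task.connect(params)  # connect the dict so it shows up under hyperparameters

print(params)  # this is the print that comes back as an empty dict
```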
ValueError: Task has no hyperparams section defined
nope, same problem even after creating a new experiment from scratch
weird
this is what I got in installed packages without adding the direct link:
torch==1.6.0.dev20200430+cu101
torchvision==0.7.0.dev20200430+cu101
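as a workaround I guess the requirement can be pinned explicitly before Task.init(), something like this (a rough sketch with Task.add_requirements; the version specs are just examples, not the actual nightly builds):
```python
from clearml import Task

# force explicit requirements; must be called before Task.init()
# (version specs are examples only, not the real +cu101 nightly pins)
Task.add_requirements("torch", ">=1.6")
Task.add_requirements("torchvision", ">=0.7")

task = Task.init(project_name="my_project", task_name="nightly_torch_run")
```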
I decided to restart the containers one more time; this is what I got.
I had to restart Docker service to remove the containers
not necessarily, there are rare cases when a container keeps running after the experiment is stopped or aborted
will do!
yeah, backups take much longer, and we had to increase our EC2 instance volume size twice because of these indices
got it, thanks, will try to delete older ones
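the cleanup I have in mind is roughly this (a sketch with the elasticsearch python client; the host and index name are placeholders, and I’ll double-check which indices are actually safe to drop):
```python
from elasticsearch import Elasticsearch

# placeholder host – the ES instance bundled with the trains/clearml server
es = Elasticsearch("http://localhost:9200")

# list indices with their sizes to spot the big/old ones
for idx in es.cat.indices(format="json"):
    print(idx["index"], idx["store.size"])

# delete a specific old index (the name here is a placeholder!)
es.indices.delete(index="events-old-index-name")
```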
hmmm allegroai/trains:latest whatever it is
on a side note, is there any way to automatically give more meaningful names to the running docker containers?
yeah, I was thinking mainly about AWS. we use force to make sure we are using the correct latest checkpoint, but this increases costs when we are running a lot of experiments
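if we’re talking about the same thing, i.e. a force_download-style flag, the pattern is roughly this (a sketch with StorageManager.get_local_copy; the bucket/path is a placeholder):
```python
from clearml import StorageManager

# placeholder S3 path – force_download skips the local cache and re-downloads,
# which guarantees the latest checkpoint but adds S3 transfer cost per run
checkpoint_path = StorageManager.get_local_copy(
    remote_url="s3://my-bucket/checkpoints/model_latest.pt",
    force_download=True,
)
```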
we’re using latest ClearML server and client version (1.2.0)
it will probably screw up my resource monitoring plots, but well, who cares 😃
perhaps it’s happening because it’s an old project that was moved to the new root project?
I'll get back to you with the logs when the problem occurs again
I change the arguments in the Web UI, but it looks like they are not picked up by trains
same here, changing arguments in the Args section of Hyperparameters doesn’t work; the training script starts with the default values.
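roughly the pattern the scripts follow (simplified, the actual arguments differ):
```python
import argparse
from clearml import Task

# simplified – the real scripts have many more arguments
parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.001)
parser.add_argument("--epochs", type=int, default=10)

# Task.init() is called before parse_args(), so trains should hook argparse automatically
task = Task.init(project_name="my_project", task_name="training")

args = parser.parse_args()
print(args.lr, args.epochs)  # when run by the agent these still come out as the defaults
```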
trains 0.16.0
trains-agent 0.16.0
trains-server 0.16.0
this would definitely be a nice addition. the number of hyperparameters in our models often goes up to 100
copy-pasting the entire training command into the command line 😃
docker mode. they do share the same folder with the training data mounted as a volume, but only for reading the data.
awesome news 👍
we're using the latest version of clearml, clearml agent and clearml server, but we've been using trains/clearml for 2.5 years, so there are some old tasks left, I guess 😃
the weird part is that the old job continues running when I recreate the worker and enqueue the new job