the original task name contains a double space -> the saved checkpoint path also contains a double space -> the MODEL URL field in the model description of this checkpoint in ClearML converts the double space into a single space, so when you copy & paste it somewhere, it'll be incorrect
sounds like overkill for this problem, but I don’t see any other pretty solution 😃
Requirement already satisfied (use --upgrade to upgrade): celsusutils==0.0.1
yeah, server (1.0.0) and client (1.0.1)
we often do ablation studies with more than 50 experiments, and it was very convenient to compare their dynamics at different epochs
we already have a cleanup service set up and running, so we should be good from now on
what if the cleanup service is launched using the ClearML-Agent Services container (part of the ClearML server)? adding clearml.conf to the home directory doesn't help
two more questions about cleanup if you don't mind:
what if for some old tasks I get WARNING:root:Could not delete Task ID=a0908784a2a942c3812f947ec1f32c9f, 'Task' object has no attribute 'delete'? What's the best way of cleaning them up? And what is the recommended way of providing S3 credentials to the cleanup task?
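for context, here is a minimal sketch of the S3 section I'd expect to go into clearml.conf (the sdk.aws.s3 block; the bucket name, region, and credential values are placeholders, and I'm assuming the cleanup task picks up the standard config file):
sdk {
    aws {
        s3 {
            # default credentials, used when no bucket-specific entry matches
            key: "AWS_ACCESS_KEY_ID"
            secret: "AWS_SECRET_ACCESS_KEY"
            region: "us-east-1"
            credentials: [
                {
                    # per-bucket override (bucket name is a placeholder)
                    bucket: "my-checkpoint-bucket"
                    key: "AWS_ACCESS_KEY_ID"
                    secret: "AWS_SECRET_ACCESS_KEY"
                }
            ]
        }
    }
}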
oh wow, I didn't see the delete_artifacts_and_models option
I guess we'll have to manually find old artifacts that are related to already deleted tasks
yeah, I was thinking mainly about AWS. we use force to make sure we are using the correct latest checkpoint, but this increases costs when we are running a lot of experiments
[2020-06-09 16:03:19,851] [8] [ERROR] [trains.mongo.initialize] Failed creating fixed user John Doe: 'key'
{
    username: "username"
    password: "password"
    name: "John Doe"
},
example of the failed experiment
not sure what you mean. I used to do task.set_initial_iteration(task.get_last_iteration()) in the task resuming script, but in the training code I explicitly pass global_step=epoch to the TensorBoard writer
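roughly what I mean, as a simplified sketch (imports shown for the clearml package; project/task names, the epoch range, and the loss value are placeholders, and I'm assuming Task.init's continue_last_task flag for reattaching to the task):
from clearml import Task
from torch.utils.tensorboard import SummaryWriter

# reattach to the previous task and keep reporting into it
# (project/task names are placeholders)
task = Task.init(
    project_name="ablations",
    task_name="resnet50 baseline",
    continue_last_task=True,
)

# shift the iteration counter so auto-reported series (:monitor:gpu etc.)
# continue from where the previous run stopped
task.set_initial_iteration(task.get_last_iteration())

writer = SummaryWriter(log_dir="runs/resume")
start_epoch, num_epochs = task.get_last_iteration() + 1, 100  # illustrative values

for epoch in range(start_epoch, num_epochs):
    loss = 1.0 / (epoch + 1)  # stand-in for the real training loss
    # the step is passed explicitly here, which is where the unexpected offset shows up
    writer.add_scalar("train/loss", loss, global_step=epoch)

writer.close()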
there is no method for setting the last iteration, which is used for reporting when continuing the same task. maybe I could somehow change this value for the task?
I've done it many times, using different devices. sometimes it works, sometimes it doesn't
LOL
wow 😃
I was trying to figure out how to create a queue using the CLI 😃
perhaps it’s happening because it’s an old project that was moved to the new root project?
overwriting this value is not ideal though, because for the :monitor:gpu and :monitor:machine values I would like to continue from the latest iteration
but for the metrics, I explicitly pass the epoch number that my training is currently on. it's kind of weird that it adds an offset to values that are explicitly reported, no?
okay, so if there’s no workaround atm, should I create a GitHub issue?
I've already pulled the new images from trains-server, let's see if the initial issue occurs again. thanks for the fast response guys!
not necessarily, there are rare cases when the container keeps running after the experiment is stopped or aborted
will do!
great, this helped, thanks! I simply added https://download.pytorch.org/whl/nightly/cu101/torch_nightly.html to trains.conf, and it seems to be working
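for the record, this is roughly how the change looks (a sketch of the relevant part of trains.conf; I'm assuming agent.package_manager.extra_index_url is the right place for the URL):
agent {
    package_manager {
        type: pip
        # extra lookup location for the nightly CUDA 10.1 PyTorch wheels
        extra_index_url: ["https://download.pytorch.org/whl/nightly/cu101/torch_nightly.html"]
    }
}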
I now have another problem: my code is looking for some additional files in the root folder of the project. I tried adding a Docker layer:
ADD file.pkl /root/.trains/venvs-builds/3.6/task_repository/project.git/extra_data/
but trains probably rewrites the folder when cloning the repo. is there any workaround?
I decided to restart the containers one more time; this is what I got.
I had to restart Docker service to remove the containers