well okay, it's probably not that weird considering that worker just runs the container
I don't connect anything explicitly, I'm using argparse, it used to work before the update
yes. we upload artifacts to Yandex.Cloud S3 using ClearML. we set "s3://storage.yandexcloud.net/clearml-models" as the output uri parameter and add this section to the config:
{
    host: "http://storage.yandexcloud.net"
    key: "KEY"
    secret: "SECRET_KEY"
    secure: true
}
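for context, in a typical ClearML/Trains setup a per-host credentials section like the one above lives under sdk.aws.s3.credentials in the config file. a hedged sketch of how our snippet would fit there (bucket endpoint and keys are placeholders, not our real values):

```
sdk {
    aws {
        s3 {
            credentials: [
                {
                    host: "storage.yandexcloud.net"
                    key: "KEY"
                    secret: "SECRET_KEY"
                    secure: true
                }
            ]
        }
    }
}
```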
this works like a charm. but download button in UI is not working
I guess this could overcomplicate the UI, I don't see a good solution yet.
as a quick hack, we can just use a separate name (e.g. "best_val_roc_auc") for the metric values of the current best checkpoint. then we can just add columns showing the last value of that metric
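a minimal sketch of that hack. in real code `report` would be ClearML's `Logger.report_scalar(title, series, value, iteration)`; here it's a stub recorder so the logic is self-contained, and the metric values are made up:

```python
# Stub recorder standing in for ClearML's Logger.report_scalar.
reported = []

def report(title, series, value, iteration):
    reported.append((title, series, value, iteration))

def log_epoch(epoch, val_roc_auc, state):
    # Always log the raw per-epoch metric.
    report("metrics", "val_roc_auc", val_roc_auc, epoch)
    # Additionally re-log it under a dedicated series name whenever a new
    # best checkpoint appears, so a "last value" column in the UI always
    # shows the best checkpoint's metric.
    if val_roc_auc > state.get("best", float("-inf")):
        state["best"] = val_roc_auc
        report("metrics", "best_val_roc_auc", val_roc_auc, epoch)

state = {}
for epoch, auc in enumerate([0.71, 0.78, 0.75, 0.81]):
    log_epoch(epoch, auc, state)
```

the "last value" of `best_val_roc_auc` then tracks the best checkpoint even after worse epochs, which is exactly what a sortable column needs.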
after the very first click, there is a popup with credentials request. nothing happens after that
example of the failed experiment
hard to say, maybe just "related experiments" in the experiment info would be enough. I'll think about it
I assume a temporary fix is to switch to trains-server?
hmmm allegroai/trains:latest whatever it is
I decided to restart the containers one more time, this is what I got.
I had to restart Docker service to remove the containers
as a sidenote, I am not able to pull the newest release, looks like it's not pushed?
"Error response from daemon: manifest for allegroai/trains:0.14.2 not found"
I've already pulled the new images from trains-server, let's see if the initial issue occurs again. thanks for the fast response, guys!
I'll get back to you with the logs when the problem occurs again
it might be that there is not enough space on our SSD, experiments cache a lot of preprocessed data during the first epoch…
I change the arguments in Web UI, but it looks like they are not parsed by trains
it prints an empty dict
I'm doing Task.init() in the script, maybe it somehow resets connected parameters… but it used to work before, weird
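for reference, what usually bites here is call order: Trains/ClearML patches argparse when Task.init() runs, so parse_args() should happen after Task.init() for Web UI overrides to be picked up. a minimal stdlib-only sketch of the intended structure (the Task.init call is left as a comment so the snippet runs without a server; the argument names are illustrative, not from our actual script):

```python
import argparse

# from trains import Task
# task = Task.init(project_name="demo", task_name="args-demo")  # must run BEFORE parse_args()

parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.001)
parser.add_argument("--epochs", type=int, default=10)

# With Task.init() already done, the patched parse_args() reports these
# values to the server and applies any edits made in the Web UI.
args = parser.parse_args(["--lr", "0.01"])
print(vars(args))
```

if parse_args() runs before Task.init(), the task can end up with an empty parameter dict, which matches the symptom above.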
nope, that's the point: quite often we run experiments separately, but they are related to each other. currently there's no way to see that one experiment uses a checkpoint from a previous experiment, since we have to manually insert the S3 link as a hyperparameter. it would be useful to see these connections. maybe instead of grouping we could see which experiments are using artifacts of this experiment
don't know if it's relevant, but I also added a new user to apiserver.conf today
I've done it many times, using different devices. sometimes it works, sometimes it doesn't