we're using the latest version of clearml, clearml agent and clearml server, but we've been using trains/clearml for 2.5 years, so there are some old tasks left, I guess
another stupid question - what is the proper way to delete a worker? so far I've been using pgrep to find the relevant PID
tags are somewhat fine for this, I guess, but there will be too many of them eventually, and they do not reflect the sequential nature of the experiments
that's right
for example, there are tasks A, B, C
we run multiple experiments for A, finetune some of them in separate tasks, then choose one or more best checkpoints, run some experiments for task B, choose the best experiment, and finally run task C
so we get a chain of tasks: A - A-ft - B - C
ClearML pipeline doesn't quite work here because we would like to analyze the results of each step before starting the next task
but it would be great to see predecessors of each experiment in the chain
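in the meantime, this is roughly the workaround I have in mind - a minimal sketch using plain clearml SDK calls, with project/task names and the task ID as placeholders:
```python
from clearml import Task

# Sketch (not an official lineage feature): record the predecessor of each
# experiment explicitly so the chain A -> A-ft -> B -> C can be traced later.
# Project/task names and the task ID below are placeholders.
task = Task.init(project_name="our-project", task_name="B/best-candidate")

predecessor_id = "<task id of the chosen A-ft experiment>"
task.set_parent(predecessor_id)                                    # shows up as the task's parent
task.set_parameter("General/predecessor_task_id", predecessor_id)  # easy to filter/search on
```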
the weird part is that the old job continues running when I recreate the worker and enqueue the new job
isn't this parameter related to communication with ClearML Server? I'm trying to make sure the checkpoint will be downloaded from AWS S3 even if there are temporary connection problems
there's a TransferConfig parameter in boto3 (https://boto3.amazonaws.com/v1/documentation/api/latest/reference/customizations/s3.html#boto3.s3.transfer.TransferConfig), but I'm not sure if there's an easy way to pass it to StorageManager
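for reference, the direct-boto3 fallback I was considering, assuming we bypass StorageManager for that one download - bucket, key and local path are placeholders:
```python
import boto3
from boto3.s3.transfer import TransferConfig

# Fallback sketch: download the checkpoint with boto3 directly so the retry
# behaviour is under our control. Bucket, key and local path are placeholders.
s3 = boto3.client("s3")
transfer_config = TransferConfig(num_download_attempts=10)  # boto3 default is 5
s3.download_file(
    "our-bucket",
    "experiments/checkpoints/model_best.ckpt",
    "/tmp/model_best.ckpt",
    Config=transfer_config,
)
```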
we already have cleanup service set up and running, so we should be good from now on
wow, thanks, just updated our server!
can't seem to find these metrics snapshot plots =) how do I plot one?
on a side note, is there any way to automatically give more meaningful names to the running docker containers?
WARNING: You are using pip version 20.1.1; however, version 20.3.3 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
trains_agent: ERROR: Connection Error: it seems *api_server* is misconfigured. Is this the TRAINS API server http://apiserver:8008 ?
http://OUR_IP:8081
http://OUR_IP:8080
http://apiserver:8008
WARNING: You are using pip version 20.1.1; however, version 20.3.3 is available.
...
we're using os.getenv in the script to get a value for these secrets
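roughly like this - the variable name is illustrative:
```python
import os

# What the script does today, roughly: secrets are read from the environment
# at runtime (the variable name is illustrative) and we fail fast if missing.
aws_secret = os.getenv("OUR_AWS_SECRET_KEY")
if aws_secret is None:
    raise RuntimeError("OUR_AWS_SECRET_KEY is not set in the environment")
```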
ValueError: Task has no hyperparams section defined
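for context, this is roughly how we register hyperparameters - my understanding is that calling task.connect() is what creates that section (names and values below are made up):
```python
from clearml import Task

# Illustrative snippet: registering a parameter dict creates a hyperparameters
# section (by default "General") on the task. Names and values are made up.
task = Task.init(project_name="our-project", task_name="hyperparam-demo")
params = {"learning_rate": 1e-3, "batch_size": 32, "epochs": 20}
task.connect(params)
```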
our GPUs have 48 GB each, so it's quite wasteful to only run one job per GPU
yeah, I'm aware of that, I would have to make sure they don't fail with the infamous CUDA out-of-memory error, but still
what if the cleanup service is launched using the ClearML-Agent Services container (part of the ClearML server)? adding clearml.conf to the home directory doesn't help
this definitely would be a nice addition. the number of hyperparameters in our models often goes up to 100
I'm not sure, since the names of these parameters don't match the boto3 names, and num_download_attempt is passed as container.config.retries here: https://github.com/allegroai/clearml/blob/3d3a835435cc2f01ff19fe0a58a8d7db10fd2de2/clearml/storage/helper.py#L1439
okay, so if there's no workaround atm, should I create a GitHub issue?
copy-pasting the entire training command into the command line
I'll get back to you with the logs when the problem occurs again
yeah, we've used pipelines in other scenarios. might be a good fit here. thanks!
yes, this is the use case, I think we can use something like Redis for this communication
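something along these lines, assuming a plain redis-py client - the host and key names are placeholders:
```python
import redis

# Sketch of the inter-task handshake via Redis (host and key are placeholders).
# The producing task publishes the chosen checkpoint's task ID, the next task reads it.
r = redis.Redis(host="our-redis-host", port=6379, decode_responses=True)

# producer side (end of task A / A-ft):
r.set("chain:best_checkpoint_task_id", "a1b2c3d4")

# consumer side (start of task B):
predecessor_id = r.get("chain:best_checkpoint_task_id")
```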
yeah, backups take much longer, and we had to increase our EC2 instance volume size twice because of these indices
got it, thanks, will try to delete older ones