I don't think so, because the max value of each metric is calculated independently of the other metrics
the weird part is that the old job continues running when I recreate the worker and enqueue the new job
I use the following Base Docker image setting: my-docker-hub/my-repo:latest -v /data/project/data:/data
this would be great. I could then just pass it as a hyperparameter
great, this helped, thanks! I simply added https://download.pytorch.org/whl/nightly/cu101/torch_nightly.html to trains.conf, and it seems to be working
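for anyone reading this later, the entry I mean is roughly this (just a sketch of my trains.conf, assuming the agent.package_manager.extra_index_url list is what picks it up):

    agent {
      package_manager {
        # extra package index the agent passes to pip when building the task venv
        extra_index_url: ["https://download.pytorch.org/whl/nightly/cu101/torch_nightly.html"]
      }
    }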
I now have another problem, my code is looking for some additional files in the root folder of the project. I tried adding a Docker layer:
ADD file.pkl /root/.trains/venvs-builds/3.6/task_repository/project.git/extra_data/
but trains probably overwrites that folder when cloning the repo. is there any workaround?
task = Task.get_task(task_id=args.task_id)
task.mark_started()
# point the cloned task at the new checkpoint via its General hyperparameter section
task.set_parameters_as_dict({"General": {"checkpoint_file": model.url, "restart_optimizer": False}})
# reset the reported iteration counter before re-enqueueing
task.set_initial_iteration(0)
task.mark_stopped()
Task.enqueue(task=task, queue_name=task.data.execution.queue)
thanks! I need to read all parts of the documentation really carefully =) for some reason, I couldn't find this section
maybe I should use explicit reporting instead of TensorBoard
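something like this is what I have in mind (just a sketch, assuming Task.init() was already called; the metric name and values are made up):

    from clearml import Logger  # was "from trains import Logger" before the rename

    num_epochs = 10                                        # placeholder values for illustration
    val_accuracies = [0.5 + 0.03 * e for e in range(num_epochs)]

    logger = Logger.current_logger()
    for epoch in range(num_epochs):
        # explicit scalar report instead of relying on the TensorBoard auto-logging
        logger.report_scalar(title="val_accuracy", series="top1",
                             value=val_accuracies[epoch], iteration=epoch)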
new icons are slick, it would be even better if you could upload custom icons for the different projects
our GPUs are 48GB, so it's quite wasteful to only run one job per GPU
yeah, I'm aware of that, I would have to make sure they don't fail with the infamous CUDA out-of-memory error, but still
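to be concrete, what I'd try (my own assumption of how the sharing would be set up, not something I've verified) is simply starting two agent daemons pinned to the same GPU and the same queue:

    # two workers pulling from the same queue, both pinned to GPU 0 (queue name is made up)
    clearml-agent daemon --queue big_gpu_queue --gpus 0 &
    clearml-agent daemon --queue big_gpu_queue --gpus 0 &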
so the max values that I get can be reached at different epochs
perhaps I need to do task.set_initial_iteration(0)?
sounds like overkill for this problem, but I don't see any prettier solution 🙂
thanks! we copy S3 URLs quite often. I know that it's better to avoid double spaces in task names, but shit happens 🙂
we do log a lot of different metrics, maybe this is part of the problem
okay, what do I do if it IS installed?
does this mean that setting initial iteration to 0 should help?
just DMed you a screenshot where you can see a part of the token
ValueError: Task has no hyperparams section defined
problem solved. I had to replace /opt/trains/data/fileserver with /opt/clearml/data/fileserver in the Agent configuration, and replace trains with clearml in the Requirements
for me, increasing shm-size usually helps. what does this RC fix?
yes, this is the use case. I think we could use something like Redis for this communication
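roughly what I'm imagining (a minimal sketch with redis-py; the host, key, and checkpoint URL are made up, and both tasks would need network access to the same Redis instance):

    import redis

    r = redis.Redis(host="redis.internal", port=6379)  # hypothetical shared instance

    # producer task: push the newest checkpoint URL
    r.rpush("checkpoint_urls", "s3://bucket/path/model_42.pt")

    # consumer task: block until a new checkpoint shows up
    _key, checkpoint_url = r.blpop("checkpoint_urls")
    print(checkpoint_url.decode())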
thanks, this one worked after we changed the package version
on a side note, is there any way to automatically give more meaningful names to the running Docker containers?
not necessarily, there are rare cases when a container keeps running after the experiment is stopped or aborted
will do!
we have a bare-metal server with ClearML agents, and sometimes there are hanging containers or containers that consume too much RAM. unless I explicitly add a container name in the container arguments, the container gets a random name, which is not very convenient. it would be great if we could set a default container name for each experiment (e.g., the experiment ID)
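to make the ask concrete, today the workaround is to type something like this into each experiment's container arguments by hand (the name here is made up):

    --name clearml_resnet50_baseline

and what I'm asking for is that this defaults to e.g. the experiment ID automatically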