DilapidatedParrot58
Moderator
42 Questions, 205 Answers
  Active since 10 January 2023
  Last activity 2 years ago

Reputation: 0

Badges (1): 186 × Eureka!
0 Hey Guys, Is There A Ready Script That Can Delete All Models From S3 (Or Other Storage) That Are Related To Deleted Or Archived Experiments?

oh wow, I didn't see the delete_artifacts_and_models option

I guess we'll have to manually find old artifacts that are related to already deleted tasks
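
for future cleanups at least, something like this seems to take care of the stored models/artifacts as well (just a sketch; the project name and filter values are examples):

from clearml import Task

# find archived tasks in a project (filter values are examples)
tasks = Task.get_tasks(
    project_name="my_project",
    task_filter={"system_tags": ["archived"]},
)
for t in tasks:
    # delete the task together with its artifacts and models in storage
    t.delete(delete_artifacts_and_models=True, raise_on_error=False)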

3 years ago
3 years ago
0 Hey Guys, Is There A Ready Script That Can Delete All Models From S3 (Or Other Storage) That Are Related To Deleted Or Archived Experiments?

two more questions about cleanup if you don't mind:
1. For some old tasks I get WARNING:root:Could not delete Task ID=a0908784a2a942c3812f947ec1f32c9f, 'Task' object has no attribute 'delete'. What's the best way of cleaning those up?
2. What is the recommended way of providing S3 credentials to the cleanup task?
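
(for the second one, exporting the standard AWS variables before the script touches storage works as a stopgap, since boto3, which ClearML uses for S3 access, picks them up; clearml.conf also has an sdk.aws.s3 section for the same thing. a sketch with placeholder values:)

import os

# placeholder values; set these before any ClearML storage call is made
os.environ["AWS_ACCESS_KEY_ID"] = "<access-key>"
os.environ["AWS_SECRET_ACCESS_KEY"] = "<secret-key>"
os.environ["AWS_DEFAULT_REGION"] = "us-east-1"  # example region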

3 years ago
0 Hey Guys, Is There A Ready Script That Can Delete All Models From S3 (Or Other Storage) That Are Related To Deleted Or Archived Experiments?

we're using the latest version of clearml, clearml agent and clearml server, but we've been using trains/clearml for 2.5 years, so there are some old tasks left, I guess 😃

3 years ago
0 When We Train The Models, We Often Choose Checkpoint Based On The Validation Accuracy, But Test Set Accuracy (Or Specific Class Validation Accuracy) Is Not Necessarily The Best For This Checkpoint. Right Now There Are Options To Add Columns With Max And L

I guess this could overcomplicate the UI; I don't see a good solution yet.

as a quick hack, we can just use a separate metric name (e.g. "best_val_roc_auc") for all metric values of the current best checkpoint. then we can just add columns with the last value of this metric
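
roughly what that looks like with the ClearML logger (names follow the example above; the loop variables are placeholders):

from clearml import Task

task = Task.init(project_name="my_project", task_name="train")
logger = task.get_logger()

# ... inside the validation loop, whenever a new best checkpoint is found
if val_roc_auc > best_val_roc_auc:
    best_val_roc_auc = val_roc_auc
    # report the best-checkpoint value under its own metric name, so the
    # experiments table can show it as a plain "last value" column
    logger.report_scalar(
        title="best_val_roc_auc", series="val", value=best_val_roc_auc, iteration=epoch
    )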

4 years ago
0 Hey Guys, I Keep Getting "Failed Parsing Task Parameter" Warning For The Arguments Such As This One:

not necessarily, there are rare cases when the container keeps running after the experiment is stopped or aborted

will do!

3 years ago
0 I'M Probably Stupid, But How Do I Specify Worker Name? Usecase - I Want To Create Two Workers Using The Same Gpu, And New Worker Just Overwrites The Old One

that's right, I have 4 GPUs and 4 workers. but what if I want to run two jobs simultaneously on the same GPU?

4 years ago
0 Hey Guys, I Keep Getting "Failed Parsing Task Parameter" Warning For The Arguments Such As This One:

on a side note, is there any way to automatically give more meaningful names to the running docker containers?

3 years ago
0 I’M Interested In Learning More About Internals Of Clearml Server - For Example, How Elasticsearch, Mongodb, And Redis Are Used Internally. Are There Any Materials Available?

I guess I could manually explore the different containers and their content 😃 as far as I remember, I had to update Elasticsearch records when we moved to a new cloud provider in order to fix the model URLs
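
what I did back then was roughly an update_by_query against the Elasticsearch instance that ships with the ClearML server; the index pattern and URL prefixes below are assumptions, so double-check them against your own instance before running anything like this:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # the ES bundled with the ClearML server (placeholder URL)

# rewrite the storage prefix in stored URLs (index pattern is an assumption)
es.update_by_query(
    index="events-*",
    body={
        "query": {"prefix": {"url": "s3://old-bucket/"}},
        "script": {
            "source": "ctx._source.url = ctx._source.url.replace(params.old, params.new)",
            "params": {"old": "s3://old-bucket/", "new": "s3://new-bucket/"},
        },
    },
)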

2 years ago
0 Clearml-Init Doesn'T Ask For Ports, And Our Server Exposes Ports That Are Different From Default Ones. It Would Be Great To Have An Option To Change Default Ports For Api, File And Web Servers, Otherwise Initialization Fails With Wrong Creds Error

sorry, my bad, after some manipulations I made it work. I had to manually change HTTP to HTTPS in the config file for the Web and Files (not API) servers after initialization, but besides that it works
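
for reference, this is roughly what the api section of clearml.conf looks like after the edit (hostname and ports are examples; in our setup only the web and files URLs needed the https scheme):

api {
    web_server: https://clearml.example.com:8080
    api_server: http://clearml.example.com:8008
    files_server: https://clearml.example.com:8081
    credentials {
        "access_key" = "..."
        "secret_key" = "..."
    }
}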

2 years ago
0 Hey Guys, Is There A Ready Script That Can Delete All Models From S3 (Or Other Storage) That Are Related To Deleted Or Archived Experiments?

what if the cleanup service is launched using the ClearML-Agent Services container (part of the ClearML server)? adding clearml.conf to the home directory doesn't help

3 years ago
0 Yo Guys, I'M Getting

yeah, that's exactly what I'm looking into right now 😃

4 years ago
0 Is Is Possible To Pass Custom

right now we can pass GitHub secrets to the clearml agent training containers (CLEARML_AGENT_GIT_PASS) to install private repos

we need a way to pass secrets to access our database with annotations
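
one workaround we're considering (a sketch, not an official mechanism; the variable name is made up, and the exact set_base_docker signature may differ between clearml versions): forward an environment variable from the agent machine into the task's container via the base docker arguments

from clearml import Task

task = Task.init(project_name="my_project", task_name="train")
# "-e VAR" without a value makes docker forward VAR from the agent's own environment,
# so the secret never has to be written into the task definition itself
task.set_base_docker(
    docker_image="my-registry/train:latest",
    docker_arguments="-e ANNOTATIONS_DB_PASSWORD",
)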

2 years ago
0 Hey Guys, I'M Trying To Run An Experiment Using Trains-Agent. I Have A Custom Docker Image With Nightly Versions Of Pytorch And Our Own Library Installed From A Private Repo. I Was Assuming That These Packages Will Be Automatically Available To Trains Dur

that was tough but I finally managed to make it work! thanks a lot for your help, I definitely wouldn't have been able to do it without you =)

the only problem that I still encounter is that sometimes there are random errors at the beginning of the runs, especially when I enqueue multiple experiments at the same time (I have 4 workers for 4 GPUs).
for example, this
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter()
sometimes randomly leads to FileNotFoundError: [Errno...

4 years ago
0 It Would Be Nice To Group Experiments Within Projects Use Cases:

that's right
for example, there are tasks A, B, C
we run multiple experiments for A, finetune some of them in separate tasks, then choose one or more best checkpoints, run some experiments for task B, choose the best experiment, and finally run task C

so we get a chain of tasks: A - A-ft - B - C

a ClearML pipeline doesn't quite work here because we would like to analyze the results of each step before starting the next task

but it would be great to see the predecessors of each experiment in the chain
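
in the meantime we can just record the predecessor ourselves, e.g. stash the previous task's ID on the new task as a tag or user property (a sketch; the names are made up):

from clearml import Task

prev_task_id = "<id of the experiment this run continues from>"  # placeholder

task = Task.init(project_name="my_project", task_name="task_B_experiment")
task.add_tags(["predecessor:" + prev_task_id])
task.set_user_properties(predecessor=prev_task_id)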

3 years ago
0 Hey Guys, Here I Am Again With Another Question

Error 12: Validation error (value '['13b46b9325954517ab99381d5f45237d', 'bc76c3a7f0f6431b8e064212e9bdd2c0', '5d2a57cd39b94250b8c8f52303ccef92', 'e4731ee5b33e41d992d6d3fdb2913045', '698d9231155e41fbb61f8f3faa605727', '2171b190507f40d1be35e222045c58ea', '55c81a5db0ad40bebf72fdcc1b3be2a4', '94fbdbe26ef242d793e18d955cb3de58', '7d8a6c8f2ae246478b39ae5e87def2ad', '141594c146fe495886d477d9a27c465f', '640f87b02dc94a4098a0aba4d855b8f5']' length is bigger than allowed maximum '10'.)
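
(looks like the server caps that list at 10 IDs per call, so batching whatever request produced this into chunks of at most 10 works around it; call_with_ids and task_ids below are placeholders:)

def chunks(items, size=10):
    # yield successive batches of at most `size` items
    for i in range(0, len(items), size):
        yield items[i:i + size]

for batch in chunks(task_ids, size=10):
    call_with_ids(batch)  # whatever API call raised the Error 12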

4 years ago
0 I'M Using Tensorboard Summarywriter To Add Scalar Metrics For The Experiment. If Experiment Crashed, And I Want To Continue It From Checkpoint, For Some Reason It Plots Metrics In A Really Weird Way. Even Though I Pass Global_Step=Epoch To The Summarywrit

not sure what you mean. I used to do task.set_initial_iteration(task.get_last_iteration()) in the task resuming script, but in the training code I explicitly pass global_step=epoch to the TensorBoard writer
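
to make sure we're talking about the same thing, this is roughly the pattern (continue_last_task and the training-loop variables are placeholders/assumptions for illustration):

from torch.utils.tensorboard import SummaryWriter
from clearml import Task

# reuse the crashed task instead of creating a new one
task = Task.init(project_name="my_project", task_name="train", continue_last_task=True)
# shift ClearML's iteration counter so resumed reports don't start from zero
task.set_initial_iteration(task.get_last_iteration())

writer = SummaryWriter()
for epoch in range(start_epoch, num_epochs):  # start_epoch restored from the checkpoint
    # ... training / validation for this epoch ...
    writer.add_scalar("val/accuracy", val_acc, global_step=epoch)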

3 years ago
4 years ago
0 Yo Guys, I'M Getting

everything is working as expected

4 years ago