Examples: query, "exact match", wildcard*, wild?ard, wild*rd
Fuzzy search: cake~ (finds cakes, bake)
Term boost: "red velvet"^4, chocolate^2
Field grouping: tags:(+work -"fun-stuff")
Escaping: Escape characters +-&|!(){}[]^"~*?:\ with \, e.g. \+
Range search: properties.timestamp:[1587729413488 TO *] (inclusive), properties.title:{A TO Z}(excluding A and Z)
Combinations: chocolate AND vanilla, chocolate OR vanilla, (chocolate OR vanilla) NOT "vanilla pudding"
Field search: properties.title:"The Title" AND text
Profile picture
DilapidatedParrot58
Moderator
42 Questions, 205 Answers
  Active since 10 January 2023
  Last activity one year ago

Reputation

0

Badges 1

186 × Eureka!
0 We Just Had A Slight Problem - There Was A Double Space In S3 Checkpoint Name, But Clearml Ui Prints Them As One In The Model Description. If You Copy And Paste It, The Address Will Be Wrong

original task name contains double space -> saved checkpoint also contains double space -> MODEL URL field in model description of this checkpoint in ClearML converts double space into single space. so when you copy & paste it somewhere, it'll be incorrect

2 years ago
0 Hey Guys, Here I Am Again With Another Question

we often do ablation studies with more than 50 experiments, and it was very convenient to compare their dynamics at the different epochs

4 years ago
3 years ago
0 Hey Guys, Is There A Ready Script That Can Delete All Models From S3 (Or Other Storage) That Are Related To Deleted Or Archived Experiments?

what if cleanup service is launched using ClearML-Agent Services container (part of the ClearML server)? adding clearml.conf to the home directory doesn't help

3 years ago
0 Hey Guys, Is There A Ready Script That Can Delete All Models From S3 (Or Other Storage) That Are Related To Deleted Or Archived Experiments?

two more questions about cleanup if you don't mind:
what if for some old tasks I get WARNING:root:Could not delete Task ID=a0908784a2a942c3812f947ec1f32c9f, 'Task' object has no attribute 'delete'? What's the best way of cleaning them? What is the recommended way of providing S3 credentials to cleanup task?

3 years ago
0 Hey Guys, Is There A Ready Script That Can Delete All Models From S3 (Or Other Storage) That Are Related To Deleted Or Archived Experiments?

oh wow, I didn't see delete_artifacts_and_models option

I guess we'll have to manually find old artifacts that are related to already deleted tasks

3 years ago
0 Any Chance Storagemanager Could Re-Download Files Only If Their Size Is Different From File In Cache (As An Option)?

yeah, I was thinking mainly about AWS. we use force to make sure we are using the correct latest checkpoint, but this increases costs when we are running a lot of experiments

3 years ago
0 Is Is Possible To Pass Custom

works like a charm!

2 years ago
0 I'M Using Tensorboard Summarywriter To Add Scalar Metrics For The Experiment. If Experiment Crashed, And I Want To Continue It From Checkpoint, For Some Reason It Plots Metrics In A Really Weird Way. Even Though I Pass Global_Step=Epoch To The Summarywrit

not sure what you mean. I used to do task.set_initial_iteration(task.get_last_iteration()) in the task resuming script, but in the training code I explicitly pass global_step=epoch to the TensorBoard writer

3 years ago
3 years ago
0 Here I Am Again... Can'T Find How To Create A Custom Queue

LOL
wow 😃
I was trying to find how to create a queue using CLI 😃

4 years ago
0 I'M Using Tensorboard Summarywriter To Add Scalar Metrics For The Experiment. If Experiment Crashed, And I Want To Continue It From Checkpoint, For Some Reason It Plots Metrics In A Really Weird Way. Even Though I Pass Global_Step=Epoch To The Summarywrit

overwriting this value is not ideal though, because for :monitor:gpu and :monitor:machine values I would like to continue from the latest iteration

but for the metrics, I explicitly pass the number of epoch that my training is currently on. it'ls kind of weird that it adds offset to the values that are explicitly reported, no?

3 years ago
0 I Updated Trains-Server Today, And Now It'S Very Unstable, Web Interface Randomly Stops Working. Anyone Had The Same Problem? I'Ve Never Had Any Problems With Updating The Server Before

I've already pulled new images from trains-server, let's see if the initial issue occurs again. thank for the fast response guys!

4 years ago
0 Hey Guys, I Keep Getting "Failed Parsing Task Parameter" Warning For The Arguments Such As This One:

not necessarily, there are rare cases when container keeps running after experiment is stopped or aborted

will do!

3 years ago
0 Hey Guys, I'M Trying To Run An Experiment Using Trains-Agent. I Have A Custom Docker Image With Nightly Versions Of Pytorch And Our Own Library Installed From A Private Repo. I Was Assuming That These Packages Will Be Automatically Available To Trains Dur

great, this helped, thanks! I simply added https://download.pytorch.org/whl/nightly/cu101/torch_nightly.html to trains.conf, and it seems to be working

I now have another problem, my code is looking for some additional files in the root folder of the project. I tried adding a Docker layer:
ADD file.pkl /root/.trains/venvs-builds/3.6/task_repository/project.git/extra_data/

but trains probably rewrites the folder when cloning the repo. is there any workaround?

4 years ago
0 I Updated Trains-Server Today, And Now It'S Very Unstable, Web Interface Randomly Stops Working. Anyone Had The Same Problem? I'Ve Never Had Any Problems With Updating The Server Before

I decided to restart the containers one more time, this is what I got.

I had to restart Docker service to remove the containers

4 years ago
Show more results compactanswers