DilapidatedParrot58
Moderator
42 Questions, 205 Answers
  Active since 10 January 2023
  Last activity one year ago

Reputation: 0
Badges: 1 (186 × Eureka!)
0 Votes
27 Answers
1K Views
hey guys, I keep getting trains_agent: ERROR: Connection Error: it seems *api_server* is misconfigured. Is this the TRAINS API server http://apiserver:8008 ?...
3 years ago
0 Votes
16 Answers
1K Views
yo guys, I'm getting Retrying (Retry(total=2, connect=2, read=5, redirect=5, status=None)) after connection broken by 'ConnectTimeoutError(, 'Connection to O...
4 years ago
0 Votes
2 Answers
959 Views
one year ago
0 Votes
8 Answers
1K Views
4 years ago
0 Votes
10 Answers
1K Views
what is the right way to increase number of retries when using StorageManager.get_local_copy?
2 years ago
0 Votes
13 Answers
1K Views
it would be nice to group experiments within projects use cases: hyperparameter sweep (10 experiments with different learning rate) finetuning models (for ex...
2 years ago
0 Votes
6 Answers
1K Views
I’m interested in learning more about internals of ClearML Server - for example, how ElasticSearch, MongoDB, and Redis are used internally. are there any mat...
2 years ago
0 Votes
3 Answers
1K Views
here I am again... can't find how to create a custom queue
4 years ago
0 Votes
3 Answers
1K Views
4 years ago
0 Votes
9 Answers
1K Views
2 years ago
0 Votes
7 Answers
971 Views
3 years ago
0 Votes
30 Answers
1K Views
is it possible to pass custom environment variables ( https://clear.ml/docs/latest/docs/configs/env_vars/ ) to ClearML agents?
2 years ago
0 Votes
7 Answers
1K Views
I'm getting A LOT of errors when running cleanup service Failed deleting the following URIs - script fails to delete image and text files ERROR - Failed dele...
2 years ago
0 Votes
14 Answers
1K Views
hey guys the first time I'm seeing this behavior I'm adding a new user to /opt/trains/config/apiserver.conf and restarting the containers. all old users are ...
4 years ago
0 Votes
4 Answers
1K Views
feature request: ClearML prints the GitHub token in the log when there is a "repository not found" error. it would be nice if it could hide it
3 years ago
0 Votes
25 Answers
1K Views
I'm probably stupid, but how do I specify worker name? usecase - I want to create two workers using the same GPU, and new worker just overwrites the old one
4 years ago
0 Votes
11 Answers
1K Views
hey guys, do you have any plans to add functionality to export training config with all hyperparameters to the different formats, such as training command li...
4 years ago
0 Votes
5 Answers
1K Views
Step 3 Task ( https://github.com/allegroai/trains/blob/master/examples/pipeline/step3_train_model.py ) - Loads the processed data (from Step 2) and clearml a...
3 years ago
0 Votes
30 Answers
1K Views
4 years ago
0 Votes
11 Answers
1K Views
3 years ago
0 Votes
11 Answers
1K Views
hey guys, is there a ready script that can delete all models from S3 (or other storage) that are related to deleted or archived experiments?
3 years ago
0 Votes
2 Answers
1K Views
is there any way to export CSV with max metrics and hyperparameters for selected experiments?
3 years ago
0 Votes
29 Answers
979 Views
3 years ago
0 Votes
5 Answers
1K Views
3 years ago
0 Votes
3 Answers
1K Views
2 years ago
0 Votes
6 Answers
1K Views
we just had a slight problem - there was a double space in S3 checkpoint name, but ClearML UI prints them as one in the model description. if you copy and pa...
2 years ago
0 Votes
5 Answers
1K Views
is there any way to post Slack alerts for the frozen experiments? (eg, after server restart they sometimes get stuck in Running mode, or https://github.com/p...
3 years ago
0 Votes
20 Answers
1K Views
4 years ago
0 Votes
7 Answers
966 Views
there is something weird going on with console log after latest updates of ClearML Server. it doesn't show the latest updates, instead it often jumps to the ...
one year ago
0 Votes
7 Answers
1K Views
any chance StorageManager could re-download files only if their size is different from file in cache (as an option)?
3 years ago
0 I'm Probably Stupid, But How Do I Specify Worker Name? Usecase - I Want To Create Two Workers Using The Same GPU, And New Worker Just Overwrites The Old One

the weird part is that the old job continues running when I recreate the worker and enqueue the new job

4 years ago
0 Is It Possible To Pass Custom

works like a charm!

2 years ago
4 years ago
0 I'm Probably Stupid, But How Do I Specify Worker Name? Usecase - I Want To Create Two Workers Using The Same GPU, And New Worker Just Overwrites The Old One

our GPUs are 48GB, so it's quite wasteful to only run one job per GPU
yeah, I'm aware of that, I would have to make sure they don't fail with the infamous CUDA out of memory error, but still

4 years ago
0 I'm Using TensorBoard SummaryWriter To Add Scalar Metrics For The Experiment. If Experiment Crashed, And I Want To Continue It From Checkpoint, For Some Reason It Plots Metrics In A Really Weird Way. Even Though I Pass global_step=epoch To The SummaryWrit

not sure what you mean. I used to do task.set_initial_iteration(task.get_last_iteration()) in the task resuming script, but in the training code I explicitly pass global_step=epoch to the TensorBoard writer

3 years ago
3 years ago
0 I'm Probably Stupid, But How Do I Specify Worker Name? Usecase - I Want To Create Two Workers Using The Same GPU, And New Worker Just Overwrites The Old One

thanks! I need to read all parts of documentation really carefully =) for some reason, couldn't find this section

4 years ago
0 Here I Am Again... Can't Find How To Create A Custom Queue

LOL
wow 😃
I was trying to find how to create a queue using CLI 😃

4 years ago
0 I'm Using TensorBoard SummaryWriter To Add Scalar Metrics For The Experiment. If Experiment Crashed, And I Want To Continue It From Checkpoint, For Some Reason It Plots Metrics In A Really Weird Way. Even Though I Pass global_step=epoch To The SummaryWrit

overwriting this value is not ideal though, because for :monitor:gpu and :monitor:machine values I would like to continue from the latest iteration

but for the metrics, I explicitly pass the number of the epoch that my training is currently on. it's kind of weird that it adds an offset to the values that are explicitly reported, no?
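
to illustrate the offset issue described above, here is a minimal sketch in plain Python. it does not call ClearML at all; `reported_step` is a hypothetical helper that assumes the initial-iteration offset is simply added to every explicitly passed step, which is the behavior this post describes observing, not a confirmed description of the library internals

```python
def reported_step(explicit_step: int, initial_iteration: int) -> int:
    """Hypothetical model: the offset is added to every explicitly reported step."""
    return initial_iteration + explicit_step


# Training crashed after epoch 9 and resumes at epoch 10.
# The resume script sets the offset to the last reported iteration (assumed to be 10).
last_iteration = 10
steps_with_offset = [reported_step(epoch, last_iteration) for epoch in range(10, 13)]

# With a zero offset the explicitly passed epochs are plotted where expected.
steps_without_offset = [reported_step(epoch, 0) for epoch in range(10, 13)]

print(steps_with_offset)     # [20, 21, 22]
print(steps_without_offset)  # [10, 11, 12]
```

under this assumption the resumed scalars land at 20, 21, 22 instead of 10, 11, 12, which would explain the gap in the plot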

3 years ago
0 I Updated Trains-Server Today, And Now It's Very Unstable, Web Interface Randomly Stops Working. Anyone Had The Same Problem? I've Never Had Any Problems With Updating The Server Before

I've already pulled new images from trains-server, let's see if the initial issue occurs again. thanks for the fast response, guys!

4 years ago
0 Hey Guys, I Keep Getting "Failed Parsing Task Parameter" Warning For The Arguments Such As This One:

not necessarily, there are rare cases when container keeps running after experiment is stopped or aborted

will do!

3 years ago
0 Hey Guys, I'm Trying To Run An Experiment Using Trains-Agent. I Have A Custom Docker Image With Nightly Versions Of PyTorch And Our Own Library Installed From A Private Repo. I Was Assuming That These Packages Will Be Automatically Available To Trains Dur

great, this helped, thanks! I simply added https://download.pytorch.org/whl/nightly/cu101/torch_nightly.html to trains.conf, and it seems to be working

I now have another problem, my code is looking for some additional files in the root folder of the project. I tried adding a Docker layer:
ADD file.pkl /root/.trains/venvs-builds/3.6/task_repository/project.git/extra_data/

but trains probably rewrites the folder when cloning the repo. is there any workaround?

4 years ago
0 I Updated Trains-Server Today, And Now It's Very Unstable, Web Interface Randomly Stops Working. Anyone Had The Same Problem? I've Never Had Any Problems With Updating The Server Before

I decided to restart the containers one more time, this is what I got.

I had to restart Docker service to remove the containers

4 years ago
0 Hey Guys, I'm Trying To Run An Experiment Using Trains-Agent. I Have A Custom Docker Image With Nightly Versions Of PyTorch And Our Own Library Installed From A Private Repo. I Was Assuming That These Packages Will Be Automatically Available To Trains Dur

I added the link just in case anyway 😃

also, is there any way to install a repo that we clone as a package? we often use absolute imports and do "pip install -e ." to utilize it
sorry there are so many questions, we just really want to migrate to trains-agent)

4 years ago
0 Hey Guys, I Keep Getting "Failed Parsing Task Parameter" Warning For The Arguments Such As This One:

on a side note, is there any way to automatically give more meaningful names to the running docker containers?

3 years ago
0 Hey Guys, A Question About Monthly worker_stats Indices Each Of Them Takes Up About 1GB For Us. Do We Really Need To Keep All Of Them? Is There Any Way To Free Up The Space?

yeah, backups take much longer, and we had to increase our EC2 instance volume size twice because of these indices

got it, thanks, will try to delete older ones
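
a rough sketch of how one might pick the older monthly indices to delete. the index naming (`worker_stats_YYYY-MM`), the helper `indices_to_delete`, and the `keep_months` parameter are all assumptions made for illustration, not the actual ClearML index layout. the sketch only selects names and never touches Elasticsearch, so the real index list should be checked before deleting anything

```python
from datetime import date


def indices_to_delete(index_names, today, keep_months=3):
    """Return the names whose YYYY-MM suffix is at least keep_months old."""
    old = []
    for name in index_names:
        # Hypothetical naming: the part after the last underscore is YYYY-MM
        year, month = map(int, name.rsplit("_", 1)[1].split("-"))
        age_in_months = (today.year - year) * 12 + (today.month - month)
        if age_in_months >= keep_months:
            old.append(name)
    return old


# Made-up index names, used only to exercise the helper
names = ["worker_stats_2020-01", "worker_stats_2020-05", "worker_stats_2020-06"]
stale = indices_to_delete(names, today=date(2020, 6, 15))
print(stale)  # ['worker_stats_2020-01']
```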

4 years ago