DilapidatedParrot58
Moderator
42 Questions, 205 Answers
  Active since 10 January 2023
  Last activity one year ago

Reputation

0

Badges 1

186 × Eureka!
0 Votes
11 Answers
1K Views
hey guys, do you have any plans to add functionality to export training config with all hyperparameters to the different formats, such as training command li...
4 years ago
0 Votes
16 Answers
1K Views
yo guys, I'm getting Retrying (Retry(total=2, connect=2, read=5, redirect=5, status=None)) after connection broken by 'ConnectTimeoutError(, 'Connection to O...
4 years ago
0 Votes
7 Answers
1K Views
any chance StorageManager could re-download files only if their size is different from file in cache (as an option)?
3 years ago
0 Votes
5 Answers
1K Views
3 years ago
0 Votes
27 Answers
1K Views
hey guys, I keep getting trains_agent: ERROR: Connection Error: it seems *api_server* is misconfigured. Is this the TRAINS API server http://apiserver:8008 ?...
3 years ago
0 Votes
25 Answers
1K Views
I'm probably stupid, but how do I specify worker name? usecase - I want to create two workers using the same GPU, and new worker just overwrites the old one
4 years ago
0 Votes
3 Answers
1K Views
4 years ago
0 Votes
20 Answers
1K Views
4 years ago
0 Votes
2 Answers
1K Views
is there any way to export CSV with max metrics and hyperparameters for selected experiments?
3 years ago
0 Votes
4 Answers
1K Views
feature request: ClearML prints GitHub token in the log, when there is "repository not found" error. it would be nice if could hide it
3 years ago
0 Votes
13 Answers
1K Views
it would be nice to group experiments within projects use cases: hyperparameter sweep (10 experiments with different learning rate) finetuning models (for ex...
2 years ago
0 Votes
30 Answers
1K Views
4 years ago
0 Hi

all our workers went down after starting the slack bot, is it expected?)

4 years ago
0 Hey Guys, I'm Trying To Run An Experiment Using Trains-Agent. I Have A Custom Docker Image With Nightly Versions Of PyTorch And Our Own Library Installed From A Private Repo. I Was Assuming That These Packages Will Be Automatically Available To Trains Dur

I added the link just in case anyway 😃

also, is there any way to install a repo that we clone as a package. we often use absolute imports and do "pip install -e ." to utilize it
sorry there are so many questions, we just really want to migrate to trains-agent)

4 years ago
0 Yo Guys, I'm Getting

I get "The connection has timed out" when I'm trying to reach 8081 port

4 years ago
0 We Just Had A Slight Problem - There Was A Double Space In S3 Checkpoint Name, But ClearML UI Prints Them As One In The Model Description. If You Copy And Paste It, The Address Will Be Wrong

original task name contains double space -> saved checkpoint also contains double space -> MODEL URL field in model description of this checkpoint in ClearML converts double space into single space. so when you copy & paste it somewhere, it'll be incorrect
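
A minimal sketch of the failure mode described above, assuming the UI renders the URL as ordinary HTML text, where runs of whitespace collapse into a single space (that cause is an assumption, not something confirmed in this thread). The bucket and task name are hypothetical. It also shows that a percent-encoded form of the same URL would survive such a copy and paste:

```python
import re
from urllib.parse import quote, unquote

# Hypothetical S3 URL whose task name contains a double space.
original = "s3://my-bucket/my  task/checkpoint.pt"

# What a copy from the rendered page yields if whitespace runs collapse.
collapsed = re.sub(r"\s+", " ", original)

# Percent-encoding turns each space into %20, so nothing is left to collapse.
encoded = quote(original, safe=":/")

print(collapsed)         # the double space is gone, so the key no longer matches
print(encoded)           # both spaces are preserved as %20%20
print(unquote(encoded))  # decodes back to the original URL
```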

2 years ago
0 Hey Guys, Is There A Ready Script That Can Delete All Models From S3 (Or Other Storage) That Are Related To Deleted Or Archived Experiments?

what if cleanup service is launched using ClearML-Agent Services container (part of the ClearML server)? adding clearml.conf to the home directory doesn't help

3 years ago
0 Hey Guys, Is There A Ready Script That Can Delete All Models From S3 (Or Other Storage) That Are Related To Deleted Or Archived Experiments?

oh wow, I didn't see delete_artifacts_and_models option

I guess we'll have to manually find old artifacts that are related to already deleted tasks

3 years ago
0 We Just Had A Slight Problem - There Was A Double Space In S3 Checkpoint Name, But ClearML UI Prints Them As One In The Model Description. If You Copy And Paste It, The Address Will Be Wrong

thanks! we copy S3 URLs quite often. I know that it's better to avoid double spaces in task names, but shit happens 😄

2 years ago
0 Hey Guys, Is There A Ready Script That Can Delete All Models From S3 (Or Other Storage) That Are Related To Deleted Or Archived Experiments?

two more questions about cleanup if you don't mind:
what if for some old tasks I get WARNING:root:Could not delete Task ID=a0908784a2a942c3812f947ec1f32c9f, 'Task' object has no attribute 'delete'? What's the best way of cleaning them?
What is the recommended way of providing S3 credentials to cleanup task?

3 years ago
0 Hey Guys, I'm Trying To Run An Experiment Using Trains-Agent. I Have A Custom Docker Image With Nightly Versions Of PyTorch And Our Own Library Installed From A Private Repo. I Was Assuming That These Packages Will Be Automatically Available To Trains Dur

this is the artifactory, this is how I install these packages in the Docker image:
pip3 install --pre torch torchvision -f https://download.pytorch.org/whl/nightly/cu101/torch_nightly.html

the files are used for training and evaluation (e.g., precomputed pycocotools meta-info). I could theoretically include them in the repo, but some of them might be quite heavy. what do you mean when you say that they get lost? I copy them from the host machine when I build the custom image, so they are i...

4 years ago
0 Here I Am Again... Can't Find How To Create A Custom Queue

LOL
wow 😃
I was trying to find how to create a queue using CLI 😃

4 years ago
0 Hey Guys, I'm Trying To Run An Experiment Using Trains-Agent. I Have A Custom Docker Image With Nightly Versions Of PyTorch And Our Own Library Installed From A Private Repo. I Was Assuming That These Packages Will Be Automatically Available To Trains Dur

great, this helped, thanks! I simply added https://download.pytorch.org/whl/nightly/cu101/torch_nightly.html to trains.conf, and it seems to be working

I now have another problem, my code is looking for some additional files in the root folder of the project. I tried adding a Docker layer:
ADD file.pkl /root/.trains/venvs-builds/3.6/task_repository/project.git/extra_data/

but trains probably rewrites the folder when cloning the repo. is there any workaround?
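
One possible workaround, sketched under assumptions: keep the extra files in a directory outside the cloned repository and let the code look them up through an environment variable. The variable name `EXTRA_DATA_DIR`, the helper `resolve_extra_data`, and the path `/opt/extra_data` are all hypothetical and not part of trains; the Docker layer would then copy the file to that outside directory instead of into the repo folder.

```python
import os
from pathlib import Path


def resolve_extra_data(filename, env_var="EXTRA_DATA_DIR", default_dir="extra_data"):
    """Return the path of an extra data file.

    Uses the directory named by env_var (a hypothetical variable) when it is
    set, and falls back to default_dir relative to the working directory.
    """
    base = Path(os.environ.get(env_var, default_dir))
    return base / filename


# Inside the container, EXTRA_DATA_DIR would point outside the cloned repo,
# for example /opt/extra_data (hypothetical path).
print(resolve_extra_data("file.pkl"))
```

Local runs without the variable set keep reading from `extra_data/` in the project root, so the training code itself does not need to change between environments.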

4 years ago
0 Hey Guys, I'm Trying To Run An Experiment Using Trains-Agent. I Have A Custom Docker Image With Nightly Versions Of PyTorch And Our Own Library Installed From A Private Repo. I Was Assuming That These Packages Will Be Automatically Available To Trains Dur

it also happens sometimes during the run when tensorboard is trying to write smth to the disk and there are multiple experiments running. so it must be smth similar to the scenario you're describing, but I have no idea how it can happen since I'm running four separate workers

4 years ago
0 Hey Guys, I'm Trying To Run An Experiment Using Trains-Agent. I Have A Custom Docker Image With Nightly Versions Of PyTorch And Our Own Library Installed From A Private Repo. I Was Assuming That These Packages Will Be Automatically Available To Trains Dur

that was tough but I finally managed to get it working! thanks a lot for your help, I definitely wouldn't have been able to do it without you =)

the only problem that I still encounter is that sometimes there are random errors at the beginning of the runs, especially when I enqueue multiple experiments at the same time (I have 4 workers for 4 GPUs).
for example, this
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter()
sometimes randomly leads to FileNotFoundError: [Errno...
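
The cause of this error is not established in the thread. If it comes from several runs starting at once and colliding on the default log directory (an assumption), one thing to try is giving every run its own directory and creating it before the writer starts. A minimal sketch using only the standard library; `make_run_log_dir` is a hypothetical helper name:

```python
import os
import uuid
from pathlib import Path


def make_run_log_dir(root="runs"):
    """Create and return a log directory that is unique to this run."""
    # The process id plus a random suffix keeps concurrent runs apart.
    run_dir = Path(root) / f"{os.getpid()}-{uuid.uuid4().hex[:8]}"
    # exist_ok=True avoids a failure if another process creates `root` first.
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir


log_dir = make_run_log_dir()
# The directory can then be passed to the writer explicitly:
# writer = SummaryWriter(log_dir=str(log_dir))
```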

4 years ago