
DilapidatedParrot58
42 Questions, 205 Answers
Active since 10 January 2023
Last activity: 2 years ago
Reputation: 0
Badges 1
186 × Eureka!
hey guys, I keep getting trains_agent: ERROR: Connection Error: it seems *api_server* is misconfigured. Is this the TRAINS API server http://apiserver:8008 ?...
4 years ago
here I am again... can't find how to create a custom queue
4 years ago
hey guys, do you have any tutorials or examples of integration with dvc?
5 years ago
any chance StorageManager could re-download files only if their size is different from file in cache (as an option)?
3 years ago
hey guys, the first time I'm seeing this behavior. I'm adding a new user to /opt/trains/config/apiserver.conf and restarting the containers. all old users are ...
4 years ago
hey everyone, there's a bug that we experience after moving to the new server and domain. if you click on the experiment name while viewing its details, you ...
3 years ago
hey guys, I'm experiencing seemingly random problems with the experiments. there are 4 GPUs and 8 workers (2 workers per GPU) , and sometimes experiments ran...
4 years ago
is there any way to export CSV with max metrics and hyperparameters for selected experiments?
3 years ago
Step 3 Task ( https://github.com/allegroai/trains/blob/master/examples/pipeline/step3_train_model.py ) - Loads the processed data (from Step 2) and clearml a...
4 years ago
feature request: ClearML prints GitHub token in the log when there is a "repository not found" error. it would be nice if it could hide it
3 years ago
I'm getting A LOT of errors when running cleanup service: Failed deleting the following URIs - script fails to delete image and text files. ERROR - Failed dele...
3 years ago
yo guys, I'm getting Retrying (Retry(total=2, connect=2, read=5, redirect=5, status=None)) after connection broken by 'ConnectTimeoutError(, 'Connection to O...
4 years ago
what is the right way to increase number of retries when using StorageManager.get_local_copy?
2 years ago
I'm using Tensorboard SummaryWriter to add scalar metrics for the experiment. if experiment crashed, and I want to continue it from checkpoint, for some reas...
3 years ago
some random weird feature suggestions for the future 1) it would be great if you could export key experiment data as html or pdf report 2) it would also be q...
4 years ago
we can’t add overview to the subprojects (btw thank you SO MUCH for subprojects, this is probably the best feature ever introduced to trains/clearml). is it ...
3 years ago
it would be nice to group experiments within projects. use cases: hyperparameter sweep (10 experiments with different learning rate), finetuning models (for ex...
3 years ago
hey guys, I am trying to plan what I need to do in order to efficiently use ClearML with spot instances 1) detecting when spot instance is down and experimen...
3 years ago
downloading output artifacts from S3 by clicking on the download button next to Model URL was great, but since we moved from AWS to Yandex.Cloud, this featur...
2 years ago
after recent clearml server update, whenever I clone an experiment, the default project for the draft copy is the first project in the list. previously, it w...
2 years ago
hey guys, I'm trying to run an experiment using trains-agent. I have a custom Docker image with nightly versions of pytorch and our own library installed fro...
4 years ago
I'm probably stupid, but how do I specify worker name? usecase - I want to create two workers using the same GPU, and new worker just overwrites the old one
4 years ago
hey guys, I keep getting "Failed parsing task parameter" warning for the arguments such as this one: parser.add_argument( "--dataset_mean", type = float, nar...
3 years ago
when we train the models, we often choose checkpoint based on the validation accuracy, but test set accuracy (or specific class validation accuracy) is not n...
4 years ago
is there any way to post Slack alerts for the frozen experiments? (eg, after server restart they sometimes get stuck in Running mode, or https://github.com/p...
4 years ago
yo clearml folks! how to force-reinstall package from github in Installed Packages? tried different strategies (using
4 years ago
two annoying visual bugs in ClearML Server UI after latest update: experiment status is still shown as “Aborted” after successful resetting until you refresh...
2 years ago
I’m interested in learning more about internals of ClearML Server - for example, how ElasticSearch, MongoDB, and Redis are used internally. are there any mat...
2 years ago
I keep getting errors when trying to compare a lot of experiments at the same time (>10). what's even worse is that trains starts working much slower in gene...
4 years ago
we just had a slight problem - there was a double space in S3 checkpoint name, but ClearML UI prints them as one in the model description. if you copy and pa...
2 years ago