
DilapidatedParrot58
42 Questions, 205 Answers
Active since 10 January 2023
Last activity: 2 years ago
Reputation: 0
Badges: 1 (186 × Eureka!)
we just had a slight problem - there was a double space in the S3 checkpoint name, but the ClearML UI prints it as a single space in the model description. if you copy and pa...
2 years ago
there is something weird going on with the console log after the latest updates of ClearML Server. it doesn't show the latest updates, instead it often jumps to the ...
2 years ago
hey guys, do you have any tutorials or examples of integration with DVC?
5 years ago
we can’t add an overview to the subprojects (btw thank you SO MUCH for subprojects, this is probably the best feature ever introduced to trains/clearml). is it ...
3 years ago
I'm getting A LOT of errors when running the cleanup service: "Failed deleting the following URIs" - the script fails to delete image and text files. ERROR - Failed dele...
3 years ago
hey guys, here I am again with another question 😃 after the latest update, I’m getting this error when I’m trying to compare scalars for more than 10 experi...
4 years ago
hey guys, I'm experiencing seemingly random problems with the experiments. there are 4 GPUs and 8 workers (2 workers per GPU), and sometimes experiments ran...
4 years ago
when we train the models, we often choose checkpoint based on the validation accuracy, but test set accuracy (or specific class validation accuracy) is not n...
4 years ago
I’m interested in learning more about internals of ClearML Server - for example, how ElasticSearch, MongoDB, and Redis are used internally. are there any mat...
2 years ago
hey guys, do you have any plans to add functionality to export the training config with all hyperparameters to different formats, such as training command li...
5 years ago
is there any way to post Slack alerts for the frozen experiments? (eg, after server restart they sometimes get stuck in Running mode, or https://github.com/p...
4 years ago
we have a use case where an experiment consists of multiple Docker containers. for example, one container works on a CPU machine, preprocesses images and puts ...
2 years ago
I'm probably stupid, but how do I specify the worker name? use case: I want to create two workers using the same GPU, and the new worker just overwrites the old one
4 years ago
clearml-init doesn't ask for ports, and our server exposes ports that are different from the default ones. it would be great to have an option to change default ...
2 years ago
hey guys, this is the first time I'm seeing this behavior: I'm adding a new user to /opt/trains/config/apiserver.conf and restarting the containers. all old users are ...
4 years ago
is it possible to pass custom environment variables (https://clear.ml/docs/latest/docs/configs/env_vars/) to ClearML agents?
2 years ago
hey guys, I am trying to plan what I need to do in order to use ClearML efficiently with spot instances: 1) detecting when a spot instance is down and experimen...
3 years ago
hey guys, I'm trying to run an experiment using trains-agent. I have a custom Docker image with nightly versions of PyTorch and our own library installed fro...
4 years ago
it would be nice to group experiments within projects. use cases: hyperparameter sweep (10 experiments with different learning rates), finetuning models (for ex...
3 years ago
after the recent ClearML Server update, whenever I clone an experiment, the default project for the draft copy is the first project in the list. previously, it w...
2 years ago
here I am again... can't find out how to create a custom queue
4 years ago
I'm using the TensorBoard SummaryWriter to add scalar metrics for the experiment. if the experiment crashed and I want to continue it from a checkpoint, for some reas...
3 years ago
what is the right way to increase the number of retries when using StorageManager.get_local_copy?
2 years ago
any chance StorageManager could re-download files only if their size differs from the file in the cache (as an option)?
3 years ago
hey guys, a question about the monthly worker_stats indices: each of them takes up about 1 GB for us. do we really need to keep all of them? is there any way to fr...
4 years ago
yo clearml folks! how do I force-reinstall a package from GitHub in Installed Packages? tried different strategies (using
4 years ago
hey guys, I keep getting trains_agent: ERROR: Connection Error: it seems *api_server* is misconfigured. Is this the TRAINS API server http://apiserver:8008 ?...
4 years ago
yo guys, I'm getting Retrying (Retry(total=2, connect=2, read=5, redirect=5, status=None)) after connection broken by 'ConnectTimeoutError(, 'Connection to O...
4 years ago
Step 3 Task ( https://github.com/allegroai/trains/blob/master/examples/pipeline/step3_train_model.py ) - Loads the processed data (from Step 2) and clearml a...
4 years ago
hey guys, is there a ready-made script that can delete all models from S3 (or other storage) that are related to deleted or archived experiments?
3 years ago