hey guys, thanks for creating Slack workspace, that's really cool. question - are we missing smth or is currently not possible to pass S3 credentials via env...
4 years ago
Step 3 Task ( https://github.com/allegroai/trains/blob/master/examples/pipeline/step3_train_model.py ) - Loads the processed data (from Step 2) and clearml a...
3 years ago
hey guys, do you have any tutorials or examples of intergration with dvc?
4 years ago
we have a use case where an experiment consists of multiple docker containers. for example, one container works on CPU machine, preprocesses images and puts ...
2 years ago
is is possible to pass custom https://clear.ml/docs/latest/docs/configs/env_vars/ to ClearML agents?
2 years ago
I'm getting A LOT of errors when running cleanup service Failed deleting the following URIs - script fails to delete image and text files ERROR - Failed dele...
2 years ago
clearml-init doesn't ask for ports, and our server exposes ports that are different from default ones. it would be great to have an option to change default ...
2 years ago
what is the right way to increase number of retries when using StorageManager.get_local_copy?
2 years ago
I’m interested in learning more about internals of ClearML Server - for example, how ElasticSearch, MongoDB, and Redis are used internally. are there any mat...
2 years ago
I'm using Tensorboard SummaryWriter to add scalar metrics for the experiment. if experiment crashed, and I want to continue it from checkpoint, for some reas...
3 years ago
when we train the models, we often choose checkpoint based on the validation accuracy, but test set accuracy (or specific class validation accuracy) is not n...
3 years ago
after recent clearml server update, whenever I clone an experiment, the default project for the draft copy is the first project in the list. previously, it w...
one year ago
there is something weird going on with console log after latest updates of ClearML Server. it doesn't show the latest updates, instead it often jumps to the ...
one year ago
is there any way to post Slack alerts for the frozen experiments? (eg, after server restart they sometimes get stuck in Running mode, or https://github.com/p...
3 years ago
we just had a slight problem - there was a double space in S3 checkpoint name, but ClearML UI prints them as one in the model description. if you copy and pa...
2 years ago
here I am again... can't find how to create a custom queue
4 years ago
we can’t add overview to the subprojects (btw thank you SO MUCH for subprojects, this is probably the best feature ever introduced to trains/clearml). is it ...
3 years ago
hey guys, I'm experiencing seemingly random problems with the experiments. there are 4 GPUs and 8 workers (2 workers per GPU) , and sometimes experiments ran...
4 years ago
I keep getting errors when trying to compare a lot of experiments at the same time (>10). what's evern worse is that trains start working much slower in gene...
4 years ago
hey guys, do you have any plans to add functionality to export training config with all hyperparameters to the different formats, such as training command li...
4 years ago
yo guys, I'm getting Retrying (Retry(total=2, connect=2, read=5, redirect=5, status=None)) after connection broken by 'ConnectTimeoutError(, 'Connection to O...
4 years ago
hey guys, I keep getting trains_agent: ERROR: Connection Error: it seems *api_server* is misconfigured. Is this the TRAINS API server http://apiserver:8008 ?...
3 years ago
I'm probably stupid, but how do I specify worker name? usecase - I want to create two workers using the same GPU, and new worker just overwrites the old one
4 years ago
two annoying visual bugs in ClearML Server UI after latest update: experiment status is still shown as “Aborted” after successful resetting until you refresh...
2 years ago
hey guys the first time I'm seeing this behavior I'm adding a new user to /opt/trains/config/apiserver.conf and restarting the containers. all old users are ...
4 years ago
feature request: we have several servers with multiple GPUs, and atm we have to manually check which GPU has enough memory before queuing each experiment int...
3 years ago
hey guys, is there a ready script that can delete all models from S3 (or other storage) that are related to deleted or archived experiments?
3 years ago
hey guys, here I am again with another question 😃 after the latest update, I’m getting this error when I’m trying to compare scalars for more than 10 experi...
4 years ago
hey guys, a question about monthly worker_stats indices each of them takes up about 1gb for us. do we really need to keep all of them? is there any way to fr...
4 years ago
I updated trains-server today, and now it's very unstable, Web interface randomly stops working. anyone had the same problem? I've never had any problems wit...
4 years ago