python3 slack_alerts.py --channel trains-alerts --slack_api "OUR_KEY" --include_completed_experiments --include_manual_experiments
nope, same problem even after creating a new experiment from scratch
no, I even added the argument to specify tensorboard log_dir to make sure this is not happening
after the very first click, there is a popup with credentials request. nothing happens after that
I'm so happy to see that this problem has been finally solved!
I don't connect anything explicitly, I'm using argparse, it used to work before the update
I updated the version in the Installed packages section before starting the experiment
example of the failed experiment
oh wow, I didn't see delete_artifacts_and_models option
I guess we'll have to manually find old artifacts that are related to already deleted tasks
thanks for the link advice, will do
I'll let you know if I managed to achieve my goals with StorageManager
1 - yes, of course =) but it would be awesome if you could customize the content - to include key metrics and hyperparameters, for example
3 - hooooooraaaay
same here, changing arguments in the Args section of Hyperparameters doesn't work, training script starts with the default values.
trains 0.16.0
trains-agent 0.16.0
trains-server 0.16.0
overwriting this value is not ideal though, because for :monitor:gpu and :monitor:machine values I would like to continue from the latest iteration
but for the metrics, I explicitly pass the number of the epoch that my training is currently on. it's kind of weird that it adds an offset to values that are explicitly reported, no?
thank you, I'll let you know if setting it to zero worked
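To make the offset complaint above concrete, here is a minimal sketch of the behaviour being described. It is not ClearML's actual code: `report` and `initial_iteration` are hypothetical names standing in for a logger that shifts every reported iteration by the last iteration of the continued task, and the numbers are made up for illustration.

```python
# Hypothetical model of the offset behaviour; not ClearML's implementation.
def report(epoch, value, initial_iteration=0):
    """Return the (iteration, value) pair a logger would store after applying the offset."""
    return (epoch + initial_iteration, value)


# Assumed scenario: the previous run of this task ended at iteration 40,
# and training now explicitly reports epochs 41 and 42.
points = [(41, 0.31), (42, 0.29)]

# With the offset applied, explicitly reported epochs are shifted.
shifted = [report(e, v, initial_iteration=40) for e, v in points]

# With the offset set to zero, they are stored exactly as reported.
unshifted = [report(e, v, initial_iteration=0) for e, v in points]

print(shifted)    # [(81, 0.31), (82, 0.29)]
print(unshifted)  # [(41, 0.31), (42, 0.29)]
```

The trade-off mentioned above follows directly: a single offset applies to everything, so zeroing it also restarts the automatically reported monitor series from the beginning.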
sorry that I keep bothering you, I love ClearML and try to promote it whenever I can, but this thing is a real pain in the ass
we often do ablation studies with more than 50 experiments, and it was very convenient to compare their dynamics at the different epochs
WARNING: You are using pip version 20.1.1; however, version 20.3.3 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
trains_agent: ERROR: Connection Error: it seems *api_server* is misconfigured. Is this the TRAINS API server http://apiserver:8008 ?
http://OUR_IP:8081 http://OUR_IP:8080 http://apiserver:8008
WARNING: You are using pip version 20.1.1; however, version 20.3.3 is available.
`...
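The agent error above names http://apiserver:8008, so one quick thing to check is whether that address is usable at all from the machine running the agent. A minimal sketch using only the standard library; `split_endpoint` and `is_reachable` are hypothetical helper names, and a successful connection only shows that something is listening on that host and port, not that it is the Trains API server.

```python
import socket
from urllib.parse import urlsplit


def split_endpoint(url):
    """Return (host, port) for a URL such as 'http://apiserver:8008'."""
    parts = urlsplit(url)
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def is_reachable(url, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = split_endpoint(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers unresolvable host names, refused connections and timeouts.
        return False
```

Calling `is_reachable("http://apiserver:8008")` on the agent machine, and then again with the server's real address in place of `apiserver`, shows whether the host name in the config is the part that fails.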
it will probably screw up my resource monitoring plots, but well, who cares
still no luck, I tried everything =( any updates?
docker mode. they do share the same folder with the training data mounted as a volume, but only for reading the data.
awesome news