I don't connect anything explicitly, I'm using argparse, it used to work before the update
I updated the version in the Installed packages section before starting the experiment
example of the failed experiment
oh wow, I didn't see delete_artifacts_and_models option
I guess we'll have to manually find old artifacts that are related to already deleted tasks
thanks for the link advice, will do
I'll let you know if I managed to achieve my goals with StorageManager
1 - yes, of course =) but it would be awesome if you could customize the content - to include key metrics and hyperparameters, for example
3 - hooooooraaaay
same here, changing arguments in the Args section of Hyperparameters doesn't work, the training script starts with the default values.
trains 0.16.0
trains-agent 0.16.0
trains-server 0.16.0
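to narrow it down, a tiny repro script along these lines might help: plain argparse, printing the values the script actually ends up with. the argument names (`--lr`, `--epochs`) and the helper names are made up for illustration, and the Task.init call is only a comment here, on the assumption that the real script calls it before parsing. just a sketch, not our actual training code:

```python
import argparse


def build_parser():
    # Hypothetical arguments, stand-ins for the real training options.
    parser = argparse.ArgumentParser(description="argparse override repro")
    parser.add_argument("--lr", type=float, default=0.01)
    parser.add_argument("--epochs", type=int, default=10)
    return parser


def effective_args(argv):
    # Assumption: in the real script Task.init(...) is called before this
    # point, so the agent has a chance to apply the values edited in the UI.
    args = build_parser().parse_args(argv)
    return vars(args)


# No overrides: the script sees the defaults.
print(effective_args([]))
# Explicit overrides: the script sees the new values.
print(effective_args(["--lr", "0.1", "--epochs", "3"]))
```

if the values printed in the experiment log are still the defaults after editing the Args section, the override is being lost before the script parses its arguments.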
overwriting this value is not ideal though, because for :monitor:gpu and :monitor:machine values I would like to continue from the latest iteration
but for the metrics, I explicitly pass the epoch number that my training is currently on. it's kind of weird that it adds an offset to the values that are explicitly reported, no?
thank you, I'll let you know if setting it to zero worked
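to make the offset complaint concrete, here is a toy model of what I think I'm seeing. it is not library code: `reported_iteration` and the numbers are made up, and the "offset is added to every explicit value" rule is my assumption based on the plots:

```python
def reported_iteration(explicit_iteration, initial_offset=0):
    """Toy model: x-axis position of a value reported at explicit_iteration."""
    # Assumption: the offset is simply added to whatever iteration is passed.
    return initial_offset + explicit_iteration


previous_last_iteration = 100  # made-up: where the previous run stopped
epoch = 5                      # the epoch number I pass explicitly

# With the offset, epoch 5 lands at 105 on the plot.
print(reported_iteration(epoch, previous_last_iteration))  # 105
# With the offset set to zero, it lands where I reported it.
print(reported_iteration(epoch, 0))  # 5
```

the catch is that a zero offset applies to everything, so the :monitor:gpu and :monitor:machine plots would restart from zero too.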
sorry that I keep bothering you, I love ClearML and try to promote it whenever I can, but this thing is a real pain in the ass
WARNING: You are using pip version 20.1.1; however, version 20.3.3 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
trains_agent: ERROR: Connection Error: it seems *api_server* is misconfigured. Is this the TRAINS API server http://apiserver:8008 ?
http://OUR_IP:8081 http://OUR_IP:8080 http://apiserver:8008
WARNING: You are using pip version 20.1.1; however, version 20.3.3 is available.
`...
it will probably screw up my resource monitoring plots, but well, who cares
still no luck, I tried everything =( any updates?
docker mode. they do share the same folder with the training data mounted as a volume, but only for reading the data.
awesome news
wow, thanks, just updated our server!
can't seem to find these metrics snapshot plots =) how do I plot one?
yeah, I am aware of trains-agent, we are planning to start using it soon, but still, copying the original training command would be useful
another stupid question - what is the proper way to delete a worker? so far I've been using pgrep to find the relevant PID