I'm so happy to see that this problem has finally been solved!
I don't connect anything explicitly, I'm using argparse, it used to work before the update
I updated the version in the Installed packages section before starting the experiment
example of the failed experiment
oh wow, I didn't see delete_artifacts_and_models option
I guess we'll have to manually find old artifacts that are related to already deleted tasks
thanks for the link advice, will do
I'll let you know if I managed to achieve my goals with StorageManager
1 - yes, of course =) but it would be awesome if you could customize the content - to include key metrics and hyperparameters, for example
3 - hooooooraaaay
same here, changing arguments in the Args section of Hyperparameters doesn't work, the training script starts with the default values.
trains 0.16.0
trains-agent 0.16.0
trains-server 0.16.0
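to make the expected behaviour concrete, here is a minimal stand-alone sketch: an argparse script whose defaults should be replaced by the values edited in the Args section. It uses only the standard library; `apply_overrides` and the `Args/` key prefix are illustrative assumptions, not the trains API, and only imitate what the agent is expected to do before the script runs.

```python
import argparse


def build_parser():
    # Ordinary argparse setup, as in a typical training script.
    parser = argparse.ArgumentParser()
    parser.add_argument("--lr", type=float, default=0.01)
    parser.add_argument("--epochs", type=int, default=10)
    return parser


def apply_overrides(args, overrides):
    # Hypothetical helper (not part of trains): replace parsed values
    # with edited ones, converting each string to the type of the default.
    for key, raw in overrides.items():
        name = key.split("/", 1)[-1]  # drop the assumed "Args/" prefix
        if not hasattr(args, name):
            continue  # ignore keys the script does not define
        current = getattr(args, name)
        setattr(args, name, type(current)(raw))
    return args


# No command-line arguments: the script starts from its defaults.
args = build_parser().parse_args([])
# Values as they might be edited in the UI (assumed to arrive as strings).
args = apply_overrides(args, {"Args/lr": "0.001", "Args/epochs": "50"})
print(args.lr, args.epochs)
```

the bug described above corresponds to the last step being skipped, so the script would keep `lr=0.01` and `epochs=10`.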
overwriting this value is not ideal though, because for :monitor:gpu and :monitor:machine values I would like to continue from the latest iteration
but for the metrics, I explicitly pass the number of the epoch that my training is currently on. it's kind of weird that it adds an offset to values that are explicitly reported, no?
thank you, I'll let you know if setting it to zero worked
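a tiny simulation of the offset behaviour discussed here, assuming the offset is simply added to every reported iteration. `stored_iteration` and `initial_iteration` are illustrative names for this sketch, not SDK calls.

```python
def stored_iteration(reported, initial_iteration=0):
    # Assumed behaviour: the x-axis value that ends up on the plot is
    # the continuation offset plus the iteration passed by the script.
    return initial_iteration + reported


# A run that previously stopped at iteration 100 is continued, and the
# script explicitly reports epoch 5.
shifted = stored_iteration(5, initial_iteration=100)
# With the offset set to zero, the explicit value is kept as-is.
unshifted = stored_iteration(5, initial_iteration=0)
print(shifted, unshifted)
```

under this assumption a zero offset keeps explicitly reported epochs aligned across experiments, but the monitor series also restart from zero instead of continuing from the latest iteration.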
sorry that I keep bothering you, I love ClearML and try to promote it whenever I can, but this thing is a real pain in the ass
we often do ablation studies with more than 50 experiments, and it was very convenient to compare their dynamics at the different epochs
WARNING: You are using pip version 20.1.1; however, version 20.3.3 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
trains_agent: ERROR: Connection Error: it seems *api_server* is misconfigured. Is this the TRAINS API server http://apiserver:8008 ?
http://OUR_IP:8081 http://OUR_IP:8080 http://apiserver:8008
WARNING: You are using pip version 20.1.1; however, version 20.3.3 is available.
`...
it will probably screw up my resource monitoring plots, but well, who cares
still no luck, I tried everything =( any updates?
docker mode. they do share the same folder with the training data mounted as a volume, but only for reading the data.
awesome news
wow, thanks, just updated our server!
can't seem to find these metrics snapshot plots =) how do I plot one?
Error 12 : Validation error (value '['13b46b9325954517ab99381d5f45237d', 'bc76c3a7f0f6431b8e064212e9bdd2c0', '5d2a57cd39b94250b8c8f52303ccef92', 'e4731ee5b33e41d992d6d3fdb2913045', '698d9231155e41fbb61f8f3faa605727', '2171b190507f40d1be35e222045c58ea', '55c81a5db0ad40bebf72fdcc1b3be2a4', '94fbdbe26ef242d793e18d955cb3de58', '7d8a6c8f2ae246478b39ae5e87def2ad', '141594c146fe495886d477d9a27c465f', '640f87b02dc94a4098a0aba4d855b8f5']' length is bigger than allowed maximum '10'.)
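the error above rejects a list of 11 IDs against a maximum of 10. If the limit applies to a request you build yourself (an assumption; here it may well come from the UI), one possible workaround is to split the IDs into batches. A minimal sketch with placeholder IDs and a hypothetical `chunked` helper:

```python
def chunked(ids, size=10):
    # Split a list into consecutive batches of at most `size` items.
    if size < 1:
        raise ValueError("size must be at least 1")
    return [ids[i:i + size] for i in range(0, len(ids), size)]


# Placeholder IDs standing in for the 11 task IDs from the error message.
task_ids = [f"task-{n:02d}" for n in range(11)]
batches = chunked(task_ids)
print([len(batch) for batch in batches])
```

each batch stays within the limit, but this does not help when all experiments need to appear in a single comparison.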
yeah, I am aware of trains-agent, we are planning to start using it soon, but still, copying the original training command would be useful