don't think so, I'm saving the model at the end of each epoch
I'm plotting the confusion matrices the regular way, plot, then read figure from buffer to create the tensor, and save the tensor
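For reference, a minimal sketch of that plot-to-buffer-to-array flow. It assumes NumPy and matplotlib only; the original setup presumably decodes the PNG into a TensorFlow tensor instead, and the labels and the helper names `confusion_matrix_counts` and `figure_to_array` are made up for illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt
import numpy as np


def confusion_matrix_counts(y_true, y_pred, n_classes):
    """Count (true, predicted) pairs; rows are true labels, columns predictions."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


def figure_to_array(fig):
    """Render the figure as PNG into memory and decode it back into an array."""
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)  # free the figure so repeated per-epoch calls do not leak memory
    buf.seek(0)
    return plt.imread(buf, format="png")


# Made-up labels, for illustration only.
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 2, 2, 2, 1, 0])
cm = confusion_matrix_counts(y_true, y_pred, n_classes=3)

fig, ax = plt.subplots(figsize=(4, 4))
ax.imshow(cm, cmap="Blues")
ax.set_xlabel("Predicted")
ax.set_ylabel("True")
image = figure_to_array(fig)  # height x width x channels array
```

Closing the figure inside the helper matters when this runs once per epoch, otherwise open figures pile up over a long training.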
or I can make comparisons inside some projects but not others
oh right, it will try to use globals from /etc/pip.conf first and then from the virtualenv's pip.conf
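A rough sketch of that layering, assuming later files override earlier ones key by key. It uses `configparser` on in-memory strings rather than pip's own loader, and the index URLs and the `merge_pip_configs` helper are hypothetical.

```python
import configparser

# Hypothetical contents of the two files; the URLs are placeholders.
GLOBAL_CONF = """
[global]
index-url = https://global.example/simple
timeout = 60
"""

VENV_CONF = """
[global]
index-url = https://venv.example/simple
"""


def merge_pip_configs(*sources):
    """Read configs in order; later sources override earlier ones per key."""
    parser = configparser.ConfigParser()
    for text in sources:
        parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


merged = merge_pip_configs(GLOBAL_CONF, VENV_CONF)
print(merged["global"])
```

Here the virtualenv value wins for `index-url`, while `timeout` is still inherited from the global file.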
yes, the code is inside a git repository
In the main script: from callbacks import function_plot_conf_matrix
and inside callbacks.py of course at the beginning we have from sklearn.metrics import confusion_matrix or something like that
could it be a memory issue triggered by the comparison of 3 experiments?
I should remark that it's been working OK nonstop for 5 months already.. but yesterday and today I'm experiencing these crashes
we'll create a minimal working example :-)
ok, so ClearML doesn't add all the imported packages needed to run the task to the Installed Packages, only the ones in the main script?
I can't access the WebApp or ssh into the server
it's very odd for me too, I have another project running trainings longer than 100 epochs and I don't have this issue
it's also under Other
oh I meant now...so after the reboot everything goes back to "normal"..except that I can't make the comparisons
the issue is that the confusion matrix showing for epoch 101 is in fact the one for epoch 1.
The images are stored in the default files server
Ok, tried the following four things:
(fail = sklearn not listed in installed packages)
no __init__.py file in the module_a folder, not a git repo: fail
no __init__.py file in the module_a folder, git repo: fail
with __init__.py file in the module_a folder, not a git repo: fail
with __init__.py file in the module_a folder, git repo: OK!
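A toy illustration of the difference between scanning only the main script and also scanning the module it imports. This is not ClearML's actual analysis, just a sketch with `ast`; the two source strings stand in for the main script and callbacks.py mentioned earlier, and `top_level_imports` is a hypothetical helper.

```python
import ast

# Stand-ins for the two files discussed in this thread.
MAIN_SCRIPT = "from callbacks import function_plot_conf_matrix\n"
CALLBACKS = "from sklearn.metrics import confusion_matrix\n"


def top_level_imports(source):
    """Return the set of top-level package names imported by `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as "from . import x"
            found.add(node.module.split(".")[0])
    return found


main_only = top_level_imports(MAIN_SCRIPT)
with_module = main_only | top_level_imports(CALLBACKS)
print(sorted(main_only), sorted(with_module))
```

Looking at the main script alone finds only `callbacks`; `sklearn` shows up only once the imported module is scanned as well.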
right, callbacks.py is a file inside the repo, but is not part of the package
so what we should do is turn pip freeze on in the clearml.conf file?
I'm creating them for tensorboard yes, and they appear under the debug samples tab
Ok, I think I figured it out. We started with a main script that imported sklearn, then we moved that function out of the main script and imported the function instead.
So when we cloned the first time we had sklearn in the Installed Packages, and therefore our agent was able to run. The (now) cached clearml-venv had sklearn installed, so when it ran the second experiment, without the sklearn import in the main script and therefore without sklearn in the Installed Packages, it didn't matter, b...
