
Ok, I think I figured it out. We started with a main script that imported sklearn, and then we moved that function outside the main script and imported it instead.
So when we cloned the first time we had sklearn in the Installed Packages, and therefore our agent was able to run. The (now) cached clearml-venv had sklearn installed, so when it ran the second experiment without the sklearn import in the main script, and therefore without it in the Installed Packages, it didn't matter, b...
ok, so ClearML doesn't add all the imported packages needed to run the task to the Installed Packages, only the ones in the main script?
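To see why an import buried in a helper module can be missed, here is a minimal, self-contained sketch (not ClearML's actual implementation) of the kind of AST scan a requirements detector performs on the entry-point script. The `main_script` source and the `callbacks` module name are hypothetical, taken from this thread:

```python
import ast

# Hypothetical main script from the thread: it imports a helper module,
# but sklearn is only imported *inside* callbacks.py, never here.
main_script = """
from callbacks import function_plot_conf_matrix
import numpy as np
"""

def top_level_imports(source: str) -> set:
    """Collect top-level package names appearing in import statements
    of the given source -- a scan limited to the entry-point script."""
    packages = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            packages.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            packages.add(node.module.split(".")[0])
    return packages

print(top_level_imports(main_script))
# 'sklearn' never appears -- only 'callbacks' and 'numpy' do,
# so a scan of the main script alone misses the transitive dependency.
```

This is only an illustration of the failure mode being discussed; it is not how ClearML itself resolves requirements.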
are you referring to extra_docker_shell_script SuccessfulKoala55 ?
Worked perfectly, thanks!
Awesome! I'll let you know if it works now
yes, the code is inside a git repository. In the main script: from callbacks import function_plot_conf_matrix
and inside callbacks.py
of course at the beginning we have from sklearn.metrics import confusion_matrix
or something like that
the thing is that this runs before you create the virtual environment, so then in the new environment those settings are no longer there
I'm plotting the confusion matrices the regular way, plot, then read figure from buffer to create the tensor, and save the tensor
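The "plot, then read the figure from a buffer" pattern described here can be sketched as follows. This is a minimal, assumed reconstruction (the actual code in the thread isn't shown); the final conversion to a framework tensor, e.g. `torch.from_numpy`, is left out:

```python
import io
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt

def conf_matrix_to_array(cm: np.ndarray) -> np.ndarray:
    """Render a confusion matrix with matplotlib, save the figure to an
    in-memory PNG buffer, and read it back as an image array."""
    fig, ax = plt.subplots()
    ax.imshow(cm, cmap="Blues")
    for (i, j), v in np.ndenumerate(cm):
        ax.text(j, i, str(v), ha="center", va="center")
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)
    buf.seek(0)
    return plt.imread(buf)  # H x W x 4 float array (RGBA)

cm = np.array([[50, 2], [3, 45]])  # toy 2-class confusion matrix
img = conf_matrix_to_array(cm)
print(img.shape)
```

From there the array can be handed to whatever logging or tensor API is in use.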
I see, I can confirm that these packages (except for google_cloud_storage) are imported directly in the main script
or I can make comparisons inside some projects but not others
Just want to know if it would be possible when you have your ClearML server inside your GCP environment and you want to launch training jobs using Vertex AI. Would the training script be able to register with the server when there is no public IP? I guess it's more related to networking inside GCP, but just wanted to know if anyone tried it.
could it be a memory issue triggered by the comparison of 3 experiments?
we are developing a model and I've built a webapp with Streamlit that lets you select the task, and you can see the confusion matrices, splits, data, and predictions on train/val data (all saved in the task), ...and also a model predict function on an image you upload
I should remark that it's been working OK nonstop for 5 months already... but yesterday and today I'm experiencing these crashes
oh right, it will try to use globals from /etc/pip.conf first and then from the virtualenv's pip.conf
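To make the precedence concrete, here is an assumed example of the two config files being discussed; the index URLs are placeholders, not real endpoints:

```ini
; /etc/pip.conf -- global level, read first
[global]
index-url = https://pypi.org/simple

; <venv>/pip.conf -- site level, read later, overrides the global file
[global]
index-url = https://internal-mirror.example.com/simple
```

Settings defined at the virtualenv (site) level win over the global `/etc/pip.conf` when both define the same key.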
sorry, in my case it's the default mode
and would it be possible to run it using the normal local agent?
oh I meant now... so after the reboot everything goes back to "normal"... except that I can't make the comparisons
but the reason I said the comparison could be an issue is because I haven't been able to do comparisons of experiments
I can't access the WebAPP nor ssh the server