sorry, in my case it's the default mode
oh right, it will try to use globals from /etc/pip.conf first and then from the virtualenv's pip.conf
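For context, a minimal global pip.conf of that sort might look like this (the index URL and timeout are purely illustrative):

```ini
# /etc/pip.conf -- applies system-wide; a virtualenv can carry
# its own pip.conf with environment-specific overrides
[global]
index-url = https://pypi.org/simple
timeout = 60
```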
I should remark that it's been working OK nonstop for 5 months already.. but yesterday and today I'm experiencing these crashes
Awesome! I'll let you know if it works now
oh wait, I was using clearml == 0.17.5 and I also had this issue
but the reason I said the comparison could be an issue is because I'm not being able to do comparisons of experiments
so what we should do is turn pip freeze on in the clearml.conf file?
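If I remember right, that would be a snippet along these lines in clearml.conf (assuming the clearml version in use supports the `detect_with_pip_freeze` key):

```
sdk {
  development {
    # record the full environment via `pip freeze` instead of
    # only the packages imported by the entry script
    detect_with_pip_freeze: true
  }
}
```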
we'll create a minimal working example :-)
we are developing a model and I've built a webapp with Streamlit that lets you select the task, and you can see the confusion matrices, splits, data, and predictions on the train/val data (all saved in the task), ...and also a model predict function on an image you upload
the issue is that the confusion matrix showing for epoch 101 is in fact the one for epoch 1.
The images are stored in the default files server
I see the correct confusion matrices in tensorboard
Just want to know if it would be possible, when you have your ClearML server inside your GCP environment, to launch training jobs using Vertex AI. Would the training script be able to register with the server when there is no public IP? I guess it's more related to networking inside GCP, but just wanted to know if anyone has tried it.
So what changed?
We changed other bits of code, but not that one..
But maybe we are focusing on the wrong thing, the question now is why is ClearML only detecting these packages (running a different experiment than Diego)
Pillow == 8.0.1
clearml == 0.17.5
google_cloud_storage == 1.40.0
joblib == 0.17.0
numpy == 1.19.5
pandas == 1.3.1
seaborn == 0.11.0
tensorflow_gpu == 2.3.1
tqdm == 4.54.1
ok, so ClearML doesn't add all the imported packages needed to run the task to the Installed Packages, only the ones in the main script?
oh I meant now... so after the reboot everything goes back to "normal", except that I can't make the comparisons
I'm getting `ValueError: Task object can only be updated if created or in_progress`
it's also under Other
Ok, I think I figured it out. We started with a main script that imported sklearn, and then we moved that function outside the main script and imported it instead.
So when we cloned the first time we had sklearn in the Installed Packages, and therefore our agent was able to run. The (now) cached clearml-venv had sklearn installed, so when it ran the second experiment without the sklearn import in the main script (and therefore without it in the Installed Packages) it didn't matter, because the cached venv already had it installed.
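To illustrate why moving the import out of the main script changed the detected packages (a toy sketch of static entry-script analysis, not ClearML's actual analyzer):

```python
import ast

def main_script_imports(source: str) -> set:
    """Collect top-level package names imported by a single script.

    Nothing here follows imports into helper modules, so a package
    imported only inside `helpers.py` is invisible to this analysis.
    """
    packages = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                packages.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            packages.add(node.module.split(".")[0])
    return packages

# sklearn imported inside helpers.py is not seen here
main_py = "import numpy\nfrom helpers import train"
print(main_script_imports(main_py))
```

With the sklearn import moved into `helpers`, an analysis like this only sees `numpy` and `helpers`, which matches the behavior described above.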
it's very odd for me too, I have another project running trainings longer than 100 epochs and I don't have this issue
I need to wait 100 epochs 😅