how quick is "very quickly"? we are talking about maybe 30 minutes to reach 100 epochs
sorry, in my case it's the default mode
oh right, it will try to use globals from /etc/pip.conf first and then from the virtualenv's pip.conf
the thing is that this runs before you create the virtual environment, so then in the new environment those settings are no longer there
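a rough sketch of that layering, in case it helps. This is not what the agent actually runs; it only assumes the files are plain INI and are read in the order mentioned above, with later files overriding earlier ones. The helper name `merged_pip_config` and the example paths are made up for illustration:

```python
import configparser


def merged_pip_config(paths):
    """Merge pip-style INI files in the given order.

    Later files override earlier ones, key by key. Files that do not
    exist are skipped silently, which mirrors the case where the
    virtualenv (and its pip.conf) has not been created yet.
    """
    parser = configparser.ConfigParser()
    parser.read([str(p) for p in paths])
    return {section: dict(parser[section]) for section in parser.sections()}


if __name__ == "__main__":
    # Hypothetical locations: the global file first, then the virtualenv's file.
    settings = merged_pip_config(["/etc/pip.conf", "venv/pip.conf"])
    print(settings.get("global", {}))
```

if the second file is missing at the time this runs, only the global settings come back, which would match what I'm seeing.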
it's also under Other
Just want to know if this is possible: you have your ClearML server inside your GCP environment and you want to launch training jobs using Vertex AI. Would the training script be able to register with the server when there is no public IP? I guess it's more related to networking inside GCP, but just wanted to know if anyone has tried it.
Awesome! I'll let you know if it works now
I don't understand though..why doesn't this happen on my other experiments?
right, callbacks.py is a file inside the repo, but is not part of the package
I'm afraid I'm still having the same issue..
I need to wait 100 epochs 😅
don't think so, I'm saving the model at the end of each epoch
I should remark that it's been working OK nonstop for 5 months already.. but yesterday and today I'm experiencing these crashes
but the reason I said the comparison could be an issue is because I'm not able to compare experiments
oh I meant now...so after the reboot everything goes back to "normal"..except that I can't make the comparisons
could it be a memory issue triggered by the comparison of 3 experiments?
or I can make comparisons inside some projects but not others
I can't access the WebAPP nor ssh the server
we are developing a model and I've built a webapp with Streamlit that lets you select the task, and you can see the confusion matrices, splits, data, and predictions on the train/val data (all saved in the task), ...and also a model predict function on an image you upload