
Just want to know if this would be possible: when your ClearML server is inside your GCP environment and you launch training jobs using Vertex AI, would the training script be able to register with the server when there is no public IP? I guess it's more related to networking inside GCP, but I just wanted to know if anyone has tried it.
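A minimal sketch, assuming the Vertex AI job runs in a network that can route to the server's VPC (for example through VPC peering; whether that holds depends on the GCP setup). The ClearML SDK can be pointed at the server's internal address through the CLEARML_WEB_HOST, CLEARML_API_HOST and CLEARML_FILES_HOST environment variables. The IP below is hypothetical, the ports are the ClearML server defaults, and the reachability check is just a plain TCP probe, not a ClearML API.

```python
import os
import socket

# Hypothetical internal (VPC) address of the ClearML server; replace with your own.
CLEARML_INTERNAL_IP = "10.128.0.5"

# Default ClearML server ports: web UI 8080, API 8008, file server 8081.
DEFAULT_PORTS = {"web": 8080, "api": 8008, "files": 8081}


def clearml_env_for_internal_ip(ip: str, ports: dict = DEFAULT_PORTS) -> dict:
    """Build the CLEARML_*_HOST variables that point the SDK at a private address."""
    return {
        "CLEARML_WEB_HOST": f"http://{ip}:{ports['web']}",
        "CLEARML_API_HOST": f"http://{ip}:{ports['api']}",
        "CLEARML_FILES_HOST": f"http://{ip}:{ports['files']}",
    }


def api_reachable(ip: str, port: int = DEFAULT_PORTS["api"], timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to ip:port succeeds from this machine."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    os.environ.update(clearml_env_for_internal_ip(CLEARML_INTERNAL_IP))
    # Credentials are also needed, e.g. CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY.
    print("API port reachable:", api_reachable(CLEARML_INTERNAL_IP))
    # After this, Task.init(...) in the training script would use these hosts.
```

If the probe fails from inside the training job, the problem is routing or firewall rules in GCP rather than ClearML itself.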
I'm getting ValueError: Task object can only be updated if created or in_progress
I'm plotting the confusion matrices the regular way: plot, then read the figure from a buffer to create the tensor, and save the tensor.
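The plot-to-buffer-to-tensor flow described above can be sketched as follows, assuming matplotlib and NumPy; the function names are hypothetical. The result is a CHW uint8 array, which is the default layout expected by torch.utils.tensorboard's SummaryWriter.add_image (it accepts NumPy arrays as well as tensors). The plt.close(fig) call matters for long runs: pyplot keeps every figure alive until it is closed, so plotting one per epoch without closing grows memory.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Count (true, predicted) label pairs into a num_classes x num_classes matrix."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def confusion_matrix_image(cm):
    """Plot cm, render it to an in-memory PNG, and read it back as a CHW uint8 array."""
    fig, ax = plt.subplots(figsize=(4, 4))
    ax.imshow(cm, cmap="Blues")
    for (i, j), count in np.ndenumerate(cm):
        ax.text(j, i, str(count), ha="center", va="center")
    ax.set_xlabel("Predicted")
    ax.set_ylabel("True")

    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)  # release the figure; open pyplot figures otherwise accumulate
    buf.seek(0)

    rgba = plt.imread(buf)  # float32 in [0, 1], shape (H, W, 4) for an RGBA PNG
    rgb = (rgba[..., :3] * 255).round().astype(np.uint8)
    return rgb.transpose(2, 0, 1)  # HWC -> CHW


if __name__ == "__main__":
    cm = confusion_matrix([0, 1, 1, 2], [0, 1, 2, 2], num_classes=3)
    img = confusion_matrix_image(cm)
    print(cm)
    print(img.shape, img.dtype)
    # With TensorBoard: writer.add_image("confusion_matrix/val", img, global_step=epoch)
```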
I'm afraid I'm still having the same issue.
don't think so, I'm saving the model at the end of each epoch
could it be a memory issue triggered by the comparison of 3 experiments?
Awesome! I'll let you know if it works now
it's very odd to me too: I have another project running trainings longer than 100 epochs and I don't have this issue
yes, I'm creating them for TensorBoard, and they appear under the Debug Samples tab
how quick is "very quickly"? we are talking about maybe 30 minutes to reach 100 epochs
the thing is that this runs before the virtual environment is created, so those settings are no longer there in the new environment
are you referring to extra_docker_shell_script?
SuccessfulKoala55 ?
Worked perfectly, thanks!
it's also under Other
I need to wait 100 epochs 😅
and would it be possible to run it using the normal local agent?
sorry, in my case it's the default mode
oh wait, I was using clearml==0.17.5 and I also had this issue
oh right, it will try to use globals from /etc/pip.conf first and then from the virtualenv's pip.conf
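pip's documentation states that when several configuration files exist they are combined in the order global, user, site (the virtualenv's own pip.conf), with each file overriding values from the previous ones. The sketch below approximates that merge with configparser so you can check which value wins; the function name is hypothetical, the temporary files stand in for /etc/pip.conf and the virtualenv's pip.conf, and it ignores PIP_* environment variables, which pip applies on top of the files.

```python
import configparser
import os
import tempfile


def effective_pip_settings(paths_in_load_order):
    """Merge pip-style INI files; values from later files override earlier ones."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(paths_in_load_order)  # files that do not exist are silently skipped
    return {section: dict(parser[section]) for section in parser.sections()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        global_conf = os.path.join(tmp, "global_pip.conf")  # stands in for /etc/pip.conf
        venv_conf = os.path.join(tmp, "venv_pip.conf")  # stands in for <venv>/pip.conf
        with open(global_conf, "w") as f:
            f.write("[global]\nindex-url = https://global.example/simple\ntimeout = 30\n")
        with open(venv_conf, "w") as f:
            f.write("[global]\nindex-url = https://venv.example/simple\n")
        # index-url comes from the venv file; timeout survives from the global file.
        print(effective_pip_settings([global_conf, venv_conf]))
```

So a setting placed only in the global file still applies inside the virtualenv unless the virtualenv's pip.conf overrides that key.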
great! thank you for such a quick response!
we are developing a model and I've built a webapp with Streamlit that lets you select the task and see the confusion matrices, splits, data, and predictions on the train/val data (all saved in the task), and it also has a model predict function for an image you upload