Hi AstonishingRabbit13
I'm training YOLOv5 now and I want to save all the info (model and metrics) with ClearML to my bucket.
The easiest thing (assuming you are running YOLOv5 with python train.py) is to add the following env variable:
CLEARML_DEFAULT_OUTPUT_URI="<bucket URI>" python train.py
Note that you need to pass your GS credentials here:
https://github.com/allegroai/clearml/blob/d45ec5d3e2caf1af477b37fcb36a81595fb9759f/docs/clearml.conf#L113
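If you would rather launch the training from a small Python wrapper than from the shell, the same env variable can be set like this. This is only a sketch: build_train_env and run_training are made-up helper names, and the bucket path in the usage comment is a placeholder, not a real bucket.

```python
import os
import subprocess
import sys


def build_train_env(output_uri, base_env=None):
    """Return a copy of the environment with CLEARML_DEFAULT_OUTPUT_URI set."""
    env = dict(os.environ if base_env is None else base_env)
    env["CLEARML_DEFAULT_OUTPUT_URI"] = output_uri
    return env


def run_training(output_uri, script="train.py"):
    """Run the training script with the output URI in its environment."""
    cmd = [sys.executable, script]
    return subprocess.run(cmd, env=build_train_env(output_uri), check=False)


# Usage (hypothetical bucket path, replace with your own):
# run_training("gs://my-bucket/clearml")
```

The shell one-liner above does the same thing; the wrapper is only handy if you already start runs from Python.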
Hi AgitatedDove14, thanks for the help!
I ran it now, and at the end the task uploaded the model to the bucket for me:
clearml.Task - INFO - Completed model upload to gs://
But when I check, I can see only the final model in the bucket. Do you know how I can save all the logs and all the metric images?
Thanks
Do you know how I can save all the logs and all the metric images?
These are stored on the clearml-server, no? What am I missing?
Yes, they are on the clearml-server now.
I would like to also have them saved on the bucket, so that:
- I save space on the clearml server
- I have the model + all its info in one place on the bucket
I would like to also have them saved on the bucket
Oh, if this is the case, you can just change the clearml file server to point to the GS bucket, and everything will be stored there.
Just change your clearml.conf:
files_server: "<bucket URI>"
https://github.com/allegroai/clearml/blob/d45ec5d3e2caf1af477b37fcb36a81595fb9759f/docs/clearml.conf#L10
Thanks, I can see the files on the bucket now.
I also saw an error at the end of the training:
clearml.storage - ERROR - Failed uploading: HTTPSConnectionPool(host='storage.googleapis.com', port=443): Max retries exceeded with url... (Caused by SSLError(SSLError(1, '[SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac
And on the clearml server some images in the PLOTS tab are missing.
Is there something else in the conf that I should change?
Again, thanks a lot for the help!
Is there something else in the conf that I should change?
I'm assuming the Google credentials?
https://github.com/allegroai/clearml/blob/d45ec5d3e2caf1af477b37fcb36a81595fb9759f/docs/clearml.conf#L113
I had that. In the end the files were uploaded,
just some were missing on the clearml server.
AstonishingRabbit13, so is it working now?
In the bucket I can see all the files now!
But on the clearml server, when I go into the training task, some of the plots are missing,
like the Confusion Matrix.
For now I think I'm OK:
the scalars seem to be right, and the metrics there are what's important for me.
The error for uploading is weird.
Again, thanks for the help!!
the error for uploading is weird
wait, are you still getting this error?
AstonishingRabbit13
https://github.com/googleapis/google-cloud-python/issues/4941#issuecomment-369472576
Check the OpenSSL version and the system date; this looks like a low-level SSL error (even before authentication).
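A quick way to check both from the same Python environment that runs the training is sketched below. Note that Python's ssl module reports the OpenSSL build it was linked against, which is not necessarily the system openssl binary, so upgrading the system package may not change it. ssl_environment_report is just an illustrative helper name.

```python
import ssl
from datetime import datetime, timezone


def ssl_environment_report():
    """Collect the OpenSSL build used by Python and the current UTC time."""
    return {
        "openssl_version": ssl.OPENSSL_VERSION,
        "openssl_version_info": ssl.OPENSSL_VERSION_INFO,
        "utc_now": datetime.now(timezone.utc).isoformat(),
    }


report = ssl_environment_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

If the printed time is far off from the real time, fix the clock first, since a wrong clock can break TLS on its own.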
(Caused by SSLError(SSLError(1, '[SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac
Where is the code running (agent)? A GCP instance? Your machine?
My GCP instance.
I tried upgrading OpenSSL, still got the error.
Could it be that you have some custom SSL certificate or policy installed?
Can you reach other HTTPS sites (for example your clearml-server)?
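Something like the following could be used to test HTTPS reachability from the training machine with only the standard library. It is a rough sketch: can_reach is a made-up helper, and the URLs in the usage comment are examples (put your own clearml-server address there).

```python
import ssl
import urllib.error
import urllib.request


def can_reach(url, timeout=5.0):
    """Return (ok, detail); ok is True if an HTTP response came back."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # An HTTP error status still means the TLS handshake succeeded.
        return True, f"HTTP {exc.code}"
    except (urllib.error.URLError, ssl.SSLError, OSError) as exc:
        return False, repr(exc)


# Usage (example URLs only):
# print(can_reach("https://storage.googleapis.com"))
# print(can_reach("https://<your clearml-server>"))
```

If other HTTPS sites work and only the GS uploads fail, the problem is less likely to be a machine-wide certificate or policy issue.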
It's very weird.
The training does upload the files to the bucket,
e.g.:
clearml.Task - INFO - Completed model upload to gs://...
clearml.Task - INFO - Finished uploading
I get the log line saying the model was uploaded,
and I can also see all the files in the bucket, as I said (model + metrics).
The only thing missing is some plots on the clearml server (app): when I go to the details of the training task I cannot see the confusion matrix, for example (but it exists on the bucket).
I thought the logged error might be related to that.
The only thing missing is some plots on the clearml server (app): when I go to the details of the training task I cannot see the confusion matrix, for example (but it exists on the bucket).
How do you report the "confusion matrix"? (I might have an idea what the difference is.)
I'm using the yolov5 repo:
https://github.com/ultralytics/yolov5
It should log everything at the end, as I understand.
It should log everything at the end, as I understand.
Hmm let me check the code for a minute
The confusion matrix shows under debug samples, but the image is empty. Is that correct?
On the clearml server I can see only: F1-Confidence Curve, Precision-Confidence Curve, Precision-Recall Curve, Recall-Confidence Curve,
but on the bucket I can see all the rest.
Are you saying that in the UI you do not see the "confusion matrix" at all, only on the GS bucket?
Yes.
When I change the files_server back to the clearml server (saving locally), I can see everything.
And you are seeing a bunch of the GS SSL errors?
This is very odd. Can you also post the file names here? Maybe an odd character is causing it.
Can you also test it with the latest clearml version (1.8.0)?
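For the file-name check, a small helper like this could list names containing characters outside a conservative "safe" set. It is only a sketch: the safe set is an assumption, the example names are invented for illustration, and nothing here confirms that such characters are the actual cause.

```python
import string

# Assumption: ASCII letters, digits and a few punctuation marks count as safe.
SAFE_CHARS = set(string.ascii_letters + string.digits + "-_./")


def odd_characters(name):
    """Return the sorted characters in `name` that are outside SAFE_CHARS."""
    return sorted(set(name) - SAFE_CHARS)


def flag_odd_names(names):
    """Map each name with unexpected characters to those characters."""
    return {n: odd_characters(n) for n in names if odd_characters(n)}


# Hypothetical file names, for illustration only.
example = ["confusion_matrix.png", "Confusion Matrix.png", "results (1).png"]
for name, chars in flag_odd_names(example).items():
    print(repr(name), chars)
```

Running it over the object names listed from the bucket would show quickly whether the missing plots share a character that the visible ones do not.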