My only change was adding output_uri to use a GCS path
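My change is roughly this (project, task, and bucket names here are placeholders):

```python
def init_training_task():
    # Imported inside the function so the snippet reads standalone;
    # clearml is assumed to be installed.
    from clearml import Task

    # output_uri tells ClearML where to upload model checkpoints,
    # instead of only logging their local path.
    return Task.init(
        project_name="my-project",           # placeholder
        task_name="train",                   # placeholder
        output_uri="gs://my-bucket/models",  # placeholder GCS path
    )
```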
I have to admit this is very strange; I think the fact that it works for artifacts but not for the model is kind of a fluke...
If you use the "wait_on_upload" argument in upload_artifact you end up with the same behavior. Even when the upload happens in the background, the issue is still there; for me it was revealed the minute I limited the upload bandwidth to under 300 kbps. It seems the internal GCS timeout assumes every chunk should be uploaded in under 60 seconds.
The default chunk size is 100 MB (I think), and anything below it is uploaded as a single stream.
I'm not sure what the right route to take here is; should we configure the GCS package externally? It seems like an internal issue of the GCS package, and I'm not sure it's our place to fix it.
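To put rough numbers on that 60-second assumption (a sketch; "kbps" is taken as kilobits per second, and the per-request timeout assumed to be 60 s):

```python
# How large a single-stream upload can finish within a 60 s request
# timeout at ~300 kbps (kilobits per second).
TIMEOUT_S = 60
BANDWIDTH_KBPS = 300

bytes_per_second = BANDWIDTH_KBPS * 1000 / 8          # 37,500 B/s
max_bytes_in_timeout = TIMEOUT_S * bytes_per_second   # 2,250,000 B

# With a 100 MB default chunk size, any checkpoint under 100 MB goes up
# as one stream, so anything bigger than ~2.25 MB would hit the 60 s
# timeout at this bandwidth.
print(f"{max_bytes_in_timeout / 1e6:.2f} MB")  # → 2.25 MB
```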
ClearML will only log the exact local path where you stored the file; I assume this is it.
If you pass output_uri=True to Task.init, it will automatically upload the model to the files_server, and the model repository will then point to the files_server (you can also use any object storage as model storage, e.g. output_uri=
Notice you can also set it as default configuration (local or on the agent):
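For example, in clearml.conf (a sketch; the bucket path is hypothetical):

```
# clearml.conf (on your machine, or in the agent's configuration)
sdk {
    development {
        # Any Task.init without an explicit output_uri
        # will default to this destination
        default_output_uri: "gs://my-bucket/models"
    }
}
```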
The next question is about uploading the model artifact using cloud storage.
I'm trying to use Google Cloud Storage to store my model checkpoint, but it failed with the following errors:
2021-05-12 18:51:53,335 - clearml.storage - ERROR - Failed uploading: ('Connection aborted.', timeout('The write operation timed out'))
2021-05-12 18:51:53,335 - clearml.Task - INFO - Completed model upload to
2021-05-12 18:51:54,298 - clearml.Task - INFO - Finished uploading
The first line says the upload timed out, but the next ones say the upload completed.
After checking the bucket, I found nothing (meaning the model was not uploaded).
Any idea about the cause of the timeout AgitatedDove14? I believe I'm already using the correct credentials, and I tested them manually using the StorageManager SDK.
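For reference, the manual test I mean is roughly this (bucket and file names are placeholders):

```python
def manual_upload_check():
    # Imported inside the function so the snippet reads standalone;
    # clearml is assumed to be installed.
    from clearml import StorageManager

    # Uploads a local file directly through ClearML's storage layer;
    # returns the remote URL on success.
    return StorageManager.upload_file(
        local_file="model.ckpt",                       # placeholder
        remote_url="gs://my-bucket/debug/model.ckpt",  # placeholder
    )
```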
Maybe that's the issue:
Hi MortifiedCrow63 , thank you for pinging! (seriously greatly appreciated!)
Can you test with the latest release, see if the issue was fixed?
I do not think this is the upload timeout; it makes no sense to me for the GCP package to enforce a 60-second timeout on uploads (we do not pass any timeout, it's their internal default for the argument)...
I'm also not sure where the timeout originates (I'm assuming the initial GCP handshake could not actually be what times out, as that response should be relatively quick, so 60 seconds is more than enough).
Noted AgitatedDove14,
just wondering why auto logging and manual upload (using StorageManager) can yield different results. Do you think we're using a different component here?
If the problem is coming from GCS, the StorageManager should also fail, right?