Answered

Hello everyone, I'm a newcomer to ClearML. I have a question related to the Model URL information.

Where exactly are the model artifacts stored, assuming I'm using the default configuration?

I saw file:///var/folders/cj/SOME_RANDOM_ID/T/tf_ckpts/ckpt-1, but I'm not sure where the actual folder is located on the server.

Thanks!

  
  
Posted 3 years ago

Answers 25


Hi MortifiedCrow63
I have to admit this is very strange. I think the fact that it works for the artifacts and not for the model is kind of a fluke...
If you use the "wait_on_upload" argument in upload_artifact, you end up with the same behavior. Even if the upload runs in the background, the issue is still there; for me it showed up the minute I limited the upload bandwidth to under 300kbps. It seems the internal GS timeout assumes every chunk should be uploaded in under 60 seconds.
The default chunk size is 100MB (I think), and anything below that is a single-stream upload.
I'm not sure what the right route is here. Should we externally configure the GS package? It seems like an internal issue of the GS package, and I'm not sure it is our place to fix it.
WDYT?

BTW:
https://github.com/googleapis/python-storage/issues/263
https://github.com/googleapis/python-storage/issues/183
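
A rough back-of-the-envelope check of the bandwidth explanation above. This is only a sketch under stated assumptions: "kbps" is taken to mean kilobits per second, the ~32 MB checkpoint mentioned elsewhere in this thread is assumed to go up as a single stream, and the 60-second limit is assumed to cover that whole stream. The function and constant names are made up for illustration.

```python
def upload_seconds(size_bytes: int, bandwidth_bits_per_s: float) -> float:
    """Ideal transfer time, ignoring protocol overhead and latency."""
    return size_bytes * 8 / bandwidth_bits_per_s


TIMEOUT_S = 60                 # default timeout discussed in this thread
BANDWIDTH = 300_000            # 300 kbps, assumed to be kilobits per second
CHECKPOINT = 32 * 1024 * 1024  # ~32 MB model file (assumed size)

needed = upload_seconds(CHECKPOINT, BANDWIDTH)
# Largest payload that can finish inside the timeout at this bandwidth.
max_bytes = int(BANDWIDTH / 8 * TIMEOUT_S)

print(f"estimated upload time: {needed:.0f} s (timeout {TIMEOUT_S} s)")
print(f"largest payload that fits in the timeout: {max_bytes / 1024 / 1024:.2f} MiB")
```

Under these assumptions the file would need roughly 15 minutes, while only about 2 MB fits inside a 60-second window, which is consistent with the timeout appearing only once the bandwidth is throttled.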

  
  
Posted 3 years ago

"Just curious about the timeout, was it configured by ClearML or GCS? Can we customize the timeout?"

I'm assuming this is GCS; in the end the actual upload is done by the GCS Python package.
Maybe there is an env variable... Let me google it.

  
  
Posted 3 years ago

Hi MortifiedCrow63
I finally got GS credentials, and there is something weird going on. I can reproduce the issue: with model upload I get a timeout error, while upload_artifact just works.
Just updating here that we are looking into it.

  
  
Posted 3 years ago

Hi AgitatedDove14, any update on the GCS timeout bug?

  
  
Posted 3 years ago

Noted AgitatedDove14, so it's likely a bandwidth issue. Let me try the suggestion from GitHub first. Thanks man!

  
  
Posted 3 years ago

No worries AgitatedDove14, thanks for helping me.

Just curious about the timeout: was it configured by ClearML or by GCS? Can we customize the timeout?

  
  
Posted 3 years ago

Thanks AgitatedDove14,

I think so. Can we configure the timeout from the ClearML interface?

(I’m assuming the upload could take longer).

  
  
Posted 3 years ago

Hi MortifiedCrow63

"I saw file:///var/folders/cj/SOME_RANDOM_ID/T/tf_ckpts/ckpt-1, ..."

By default ClearML only logs the exact local path where you stored the file; I assume that is what you are seeing.
If you pass output_uri=True to Task.init, it will automatically upload the model to the files_server, and the model repository will then point to the files_server (you can also use any object storage as model storage, e.g. output_uri=s3://bucket).
Note that you can also set it as the default in the configuration (locally or on the agent):
https://github.com/allegroai/clearml/blob/f46561629f1a7d4a05c7ae135de98db99439c989/docs/clearml.conf#L156
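
A minimal sketch of the output_uri suggestion above. The project and task names are placeholders, and the helper functions are hypothetical wrappers added only for illustration; Task.init and its output_uri argument are the parts taken from this answer.

```python
def init_kwargs(output_uri=True):
    """Build Task.init arguments; project and task names are placeholders."""
    return {
        "project_name": "examples",
        "task_name": "model upload demo",
        # True -> upload models to the default files_server;
        # a string such as "s3://bucket" -> upload to that storage instead.
        "output_uri": output_uri,
    }


def start_task(output_uri=True):
    # Imported lazily so the helper above can be inspected without clearml installed.
    from clearml import Task

    return Task.init(**init_kwargs(output_uri))
```

Calling start_task() before training starts should make newly registered models point to the upload destination rather than to a local file:// path.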

  
  
Posted 3 years ago

Internally we use blob.upload_from_file, which has a default 60-second timeout on the connection (I'm assuming the upload could take longer).
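
For reference, a sketch of what a longer timeout could look like when calling the google-cloud-storage package directly. This is not how ClearML invokes it; it assumes a recent release of the package in which the upload methods accept a timeout keyword, and the bucket, path, and blob names are placeholders supplied by the caller. The function name is hypothetical.

```python
DEFAULT_TIMEOUT_S = 60  # default mentioned in this thread


def upload_with_timeout(bucket_name, local_path, blob_name, timeout_s=600):
    """Upload a local file to GCS with a longer timeout (illustrative sketch)."""
    # Lazy import: requires the google-cloud-storage package and valid credentials.
    from google.cloud import storage

    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    # timeout is the number of seconds to wait for the server response.
    blob.upload_from_filename(local_path, timeout=timeout_s)
    return f"gs://{bucket_name}/{blob_name}"
```

Whether a larger timeout alone is enough on a slow link is not confirmed in this thread; the linked GitHub issues discuss the behavior in more detail.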

  
  
Posted 3 years ago

Could you test with the same file? Maybe the timeout has something to do with the file size?

  
  
Posted 3 years ago

Hi MortifiedCrow63
Sorry getting GS credentials is taking longer than expected 🙂
Nonetheless it should not be an issue (model upload is essentially using the same StorageManager internally)

  
  
Posted 3 years ago

Noted AgitatedDove14,

I'm just wondering why auto logging and manual upload (using StorageManager) yield different results. Do you think they use different components here?

If the problem is coming from GCS, StorageManager should also fail, right?

  
  
Posted 3 years ago

AgitatedDove14, I already did that and it works. The command I tested: manager.upload_file

ClearML version: 1.0.2
ClearML Server version: 1.0.0-93
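
A sketch of the kind of manual upload test described above, assuming GCS credentials are already configured for ClearML. The bucket and key are placeholders and both helper functions are hypothetical; only StorageManager.upload_file comes from the thread.

```python
def gs_url(bucket, key):
    """Compose a gs:// destination; bucket and key are placeholders."""
    return f"gs://{bucket}/{key.lstrip('/')}"


def manual_upload(local_file, bucket, key):
    # Lazy import: requires clearml installed and GCS credentials configured.
    from clearml import StorageManager

    # wait_for_upload=True blocks until the upload has finished.
    return StorageManager.upload_file(
        local_file=local_file,
        remote_url=gs_url(bucket, key),
        wait_for_upload=True,
    )
```

Running this with the same checkpoint file that fails during automatic model upload makes the two code paths directly comparable.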

  
  
Posted 3 years ago

I do not think this is the upload timeout. It makes no sense to me for the GCP package to include a 60-second timeout for the upload (we do not pass any timeout; it's their internal default for the argument)...
I'm also not sure where the timeout originates (I'm assuming the initial GCP handshake connection could not actually time out, as the response should be relatively quick, so 60 seconds is more than enough).

  
  
Posted 3 years ago

MortifiedCrow63, hmmm, can you test with a manual upload and verify?
(Also, which clearml version are you using?)

  
  
Posted 3 years ago

Hi MortifiedCrow63 , thank you for pinging! (seriously greatly appreciated!)
See here:
https://github.com/googleapis/python-storage/releases/tag/v1.36.0
https://github.com/googleapis/python-storage/pull/374
Can you test with the latest release, see if the issue was fixed?
https://github.com/googleapis/python-storage/releases/tag/v1.41.0
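
Before retesting, it may help to confirm which google-cloud-storage release is actually installed. The sketch below compares it against 1.36.0; treating that release as the minimum that contains the fix is an assumption based only on the links above, and the helper names are hypothetical.

```python
from importlib import metadata
from itertools import takewhile


def parse_version(text):
    """Turn '1.41.0' into (1, 41, 0), keeping only the leading digits of each part."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def installed_gcs_version():
    """Return the installed google-cloud-storage version, or None if absent."""
    try:
        return metadata.version("google-cloud-storage")
    except metadata.PackageNotFoundError:
        return None


def at_least(installed, minimum="1.36.0"):
    """True if the installed version string is at or above the minimum."""
    return installed is not None and parse_version(installed) >= parse_version(minimum)


current = installed_gcs_version()
print("google-cloud-storage:", current, "| at least 1.36.0:", at_least(current))
```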

  
  
Posted 3 years ago

"That's the question I want to raise too,"

There is no file size limit.
Let me try to run it myself.

  
  
Posted 3 years ago

Thanks AgitatedDove14, I missed that one.

  
  
Posted 3 years ago

Thanks for confirming AgitatedDove14, is there a GitHub issue that I can follow?

  
  
Posted 3 years ago

This looks exactly like the timeout you are getting.
I'm just not sure what's the diff between the Model autoupload and the manual upload.

  
  
Posted 3 years ago

That's the question I wanted to raise too: is there any limit on the file size? The file is only ~32 MB; I'm just using your MNIST example.

Can we raise the size limit?

  
  
Posted 3 years ago

The next question is about uploading the model artifact to cloud storage.

I'm trying to use Google Cloud Storage to store my model checkpoint, but it fails with the following errors:

2021-05-12 18:51:53,335 - clearml.storage - ERROR - Failed uploading: ('Connection aborted.', timeout('The write operation timed out'))
2021-05-12 18:51:53,335 - clearml.Task - INFO - Completed model upload to
2021-05-12 18:51:54,298 - clearml.Task - INFO - Finished uploading

It says the upload timed out, but the next line says the upload completed.

After checking the bucket, I found nothing (meaning the model was not uploaded).

Any idea what causes the timeout, AgitatedDove14? I believe I'm already using the correct credentials, and I tested them manually using the StorageManager SDK.

Thanks

  
  
Posted 3 years ago