Answered
Errors with automatic model logging on PyTorch (running on a dockered agent)

Hi everybody,
I’m getting errors with automatic model logging on pytorch (running on a dockered agent).
2022-07-14 10:24:06,334 - clearml.storage - INFO - Starting upload: /tmp/.clearml.upload_model_lvacx_4i.tmp => tapsff.local:9000/clearml/combo_model/train/pos_vga_smallBB.718da756157e444d97cf2e1996c82be8/models/model_pos_vga_smallBB.tar
2022-07-14 10:24:06,342 - clearml.storage - ERROR - Failed uploading: Parameter validation failed: Invalid type for parameter ContentType, value: None, type: <class 'NoneType'>, valid types: <class 'str'>
2022-07-14 10:24:06,343 - clearml.storage - ERROR - Exception encountered while uploading Upload failed
2022-07-14 10:24:06,343 - clearml.Task - INFO - Failed model upload
In my python code, I just call torch.save(checkpoint, path)

Any ideas?

  
  
Posted 2 years ago

Answers 26


CrookedWalrus33 can you test what happens if you pass the credentials in the global scope as well, i.e. here:
https://github.com/allegroai/clearml/blob/397dcfacda8f133af0acc7d2f9a124dde38ecc4a/docs/clearml.conf#L80

  
  
Posted 2 years ago

I think I found something,
https://github.com/allegroai/clearml/blob/e3547cd89770c6d73f92d9a05696018957c3fd62/clearml/storage/helper.py#L1442
What's the boto version you have installed?

  
  
Posted 2 years ago

CrookedWalrus33 can you post the clearml.conf you have on the agent machine?

  
  
Posted 2 years ago

Can you provide some more details, please? Do you intend to store your artifacts locally or remotely?
Does the manual reporting also fail?

If you could also give your clearml package versions, it could help 🙂

  
  
Posted 2 years ago

CrookedWalrus33 I found the issue, this is only failing with Python 3.6.
Let me check something

  
  
Posted 2 years ago

Thanks! Let me check something

  
  
Posted 2 years ago

CrookedWalrus33 any chance you can think of a sample code to reproduce?

  
  
Posted 2 years ago

AgitatedDove14 ,
From the experiment’s console log:
- boto3==1.16.2
- botocore==1.19.2
  
  
Posted 2 years ago

Thanks ExasperatedCrab78
AgitatedDove14 - attached

  
  
Posted 2 years ago

(with older clearml versions though…).

Yes, we added a content-type header for files uploaded to S3 (so it is easier for users to serve them back). But it seems the Python 3.5 casting from Path to str breaks the mimetype call...
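
To illustrate the kind of lookup described above, here is a minimal sketch of guessing a content type from a file name with Python's standard `mimetypes` module. It is not ClearML's actual code: the helper name `guess_content_type` and the fallback value `application/octet-stream` are assumptions made for this example. It only shows that an explicit `str()` conversion plus a fallback avoids handing `None` to an uploader that requires a string.

```python
import mimetypes
from pathlib import Path
from typing import Union

# Hypothetical fallback for files whose type cannot be guessed (an assumption).
DEFAULT_CONTENT_TYPE = "application/octet-stream"


def guess_content_type(path: Union[str, Path]) -> str:
    """Return a content type for `path`, never None."""
    # Convert to str explicitly so the result does not depend on whether
    # the running Python version accepts path-like objects here.
    content_type, _encoding = mimetypes.guess_type(str(path))
    # guess_type returns None when the extension is unknown or missing.
    return content_type or DEFAULT_CONTENT_TYPE


print(guess_content_type(Path("model_pos_vga_smallBB.tar")))
print(guess_content_type("checkpoint_without_extension"))
```

With a fallback like this, a file without a recognizable extension would still be uploaded with a valid string content type instead of failing parameter validation.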

  
  
Posted 2 years ago

Don't paste your API keys! 🙈

  
  
Posted 2 years ago

SweetBadger76
It’s not a credential issue, because I do upload artifacts manually with tsk.upload_artifact(...)
I’ll try changing the extension, but I have to admit that in the past it did get this file extension right (I haven’t used clearml for a while and recently updated it to the latest version).

  
  
Posted 2 years ago

AgitatedDove14 it’s running inside a docker-based worker.
Are you interested in the full pip freeze of that docker?

  
  
Posted 2 years ago

can you test what happens if you pass the credentials in the global scope as well, i.e. here:

That didn’t help

  
  
Posted 2 years ago

CrookedWalrus33 I'm testing with the latest RC on a local minio and this is what I'm getting:
clearml.storage - INFO - Starting upload: /tmp/.clearml.upload_model_3by281j8.tmp => 10.99.0.188:9000/bucket/debug/PyTorch MNIST train.8b6edc440cde4469b82e6da17e74c952/models/mnist_cnn.tar
clearml.Task - INFO - Waiting to finish uploads
clearml.Task - INFO - Completed model upload to MNIST train.8b6edc440cde4469b82e6da17e74c952/models/mnist_cnn.tar
clearml.Task - INFO - Finished uploading
Everything seems to work (w/ boto3==1.16.2 botocore==1.19.2)

Any thoughts?

  
  
Posted 2 years ago

It works locally and not on a remote execution: can you check that the machine the agent is executed from is correctly configured? The agent there needs to be provided with the correct credentials.
The autolog uses the file extension to determine what it is reporting. Can you try to use the regular .pt extension?

  
  
Posted 2 years ago

Oh wow AgitatedDove14 . Appreciate it!
Are you sure it’s just a matter of the python version?
The same experiment script was working on the exact same docker image in the past (with older clearml versions though…).
For example this experiment log:

  
  
Posted 2 years ago

Found the issue, fix in the next RC (soon to be out)

  
  
Posted 2 years ago

CrookedWalrus33 this is odd, I tested the exact same code.
I suspect something with the environment, maybe?
What's the Python version / OS? Also, can you send the full pip freeze?
2022-07-17 07:59:40,339 - clearml.storage - ERROR - Failed uploading: Parameter validation failed: Invalid type for parameter ContentType, value: None, type: <class 'NoneType'>, valid types: <class 'str'>
Yes, this is odd: it should add the content type of the file (for example "application/x-tar"), but you are getting None...

  
  
Posted 2 years ago

AgitatedDove14 , did you test it using a worker, or with local execution?
I just tested https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/pytorch_mnist.py with a (docker based) worker and it yields the same error
2022-07-17 07:59:40,330 - clearml.Task - INFO - Waiting to finish uploads
2022-07-17 07:59:40,330 - clearml.storage - INFO - Starting upload: /tmp/.clearml.upload_model_0_4d_ikk.tmp => tapsff.local:9000/clearml/examples/PyTorch MNIST train.02ed1df11bf546d8bb1938610b352803/models/mnist_cnn.pt
2022-07-17 07:59:40,339 - clearml.storage - ERROR - Failed uploading: Parameter validation failed: Invalid type for parameter ContentType, value: None, type: <class 'NoneType'>, valid types: <class 'str'>
2022-07-17 07:59:40,339 - clearml.storage - ERROR - Exception encountered while uploading Upload failed
2022-07-17 07:59:40,339 - clearml.Task - INFO - Failed model upload
2022-07-17 10:59:52 2022-07-17 07:59:49,651 - clearml.Task - INFO - Finished uploading

  
  
Posted 2 years ago

Thanks AgitatedDove14 !
I’ll use clearml 1.4.1 until the fix is out.

  
  
Posted 2 years ago

Did you by any chance save the checkpoint without a file extension? Or with a weird name containing slashes or dots? The error seems to suggest the content type was not properly parsed.

  
  
Posted 2 years ago

what does that actually mean?
2022-07-17 07:59:40,339 - clearml.storage - ERROR - Failed uploading: Parameter validation failed: Invalid type for parameter ContentType, value: None, type: <class 'NoneType'>, valid types: <class 'str'>

  
  
Posted 2 years ago

Can you provide some more details, please? Do you intend to store your artifacts locally or remotely?
Does the manual reporting also fail?

If you could also give your clearml package versions, it could help

I store the artifacts on a minio server (in my LAN).
If I run the python script locally (i.e. without execute_remotely()), it works fine.
I use the latest clearml 1.6.2

Did you by any chance save the checkpoint without a file extension? Or with a weird name containing slashes or dots? The error seems to suggest the content type was not properly parsed.

Checkpoint name is “model_pos_vga_smallBB.tar”

  
  
Posted 2 years ago

Actually if you can send the full log of the Task that would be great

  
  
Posted 2 years ago

Something like:

import torch

model = SomePytorchModel()  # placeholder for any torch.nn.Module
checkpoint = {'model_state_dict': model.state_dict()}
torch.save(checkpoint, "model.tar")

  
  
Posted 2 years ago