Hi everybody, I'm getting errors with automatic model logging on PyTorch (running on a Docker-based agent).

Hi everybody,
I'm getting errors with automatic model logging on PyTorch (running on a Docker-based agent):
2022-07-14 10:24:06,334 - clearml.storage - INFO - Starting upload: /tmp/.clearml.upload_model_lvacx_4i.tmp => tapsff.local:9000/clearml/combo_model/train/pos_vga_smallBB.718da756157e444d97cf2e1996c82be8/models/model_pos_vga_smallBB.tar
2022-07-14 10:24:06,342 - clearml.storage - ERROR - Failed uploading: Parameter validation failed: Invalid type for parameter ContentType, value: None, type: <class 'NoneType'>, valid types: <class 'str'>
2022-07-14 10:24:06,343 - clearml.storage - ERROR - Exception encountered while uploading Upload failed
2022-07-14 10:24:06,343 - clearml.Task - INFO - Failed model upload
In my Python code, I just call torch.save(checkpoint, path).

Any ideas?

  
  
Posted one year ago

Answers 26


Can you provide some more details, please? Do you intend to store your artifacts locally or remotely?
Does manual reporting also fail?

It would also help if you could share your clearml package versions.

I store the artifacts on a MinIO server (in my LAN).
If I run the Python script locally (i.e., without execute_remotely()), it works fine.
I'm using the latest ClearML, 1.6.2.

Did you by any chance save the checkpoint without any file extension? Or with an unusual name containing slashes or dots? The error suggests the content type was not properly parsed.
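To illustrate why the file name matters: a content type is usually guessed from the file extension alone. The sketch below uses Python's standard `mimetypes` module; that ClearML relies on this exact call is an assumption, not something the thread states, and `content_type_for` is a hypothetical helper name. A name with no extension yields no guess at all.

```python
import mimetypes


def content_type_for(file_name):
    """Hypothetical helper: return the Content-Type guessed from a file name, or None."""
    # guess_type() derives the type from the extension and returns
    # (type, encoding); only the type is relevant for a Content-Type header.
    content_type, _encoding = mimetypes.guess_type(file_name)
    return content_type


# A checkpoint name with a standard extension versus one with no extension.
for name in ["model_pos_vga_smallBB.tar", "checkpoint"]:
    print(name, "->", content_type_for(name))
```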

Checkpoint name is “model_pos_vga_smallBB.tar”

  
  
Posted one year ago

It works locally but not in remote execution: can you check that the machine the agent is executed from is correctly configured? The agent there needs to be provided with the correct credentials. Also, the autolog uses the file extension to determine what it is reporting; can you try using the regular .pt extension?

  
  
Posted one year ago

SweetBadger76,
It's not a credentials issue, because I do upload artifacts manually with tsk.upload_artifact(...).
I'll try changing the extension, but I have to admit that in the past (I haven't used ClearML for a while and recently updated it to the latest version) it did get this file extension right.

  
  
Posted one year ago

CrookedWalrus33, can you post the clearml.conf you have on the agent machine?

  
  
Posted one year ago


Don't paste your API keys! 🙈

  
  
Posted one year ago

AgitatedDove14, did you test it using a worker, or with local execution?
I just tested https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/pytorch_mnist.py with a (Docker-based) worker, and it yields the same error:
2022-07-17 07:59:40,330 - clearml.Task - INFO - Waiting to finish uploads
2022-07-17 07:59:40,330 - clearml.storage - INFO - Starting upload: /tmp/.clearml.upload_model_0_4d_ikk.tmp => tapsff.local:9000/clearml/examples/PyTorch MNIST train.02ed1df11bf546d8bb1938610b352803/models/mnist_cnn.pt
2022-07-17 07:59:40,339 - clearml.storage - ERROR - Failed uploading: Parameter validation failed: Invalid type for parameter ContentType, value: None, type: <class 'NoneType'>, valid types: <class 'str'>
2022-07-17 07:59:40,339 - clearml.storage - ERROR - Exception encountered while uploading Upload failed
2022-07-17 07:59:40,339 - clearml.Task - INFO - Failed model upload
2022-07-17 10:59:52 2022-07-17 07:59:49,651 - clearml.Task - INFO - Finished uploading

  
  
Posted one year ago

CrookedWalrus33, I'm testing with the latest RC on a local MinIO, and this is what I'm getting:
clearml.storage - INFO - Starting upload: /tmp/.clearml.upload_model_3by281j8.tmp => 10.99.0.188:9000/bucket/debug/PyTorch MNIST train.8b6edc440cde4469b82e6da17e74c952/models/mnist_cnn.tar
clearml.Task - INFO - Waiting to finish uploads
clearml.Task - INFO - Completed model upload to MNIST train.8b6edc440cde4469b82e6da17e74c952/models/mnist_cnn.tar
clearml.Task - INFO - Finished uploading
Everything seems to work (w/ boto3==1.16.2 botocore==1.19.2).

Any thoughts?

  
  
Posted one year ago

CrookedWalrus33, this is odd; I tested the exact same code.
I suspect something with the environment, maybe?
What's the Python version / OS? Also, can you send the full pip freeze?
2022-07-17 07:59:40,339 - clearml.storage - ERROR - Failed uploading: Parameter validation failed: Invalid type for parameter ContentType, value: None, type: <class 'NoneType'>, valid types: <class 'str'>
Yes, this is odd. It should add the content type of the file (for example "application/x-tar"), but you are getting None...
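The validation error says the S3 client expects `ContentType` to be a string but received `None`. One defensive pattern is to fall back to a generic binary type whenever the guess fails. This is a minimal sketch under that assumption; `content_type_or_default` is a hypothetical helper, not ClearML's actual code or its eventual fix.

```python
import mimetypes

# Generic binary type used when the file name gives no hint.
FALLBACK_CONTENT_TYPE = "application/octet-stream"


def content_type_or_default(file_name):
    """Hypothetical helper: return a guessed Content-Type, never None."""
    guessed, _encoding = mimetypes.guess_type(file_name)
    # guess_type() returns None for unknown or missing extensions;
    # substitute a str so the upload call never receives ContentType=None.
    return guessed if guessed is not None else FALLBACK_CONTENT_TYPE


for name in ["mnist_cnn.tar", "checkpoint"]:
    print(name, "->", content_type_or_default(name))
```

The fallback keeps uploads working for arbitrary file names at the cost of a less specific header for files whose extension is not recognized.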

  
  
Posted one year ago

What does that actually mean?
2022-07-17 07:59:40,339 - clearml.storage - ERROR - Failed uploading: Parameter validation failed: Invalid type for parameter ContentType, value: None, type: <class 'NoneType'>, valid types: <class 'str'>

  
  
Posted one year ago

AgitatedDove14, it's running inside a Docker-based worker.
Are you interested in the full pip freeze of that Docker image?

  
  
Posted one year ago

CrookedWalrus33, I found the issue; this is only failing with Python 3.6.
Let me check something

  
  
Posted one year ago

Actually if you can send the full log of the Task that would be great

  
  
Posted one year ago

Thanks! Let me check something

  
  
Posted one year ago

CrookedWalrus33, can you test what happens if you pass the credentials in the global scope as well, i.e., here:
https://github.com/allegroai/clearml/blob/397dcfacda8f133af0acc7d2f9a124dde38ecc4a/docs/clearml.conf#L80

  
  
Posted one year ago

I think I found something:
https://github.com/allegroai/clearml/blob/e3547cd89770c6d73f92d9a05696018957c3fd62/clearml/storage/helper.py#L1442
What's the boto version you have installed?

  
  
Posted one year ago

can you test what happens if you pass the credentials in the global scope as well, i.e. here:

That didn’t help

  
  
Posted one year ago

Something like:

import torch
model = torch.nn.Linear(10, 2)  # stand-in for SomePytorchModel()
checkpoint = {'model_state_dict': model.state_dict()}
torch.save(checkpoint, "model.tar")

  
  
Posted one year ago

AgitatedDove14,
From the experiment's console log:
- boto3==1.16.2
- botocore==1.19.2
  
  
Posted one year ago

CrookedWalrus33, any chance you could put together sample code to reproduce this?

  
  
Posted one year ago

Thanks ExasperatedCrab78
AgitatedDove14 - attached

  
  
Posted one year ago

(with older clearml versions though…).

Yes, we added a content-type header for files uploaded to S3 (so it is easier for users to serve them back). But it seems that with Python 3.5, the casting from Path to str breaks the mimetype call....
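For context, the Python documentation states that `mimetypes.guess_type()` accepts path-like objects only from Python 3.8 onward, so code that may receive a `pathlib.Path` on older interpreters should convert it to `str` before guessing. A minimal sketch of that defensive conversion, assuming the caller may pass either a `str` or a `Path`; `guess_type_portable` is a hypothetical name and this is not ClearML's actual fix.

```python
import mimetypes
from pathlib import Path, PurePath
from typing import Optional, Union


def guess_type_portable(path: Union[str, PurePath]) -> Optional[str]:
    """Hypothetical helper: guess a Content-Type from a str or pathlib path."""
    # Converting to str first works on interpreters older than 3.8,
    # where guess_type() is not documented to accept path-like objects.
    content_type, _encoding = mimetypes.guess_type(str(path))
    return content_type


model_path = Path("models") / "mnist_cnn.tar"
print(model_path, "->", guess_type_portable(model_path))
```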

  
  
Posted one year ago

Oh wow, AgitatedDove14. Appreciate it!
Are you sure it's just a matter of the Python version?
The same experiment script was working on the exact same Docker image in the past (with older clearml versions though…).
For example, this experiment log:

  
  
Posted one year ago

Found the issue; the fix will be in the next RC (soon to be out).

  
  
Posted one year ago

Thanks AgitatedDove14 !
I’ll use clearml 1.4.1 until the fix is out.

  
  
Posted one year ago