I think I found something,
https://github.com/allegroai/clearml/blob/e3547cd89770c6d73f92d9a05696018957c3fd62/clearml/storage/helper.py#L1442
What's the boto version you have installed?
CrookedWalrus33 any chance you can think of a sample code to reproduce?
(with older clearml versions though…).
Yes, we added a content type header for the files when uploading to S3 (so it is easier for users to serve them back). But it seems the Python 3.6 casting from Path to str breaks the mimetype call...
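For context, a minimal sketch of the kind of guard that avoids a None content type (illustrative only, not the actual ClearML fix; the fallback type is my assumption):
```python
import mimetypes
from pathlib import Path

def guess_content_type(path):
    # mimetypes.guess_type() only accepted str (not path-like) arguments
    # before Python 3.8, so cast explicitly. Fall back to a generic type
    # rather than None, since boto3 rejects ContentType=None.
    content_type, _ = mimetypes.guess_type(str(Path(path)))
    return content_type or "application/octet-stream"  # fallback is an assumption

print(guess_content_type("mnist_cnn.tar"))  # 'application/x-tar'
```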
CrookedWalrus33 I'm testing with the latest RC on a local minio and this is what I'm getting:
```
clearml.storage - INFO - Starting upload: /tmp/.clearml.upload_model_3by281j8.tmp => 10.99.0.188:9000/bucket/debug/PyTorch MNIST train.8b6edc440cde4469b82e6da17e74c952/models/mnist_cnn.tar
clearml.Task - INFO - Waiting to finish uploads
clearml.Task - INFO - Completed model upload to MNIST train.8b6edc440cde4469b82e6da17e74c952/models/mnist_cnn.tar
clearml.Task - INFO - Finished uploading
```
everything seems to work (w/ boto3==1.16.2 botocore==1.19.2)
Any thoughts?
SweetBadger76
It’s not a credential issue, because I do upload artifacts manually with task.upload_artifact(...)
I’ll try changing the extension, but I have to admit that in the past (I haven’t used clearml for a while and updated it recently to the latest version) it did get this file extension right
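(For reference, the manual path that works looks roughly like this; project/task names here are made up:)
```python
from clearml import Task

task = Task.init(project_name="debug", task_name="manual-artifact-upload")

# Manual reporting: this goes through upload_artifact() and succeeds with
# the same minio credentials, which is why a credential problem seems unlikely.
task.upload_artifact("checkpoint", artifact_object="model_pos_vga_smallBB.tar")
```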
CrookedWalrus33 can you post the clearml.conf you have on the agent machine?
Something like:
```python
import torch

model = SomePytorchModel()  # any torch.nn.Module
checkpoint = {'model_state_dict': model.state_dict()}
torch.save(checkpoint, "model.tar")
```
CrookedWalrus33 can you test what happens if you pass the credentials in the global scope as well, i.e. here:
https://github.com/allegroai/clearml/blob/397dcfacda8f133af0acc7d2f9a124dde38ecc4a/docs/clearml.conf#L80
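(Something along these lines in the clearml.conf on the agent machine; the host and keys below are placeholders:)
```
sdk {
    aws {
        s3 {
            # global scope credentials, in addition to the per-bucket/host entry
            key: "minio-access-key"
            secret: "minio-secret-key"
            credentials: [
                {
                    host: "10.99.0.188:9000"
                    key: "minio-access-key"
                    secret: "minio-secret-key"
                    multipart: false
                    secure: false
                }
            ]
        }
    }
}
```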
It works locally and not on a remote execution: can you check that the machine the agent is executed from is correctly configured? The agent there needs to be provided with the correct credentials.
Also, the autolog uses the file extension to determine what it is reporting; can you try using the regular .pt extension?
CrookedWalrus33 I found the issue, this is only failing with Python 3.6.
Let me check something
can you test what happens if you pass the credentials in the global scope as well, i.e. here:
That didn’t help
Can you provide some more details please? Do you intend to store your artifacts locally or remotely?
Does the manual reporting also fail?
If you could also give your clearml package versions, it could help 🙂
Did you by any chance save the checkpoint without any file extension? Or with a weird name containing slashes or dots? The error seems to suggest the content type was not properly parsed
Thanks ExasperatedCrab78
AgitatedDove14 - attached
AgitatedDove14 , did you test it using a worker, or with local execution?
I just tested https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/pytorch_mnist.py with a (docker based) worker and it yields the same error:
```
2022-07-17 07:59:40,330 - clearml.Task - INFO - Waiting to finish uploads
2022-07-17 07:59:40,330 - clearml.storage - INFO - Starting upload: /tmp/.clearml.upload_model_0_4d_ikk.tmp => tapsff.local:9000/clearml/examples/PyTorch MNIST train.02ed1df11bf546d8bb1938610b352803/models/mnist_cnn.pt
2022-07-17 07:59:40,339 - clearml.storage - ERROR - Failed uploading: Parameter validation failed: Invalid type for parameter ContentType, value: None, type: <class 'NoneType'>, valid types: <class 'str'>
2022-07-17 07:59:40,339 - clearml.storage - ERROR - Exception encountered while uploading Upload failed
2022-07-17 07:59:40,339 - clearml.Task - INFO - Failed model upload
2022-07-17 07:59:49,651 - clearml.Task - INFO - Finished uploading
```
Found the issue, fix in the next RC (soon to be out)
Can you provide some more details please? Do you intend to store your artifacts locally or remotely?
Does the manual reporting also fail?
If you could also give your clearml package versions, it could help
I store the artifacts on a minio server (in my LAN).
If I run the python script locally (i.e. no execute_remotely()), it works fine.
I use the latest clearml 1.6.2
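(Roughly this pattern; the queue name is a placeholder:)
```python
from clearml import Task

task = Task.init(project_name="debug", task_name="train")

# Everything before this call runs locally and uploads fine; after it,
# execution is handed off to the (docker based) agent, where the upload fails.
task.execute_remotely(queue_name="default", exit_process=True)
```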
Did you by any chance save the checkpoint without any file extension? Or with a weird name containing slashes or dots? The error seems to suggest the content type was not properly parsed
Checkpoint name is “model_pos_vga_smallBB.tar”
AgitatedDove14 ,
From the experiment’s console log:
```
- boto3==1.16.2
- botocore==1.19.2
```
CrookedWalrus33 this is odd, I tested the exact same code.
I suspect something with the environment maybe?
What's the python version / OS? Also, can you send a full pip freeze?
```
2022-07-17 07:59:40,339 - clearml.storage - ERROR - Failed uploading: Parameter validation failed: Invalid type for parameter ContentType, value: None, type: <class 'NoneType'>, valid types: <class 'str'>
```
Yes this is odd, it should add the content-type of the file (for example "application/x-tar"), but you are getting None...
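(A quick way to see where that None comes from; filenames are just examples:)
```python
import mimetypes

# .tar maps to 'application/x-tar'; a name with no recognized extension
# yields (None, None), and that None ends up as the boto3 ContentType.
print(mimetypes.guess_type("mnist_cnn.tar"))  # ('application/x-tar', None)
print(mimetypes.guess_type("checkpoint"))     # (None, None)
```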
what does that actually mean?
```
2022-07-17 07:59:40,339 - clearml.storage - ERROR - Failed uploading: Parameter validation failed: Invalid type for parameter ContentType, value: None, type: <class 'NoneType'>, valid types: <class 'str'>
```
Thanks AgitatedDove14 !
I’ll use clearml 1.4.1 until the fix is out.
AgitatedDove14 it's running inside a docker based worker.
Are you interested in the full pip freeze of that docker?
Actually if you can send the full log of the Task that would be great
Oh wow AgitatedDove14 . Appreciate it!
Are you sure it’s just a matter of the python version?
The same experiment script was working on the exact docker image in the past (with older clearml versions though…).
For example this experiment log: