SweetBadger76 Could you please verify whether that is what you meant? I'm still not sure whether I'm doing something wrong or everything works as intended and ClearML distinguishes models only by file name.
Hi, Erez!
Thank you for the example, I checked it out. It really creates two models. But the thing is, these two models have different file names here. In my scenario, however, it's more convenient for me to have the same file name and different directories for the models. In this case, all my models get overwritten by the latest logged one (as in my screenshot above).
Fortunately, if I use upload_artifact() instead (which I eventually went with), I manage to achieve what I want (see the s...
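A minimal sketch of that workaround, assuming checkpoints live under a common root such as ./models and share one file name. The helper names artifact_name_for and upload_checkpoints are hypothetical; the only ClearML call used is task.upload_artifact(name=..., artifact_object=...), and the task object is passed in rather than created here. Giving each artifact a name derived from its directory should keep the uploads from colliding:

```python
from pathlib import Path


def artifact_name_for(path, root="./models"):
    """Derive a unique artifact name from the path relative to `root`.

    './models/v1/model.ckpt' becomes 'v1_model.ckpt' (hypothetical naming scheme).
    """
    relative = Path(path).relative_to(Path(root))
    return "_".join(relative.parts)


def upload_checkpoints(task, paths, root="./models"):
    """Upload every checkpoint as a separate artifact and return the names used.

    `task` is expected to be a clearml.Task (anything exposing upload_artifact works).
    """
    names = []
    for path in paths:
        name = artifact_name_for(path, root)
        # The name, not the file name, is what keeps the artifacts apart.
        task.upload_artifact(name=name, artifact_object=str(path))
        names.append(name)
    return names
```

With a real task this would be called as upload_checkpoints(task, ['./models/v1/model.ckpt', './models/v2/model.ckpt']), which registers the artifacts v1_model.ckpt and v2_model.ckpt.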
Thank you! I'll try it out and let you know the result
Thank you, but although I'm already using the name parameter mentioned in your response in my code, I can see only one model on the task's page.
Unfortunately, the other parameters like tags and comment didn't help to separate the models.
# First model: saved under ./models/v1
filename = './models/v1/model.ckpt'
torch.save(state_dict, filename)
mv1 = OutputModel(name='model_v1', task=task)
mv1.update_weights(filename, upload_uri=my_uri)

# Second model: same file name, different directory (./models/v2)
update_model(mynn.multiplier)
state_dict = mynn.state_dict()
filename = './models/v2/model.ckpt'
torch.save(state_dict, filename)
mv2 = OutputModel(name='model_v2', task=task)
mv2.update_weights(filename, upload_uri=my_uri)
suggest overwriting them locally?
Yeah, that might be an option but it doesn't have enough flexibility for all my scenarios. E.g. I might need to have different N-numbers for the local and remote (ClearML) storage.
Are you saying we should expose raise_on_errors to the _delete_artifacts() function itself?
That'd be a great solution, thanks! I'll create a PR shortly
Tried it, the outcome's still the same though: the artifacts deleted using the task._delete_artifacts() function resurrect on further calls of task.upload_artifact() on new artifacts
Hi, Erez!
Thank you for your answer! I'll see if it solves the problem
clearml 1.3.2
boto3==1.22.7
botocore==1.25.7
I didn’t deploy the server myself but I verified that it works with s3cmd
Did that and still have the same error: Failed creating storage object
Reason: Missing key and secret for S3 storage access (
)
BTW, is it correct to set files_server in the api section? files_server: "s3://s3.kontur.host:443/srs-clearml"
With this variant of clearml.conf I’m now getting a new error: ERROR - Exception encountered while uploading Failed uploading object s3.kontur.host:443/srs-clearml/SpeechLab/ASR/data_logging/test1.1be56a53647646208ffd665908056d49/artifacts/data/valset_2021_02_01_sb_manifest_true_micro.json (405): <?xml version="1.0" encoding="UTF-8"?><Error><Code>MethodNotAllowed</Code><RequestId>tx00000000000000000fc69-0062781afb-eba8e9-default</RequestId><HostId>eba8e9-default-default</HostId></Error>
Do you mean like an example for minio?
Yeah, but with output_uri in task initialisation as well. Am I right that in that case it would look like this? output_uri='s3://my-minio-host:9000/bucket_name'
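For reference, a small sketch of how such a URI maps onto a per-host credentials entry. It assumes the field names host, bucket, key, secret, multipart and secure used by the sdk.aws.s3.credentials list in clearml.conf, and that non-AWS endpoints are addressed as s3://host:port/bucket. The function name s3_credentials_entry is hypothetical, and multipart is set to False here only as an illustrative default:

```python
from urllib.parse import urlparse


def s3_credentials_entry(output_uri, key, secret, secure):
    """Split an s3://host:port/bucket URI into a credentials-style dict.

    The dict mirrors the fields assumed for one entry of
    sdk.aws.s3.credentials in clearml.conf; it does not write any config.
    """
    parsed = urlparse(output_uri)
    if parsed.scheme != "s3" or parsed.port is None:
        raise ValueError("expected a URI of the form s3://host:port/bucket")
    # The first path segment is the bucket; anything after it is a key prefix.
    bucket = parsed.path.lstrip("/").split("/", 1)[0]
    if not bucket:
        raise ValueError("the URI does not name a bucket")
    return {
        "host": f"{parsed.hostname}:{parsed.port}",
        "bucket": bucket,
        "key": key,
        "secret": secret,
        "multipart": False,  # illustrative default, adjust to the storage backend
        "secure": secure,    # True for HTTPS endpoints, False for plain HTTP
    }
```

Calling it with 's3://my-minio-host:9000/bucket_name' yields host 'my-minio-host:9000' and bucket 'bucket_name', which are the values the matching entry in clearml.conf would need to carry alongside the key and secret.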
It’s a self-hosted one. Its address is s3.kontur.host, port 443
If I set it to False I get another error: Failed creating storage object
Reason: Missing key and secret for S3 storage access (
)
Yeah, it holds. I just sent an extract from the config for it to be concise. Here’s the full version
I was just wondering if there’s a valid example of a clearml.conf containing the correct on-premises S3 settings that I could use as a basis?
I assume you have actual values for key and secret in:
That’s right, I use the same values which work for that bucket with s3cmd
The secure flag is false
I played with this setting as well - didn’t make it work
Well, what’s certain is that I have the required permissions to write to the bucket, as I manage to upload files into it through both s3cmd and boto3
Probably so, but not sure:( I’ll have to figure it out with our DevOps engineer
I’m not exactly an expert here either, but I think it must be Ceph
So it’s Ceph (RADOS) Object Gateway in my case
Finally solved it. It turned out to be an authentication issue. In my case, I had to use different values for ACCESS_KEY/SECRET than the ones I used with the boto3 client.