Unfortunately, the other parameters like tags and comment didn't help to separate the models
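For reference, this is roughly what I tried (a sketch only; OutputModel does accept tags and comment arguments, but the UI still showed a single model for me):

mv1 = OutputModel(name='model_v1', task=task, tags=['v1'], comment='first checkpoint')
mv2 = OutputModel(name='model_v2', task=task, tags=['v2'], comment='second checkpoint')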
If I set it to False I get another error:

Failed creating storage object
Reason: Missing key and secret for S3 storage access ( )
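For context, this is the shape of the sdk.aws.s3 block I'm editing in clearml.conf (the key/secret values here are placeholders):

sdk {
    aws {
        s3 {
            # per-bucket credentials for the self-hosted endpoint
            credentials: [
                {
                    host: "s3.kontur.host:443"
                    key: "PLACEHOLDER_ACCESS_KEY"
                    secret: "PLACEHOLDER_SECRET_KEY"
                    secure: true
                }
            ]
        }
    }
}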
So it’s Ceph (RADOS) Object Gateway in my case
Probably so, but not sure :( I’ll have to figure it out with our DevOps engineer
It’s a self-hosted one. Its address is s3.kontur.host, port 443
Thank you! I'll try it out and let you know the result
suggest overwriting them locally?
Yeah, that might be an option but it doesn't have enough flexibility for all my scenarios. E.g. I might need to have different N-numbers for the local and remote (ClearML) storage.
Are you saying we should expose raise_on_errors to the _delete_artifacts() function itself?
That'd be a great solution, thanks! I'll create a PR shortly
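Just to spell out the idea (a sketch only; _delete_artifacts() is a private method, so its exact signature may differ):

# once the flag is exposed, callers could opt into hard failures:
task._delete_artifacts(artifact_names=['data'], raise_on_errors=True)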
the secure flag is false
I played with this setting as well - didn’t make it work
Do you mean like an example for minio?
Yeah, but with the output_uri in task initialisation as well. Am I right that in that case it would be like this?

output_uri='s3://my-minio-host:9000/bucket_name'
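In full, the initialisation I have in mind looks like this (project/task names are just examples):

from clearml import Task

task = Task.init(
    project_name='examples',
    task_name='minio_upload_test',
    output_uri='s3://my-minio-host:9000/bucket_name',
)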
clearml==1.3.2
boto3==1.22.7
botocore==1.25.7
I didn’t deploy the server myself but I verified that it works with s3cmd
Yeah, it holds. I just sent an extract from the config for it to be concise. Here’s the full version
With this variant of clearml.conf I’m now getting a new error:

ERROR - Exception encountered while uploading Failed uploading object s3.kontur.host:443/srs-clearml/SpeechLab/ASR/data_logging/test1.1be56a53647646208ffd665908056d49/artifacts/data/valset_2021_02_01_sb_manifest_true_micro.json (405): <?xml version="1.0" encoding="UTF-8"?><Error><Code>MethodNotAllowed</Code><RequestId>tx00000000000000000fc69-0062781afb-eba8e9-default</RequestId><HostId>eba8e9-default-default</HostId></Error>
Finally solved it. Turned out it was an authentication issue. In my case, I had to use values for ACCESS_KEY/SECRET other than those which I used with boto3 client
I was just wondering if there’s some valid example of a clearml.conf containing the correct on-premises S3 settings so that I could use them as a basis?
More precisely, I'm using Llama factory and I'm running train scripts from it, like python train.py ..., without editing them. Therefore I can't create a Clearml Task inside this process to record the experiment to. Of course I can manually add all the parameters, metrics and artifacts afterwards, but ideally, I'd like to have real-time logs of my Llama-factory experiment in Clearml. The package has integrations wit...
Hi, Jake!
Thanks for your response! I just managed to solve the problem by running my train CLI command in a subprocess and creating a thread to capture the stdout from this subprocess and send it to a Clearml Task. The solution doesn't even seem as ugly as I was afraid it would be 😀
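Roughly, the solution looks like this (a sketch; the script path and project/task names are placeholders):

import subprocess
import threading

from clearml import Task

task = Task.init(project_name='llama-factory', task_name='train_run')
logger = task.get_logger()

# run the unmodified training script as a subprocess
proc = subprocess.Popen(
    ['python', 'train.py'],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    text=True,
)

def forward_output():
    # read the subprocess output line by line and forward it to the Task
    for line in proc.stdout:
        logger.report_text(line.rstrip())

thread = threading.Thread(target=forward_output, daemon=True)
thread.start()
proc.wait()
thread.join()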
Has anyone done something similar? How did you manage to send real-time data about the experiment to Clearml?
SweetBadger76 Could you please verify if that is what you meant? I'm still confused whether I'm doing something wrong or everything works as intended and Clearml discriminates models only by the file name.
Thank you, but although I'm actually already using the parameter name mentioned in your response in my code, I can see only one model on the task's page
Hi, Erez!
Thank you for the example, I checked it out. It really creates two models. But the thing is, these two models have different file names here. In my scenario, however, it's more convenient for me to have the same file name and different directories for the models. In this case, all my models get overwritten by the latest logged one (as in my screenshot above).
Fortunately, if I use upload_artifact() instead (which is what I eventually went with), I manage to achieve what I want (see the s...
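In short, the workaround looks like this (a sketch; names and paths mirror the snippet below):

task.upload_artifact(name='model_v1', artifact_object='./models/v1/model.ckpt')
task.upload_artifact(name='model_v2', artifact_object='./models/v2/model.ckpt')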
# save and register the first model version
filename = './models/v1/model.ckpt'
torch.save(state_dict, filename)
mv1 = OutputModel(name='model_v1', task=task)
mv1.update_weights(filename, upload_uri=my_uri)

# update the network, then save and register the second version
update_model(mynn.multiplier)
state_dict = mynn.state_dict()
filename = './models/v2/model.ckpt'
torch.save(state_dict, filename)
mv2 = OutputModel(name='model_v2', task=task)
mv2.update_weights(filename, upload_uri=my_uri)
BTW, is it correct to set the files_server in the api section?

files_server: "s3://s3.kontur.host:443/srs-clearml"
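i.e. placed roughly like this in clearml.conf (the rest of the api section is elided):

api {
    ...
    files_server: "s3://s3.kontur.host:443/srs-clearml"
}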
Hi, Erez!
Thank you for your answer! I'll see if it solves the problem
I’m also not exactly an expert here, but it must be Ceph, if that’s possible
Tried it, but the outcome's still the same: artifacts deleted using the task._delete_artifacts() function reappear after further calls of task.upload_artifact() for new artifacts
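A condensed repro of the behaviour (a sketch; _delete_artifacts() is private and its parameters are assumed):

task.upload_artifact(name='old', artifact_object={'x': 1})
task._delete_artifacts(['old'])   # 'old' disappears from the UI
task.upload_artifact(name='new', artifact_object={'x': 2})   # 'old' reappears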
Did that and still have the same error:

Failed creating storage object
Reason: Missing key and secret for S3 storage access ( )
Well, what’s for sure is that I have the required permissions to write to the bucket, as I manage to upload files into it through s3cmd and boto3
Hi, Eugen!
Thanks for the reference, I'll check it out
I assume you have actual values for key and secret in:
That’s right, I use the same values which work for that bucket with s3cmd