But, I guess @<1523701070390366208:profile|CostlyOstrich36> wrote that in a different chat, right?
@<1523701083040387072:profile|UnevenDolphin73> : How do you figure? In the past, my colleagues and I just shared the .zip file via email / MS Teams and it worked. So I don't think so.
Well, you could start by setting output_uri to True in Task.init.
Hi @<1523704157695905792:profile|VivaciousBadger56> , you can configure Task.init(..., output_uri=True) and this will save the models to the clearml file server
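(As a minimal sketch of what that call looks like - project and task names here are just placeholders:)
from clearml import Task
task = Task.init(project_name="examples", task_name="train-custom-model", output_uri=True)  # True = upload model weights to the ClearML file server instead of only recording a local path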
It should store it on the fileserver, perhaps you're missing a configuration option somewhere?
I wouldn't put it past ClearML automation (a lot of stuff depends on certain suffixes), but I don't think that's the case here hmm
@<1523701087100473344:profile|SuccessfulKoala55> I think I might have made a mistake earlier - but not in the code I posted before. Now, I have the following situation:
- In my training Python process on my notebook I train the custom-made model and save it on my hard drive as a zip file. Then I run the code
output_model = OutputModel(task=task, config_dict={...}, name=f"...")
output_model.update_weights(weights_filename=r"C:\path\to\mymodel.zip", is_package=True)
- I delete the "C:\path\to\mymodel.zip", because it would not be available on my colleagues' computers.
- In a second process, the model-inference process, I run
mymodel = task.models['output'][-1]
mymodel = mymodel.get_local_copy(extract_archive=True, raise_on_error=True)
and get the error
ValueError: Could not retrieve a local copy of model weights 8ad4db1561474c43b0747f7e69d241a6, failed downloading
I do not have an AWS S3 bucket or anything like that. This is why I would like to store my mymodel.zip file directly on the ClearML Hosted Service. The model is around 2 MB.
How should I proceed?
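(For reference, a minimal sketch of the full flow once output_uri=True is set, as suggested above - project/task names, the config dict and the model name are placeholders:)
from clearml import Task, OutputModel

# training process: with output_uri=True, update_weights() uploads the zip to the ClearML file server
task = Task.init(project_name="examples", task_name="train-custom-model", output_uri=True)
output_model = OutputModel(task=task, config_dict={"num_layers": 3}, name="my-custom-model")
output_model.update_weights(weights_filename=r"C:\path\to\mymodel.zip", is_package=True)
output_model.wait_for_uploads()  # if memory serves, this blocks until the upload finishes - worth double-checking before deleting the local zip

# inference process (possibly on another machine): fetch the uploaded package and extract it
task = Task.get_task(project_name="examples", task_name="train-custom-model")
mymodel = task.models["output"][-1]
local_path = mymodel.get_local_copy(extract_archive=True, raise_on_error=True)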
Heh, good @<1523704157695905792:profile|VivaciousBadger56> 😁
I was just repeating what @<1523701070390366208:profile|CostlyOstrich36> suggested, credits to him
By the way, output_uri is also documented as part of the Task.init() docstring ( None )
@<1523704157695905792:profile|VivaciousBadger56> regarding: None
Is this a discussion or a PR?
(general ranting is saved for our slack channel 🙂 )
Hi all, sorry for not being so responsive today 🙏
@<1523701083040387072:profile|UnevenDolphin73>
@<1523701087100473344:profile|SuccessfulKoala55> : I referenced this conversation in the issue None
Do you mean "exactly" as in "you finally got it" or in the sense of "yes, that was easy to miss"?
I am not sure if the fact that the file name ends with .model is an issue - but that would be a somewhat crazy design...
I can only say I’ve found ClearML to be very helpful, even given the documentation issue.
I think they’ve been working on upgrading it for a while, hopefully something new comes out soon.
Maybe @<1523701205467926528:profile|AgitatedDove14> has further info 🙂
It is documented at None ... super deep in the code. If you don't know that output_uri in TASK's (!) init is relevant, you would never know...
Yes, you're correct, I misread the exception.
Maybe it hasn't completed uploading? At least for Datasets one needs to explicitly wait IIRC
I have already been trying to contribute (I have three pull requests), but honestly it feels a bit weird that I need to update the documentation for something I do not understand, while I am actually trying to evaluate whether ClearML is the right tool for our company...
@<1523701070390366208:profile|CostlyOstrich36>
My training outputs a model as a zip file. The way I save and load the zip file that makes up my model is custom-made (no library is used directly), because we invented the entire modelling approach ourselves. What I did so far:
output_model = OutputModel(task=..., config_dict={...}, name=f"...")
output_model.update_weights("C:\io__path\...", is_package=True)
and I am trying to load the model in a different Python process with
mymodel = task.models['output'][0]
mymodel = mymodel.get_local_copy(extract_archive=True, raise_on_error=True)
and in the clearml cache I get a .training.pt file, which seems to be some kind of archive. Inside, there are two files named data.pkl and version, and a folder with two files named 86922176 and 86934640.
I am not sure how to proceed after trying pickle, zip and joblib. I am kind of at a loss. I suspect my original zip file might be in there somehow, but I am not sure.
Sure, we could simply use the generic artifacts SDK, but I would prefer to use the dedicated model methods and functions that are available.
How should I proceed?
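(One hedged guess, in case the first entry in task.models['output'] is an automatically captured framework checkpoint rather than the OutputModel created above - you could select the model by the name it was given instead of by index; the name below is a placeholder:)
wanted = "my-custom-model"  # placeholder: whatever was passed as name=... to OutputModel
candidates = [m for m in task.models["output"] if m.name == wanted]
if not candidates:
    raise RuntimeError(f"no output model named {wanted!r} attached to this task")
local_path = candidates[-1].get_local_copy(extract_archive=True, raise_on_error=True)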
FWIW It’s also listed in other places @<1523704157695905792:profile|VivaciousBadger56> , e.g. None says:
In order to make sure we also automatically upload the model snapshot (instead of saving its local path), we need to pass a storage location for the model files to be uploaded to.
For example, upload all snapshots to an S3 bucket…
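(A hedged example of what that looks like in code - the bucket name is a placeholder:)
task = Task.init(project_name="examples", task_name="train-custom-model", output_uri="s3://my-bucket/models")  # all model snapshots get uploaded under this URI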
We're certainly working hard on improving the documentation (and I do apologize for the frustrating experience)
FWIW, we prefer to set it in the agent’s configuration file, then it’s all automatic
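(A sketch of what that might look like in clearml.conf, assuming the standard sdk.development.default_output_uri key - the server URL is a placeholder for whatever your files_server points to:)
sdk {
    development {
        # every Task.init() will upload model snapshots here by default,
        # so scripts no longer need to pass output_uri explicitly
        default_output_uri: "https://files.clear.ml"
    }
}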
@<1523701083040387072:profile|UnevenDolphin73> : I see. I did not make the connection that output_uri=True is what I was missing. I thought this was the default, but the default is actually None, which is different from True.
@<1523701087100473344:profile|SuccessfulKoala55> Also, I think that - in this case, but also in other cases - the issue is not just the documentation, but also the design of the SDK.
We have the following, which works fine (we also use internal zip packaging for our models):
model = OutputModel(task=self.task, name=self.job_name, tags=kwargs.get('tags', self.task.get_tags()), framework=framework)
model.connect(task=self.task, name=self.job_name)
model.update_weights(weights_filename=cc_model.save())
@<1523701083040387072:profile|UnevenDolphin73> : Thanks, but it does not mention the File Storage of "ClearML Hosted Server".