@<1523701083040387072:profile|UnevenDolphin73> : I do not get this impression, because during update_weights
I get the message
2023-02-21 13:54:49,185 - clearml.model - INFO - No output storage destination defined, registering local model C:\Users..._Demodaten_FF_2023-02-21_13-53-51.624362.model
Heh, good @<1523704157695905792:profile|VivaciousBadger56>
I was just repeating what @<1523701070390366208:profile|CostlyOstrich36> suggested, credits to him
I am not sure if the fact that the name of the file ends with .model
is an issue - but that would be somewhat crazy design...
But, I guess @<1523701070390366208:profile|CostlyOstrich36> wrote that in a different chat, right?
Well, you could start by setting the output_uri to True in Task.init.
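For reference, a minimal sketch of that suggestion. The helper name and the project/task names are placeholders, not values from this thread. As far as I know, output_uri=True asks ClearML to upload output models to the default files server configured for the SDK, instead of only registering the local path.

```python
def init_task_with_upload(project_name, task_name):
    """Create a ClearML task whose output models are uploaded, not just registered."""
    # Imported lazily so this sketch can be loaded without a configured ClearML setup.
    from clearml import Task

    # output_uri=True: use the default files server as the upload destination.
    return Task.init(
        project_name=project_name,
        task_name=task_name,
        output_uri=True,
    )
```

Usage would look like `task = init_task_with_upload("my_project", "training")`, with both names being hypothetical.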
I can only say I've found ClearML to be very helpful, even given the documentation issue.
I think they've been working on upgrading it for a while, hopefully something new comes out soon.
Maybe @<1523701205467926528:profile|AgitatedDove14> has further info
Hi all, sorry for not being so responsive today
Heh, well, John wrote that in the first reply in this thread
And in Task.init
main documentation page (nowhere near the code), it says the following -
The documentation is messy, I've complained about it in the past too
@<1523701070390366208:profile|CostlyOstrich36>
My training outputs a model as a zip file. The way I save and load the zip file to make up my model is custom made (no library is directly used), because we invented the entire modelling ourselves. What I did so far:
output_model = OutputModel(task=..., config_dict={...}, name=f"...")
output_model.update_weights("C:\io__path\...", is_package=True)
and I am trying to load the model in a different Python process with
mymodel = task.models['output'][0]
mymodel = mymodel.get_local_copy(extract_archive=True, raise_on_error=True)
and I get in the clearml cache a .
training.pt file, which seems to be some kind of archive. Inside I have two files named data.pkl
and version
and a folder with the two files named 86922176
and 86934640
.
I am not sure how to proceed after trying to use pickle, zip and joblib. I am kind of at a loss. I suspect, my original zip file might be somehow inside, but I am not sure.
Sure, we could simply use the generic artifacts sdk, but I would like to use the available terminological methods and functions.
How should I proceed?
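One way to narrow this down is to look at what was actually downloaded. Below is a minimal sketch using only the standard library; describe_download is a hypothetical helper, not part of ClearML. It takes the path returned by get_local_copy and lists the files in it, whether that path is an extracted folder or a zip-style archive.

```python
import zipfile
from pathlib import Path


def describe_download(path):
    """List the files inside a downloaded model.

    Returns a sorted list of relative file names if `path` is a directory
    or a zip archive, and None if it is a plain (non-zip) file.
    """
    path = Path(path)
    if path.is_dir():
        # Extracted package: walk the folder and report relative paths.
        return sorted(
            p.relative_to(path).as_posix() for p in path.rglob("*") if p.is_file()
        )
    if zipfile.is_zipfile(path):
        # Zip container (whatever its file extension): report member names.
        with zipfile.ZipFile(path) as archive:
            return sorted(archive.namelist())
    return None
```

For what it's worth, a data.pkl plus a version file plus a folder of numerically named files resembles the zip container that torch.save writes, so the first entry in task.models['output'] may be an automatically logged PyTorch checkpoint rather than the custom zip. That is a guess from the file names, not something confirmed in this thread.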
We're certainly working hard on improving the documentation (and I do apologize for the frustrating experience)
@<1523701083040387072:profile|UnevenDolphin73> : If I do, what should I configure how?
By the way, output_uri is also documented as part of the Task.init() docstring
@<1523701087100473344:profile|SuccessfulKoala55> : I referenced this conversation in the issue None
@<1523701087100473344:profile|SuccessfulKoala55> Also, I think that - in this case, but also in other cases - the issue is not just the documentation, but also the design of the SDK.
I wouldn't put it past ClearML automation (a lot of stuff depends on certain suffixes), but I don't think that's the case here, hmm
@<1523704157695905792:profile|VivaciousBadger56> regarding: None
Is this a discussion or PR?
(general ranting is saved for our slack channel)
@<1523701087100473344:profile|SuccessfulKoala55> I think I might have made a mistake earlier - but not in the code I posted before. Now, I have the following situation:
- In my training Python process on my notebook I train the custom-made model and put it on my hard drive as a zip file. Then I run the code
output_model = OutputModel(task=task, config_dict={...}, name=f"...")
output_model.update_weights(weights_filename=r"C:\path\to\mymodel.zip", is_package=True)
- I delete the "C:\path\to\mymodel.zip", because it would not be available on my colleagues' computers.
- In a second process, the model-inference process, I run
mymodel = task.models['output'][-1]
mymodel = mymodel.get_local_copy(extract_archive=True, raise_on_error=True)
and get the error
ValueError: Could not retrieve a local copy of model weights 8ad4db1561474c43b0747f7e69d241a6, failed downloading
I do not have an AWS S3 instance or something like that. This is why I would like to store my mymodel.zip file directly on the ClearML Hosted Service. The model is around 2 MB.
How should I proceed?
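The log line quoted in this thread ("No output storage destination defined, registering local model") suggests that only the local path was recorded, which would explain why the download fails once the zip is deleted. Below is a minimal sketch of the two processes, assuming the task was created with an output destination (for example output_uri=True in Task.init, as suggested in this thread). The function names, model_name and task_id are placeholders; the ClearML calls are the ones already used above, plus Task.get_task.

```python
def publish_zip_model(task, zip_path, model_name):
    """Training process: attach the zip to the task as an output model."""
    # Imported lazily so this sketch can be loaded without a configured ClearML setup.
    from clearml import OutputModel

    output_model = OutputModel(task=task, name=model_name)
    # is_package=True marks the weights file as a compressed package.
    output_model.update_weights(weights_filename=zip_path, is_package=True)
    return output_model


def fetch_latest_model(task_id):
    """Inference process: download the most recent output model of a task."""
    from clearml import Task

    task = Task.get_task(task_id=task_id)
    model = task.models["output"][-1]
    # Returns a local path; raises if the weights cannot be downloaded.
    return model.get_local_copy(extract_archive=True, raise_on_error=True)
```

It is probably safest to delete the local zip only after confirming in the web UI that the model's URL points to the files server rather than to a local path.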
We'll try to add references to that in other places as well
It is documented at None ... super deep in the code. If you don't know that output_uri
in TASK's (!) init is relevant, you would never know...
Do you mean "exactly" as in "you finally got it" or in the sense of "yes, that was easy to miss"?
FWIW, we prefer to set it in the agent's configuration file, then it's all automatic
@<1523701083040387072:profile|UnevenDolphin73>
@<1523701087100473344:profile|SuccessfulKoala55> : That is the link I posted as well. But this should also be mentioned in the places that deal with external or non-external storage. It should also be mentioned everywhere we talk about models or artifacts etc. Not necessarily in detail, but at least with a sentence and a link.
@<1523701083040387072:profile|UnevenDolphin73> : How do you figure? In the past, my colleagues and I just shared the .zip
file via email / MS Teams and it worked. So I don't think so.