Hi AnxiousSeal95 !
That's it. My idea is that artifacts can be linked to the model. Typically, these artifacts are links to serialized objects (such as datasets or scalers). They are usually directories or temporary files on mounted drives that I want to upload as task artifacts, delete locally (as they are temporary), and later retrieve through a new local path via task.artifacts["scalers"].get_local_copy(). I think this way the model's dependence on the task that created it could be removed, so that objects associated with the model could be encapsulated inside it.
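For reference, the upload / delete / re-download round trip described above could be sketched roughly like this. The helper names are hypothetical, `task` is assumed to behave like a `clearml.Task`, and the `delete_after_upload` and `wait_on_upload` keyword arguments are what I believe recent `Task.upload_artifact` versions accept, so please check them against your installed version.

```python
def upload_temporary_artifact(task, name, local_path):
    """Upload a temporary file or folder as a task artifact (hypothetical helper).

    `task` is expected to expose the clearml.Task interface.
    """
    return task.upload_artifact(
        name=name,
        artifact_object=str(local_path),
        delete_after_upload=True,  # assumption: removes the temporary local copy
        wait_on_upload=True,       # assumption: blocks until the upload is done
    )


def fetch_artifact(task, name):
    """Return a fresh local path for a previously uploaded artifact."""
    return task.artifacts[name].get_local_copy()
```

Usage would be something like `upload_temporary_artifact(task, "scalers", scalers_dir)` in the training script and `fetch_artifact(Task.get_task(task_id=...), "scalers")` wherever the scalers are needed again.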
GiganticTurtle0 Got it, makes a lot of sense!
GiganticTurtle0 Hi 🙂
You could try saving them as OutputModel ( https://clear.ml/docs/latest/docs/references/sdk/model_outputmodel ), thus saving them 'outside' of the task object. As for whether it's considered good practice, maybe AnxiousSeal95 can weigh in on that.
Hi AnxiousSeal95 !
Yes, the main reason is to unclutter the ClearML Web UI, but also to free up space on our server (mainly due to the large size of the datasets). Once the models are trained, I want to retrain them periodically, and to do so I would like all the data specifications and artifacts generated during training to be linked to the model found under the "Models" section.
What I propose is somewhat similar to the functionality of clearml.Dataset. These datasets are themselves tasks to which I can attach configuration and artifacts. For example, dates corresponding to a set of samples, feature shapes, etc.
With the current API, if I want to run inference with a model, I have to keep the training tasks as well (since much of the information associated with the model resides there).
We plan to expand our model object and have searchable key:value dicts associated with it, and maybe metric graphs. What you ask is for us to also add artifacts to it. Are these artifacts going to be datasets (or something else)? If I understand correctly, a key:value would be enough, as you're not saving data but only links to where the data is. Am I right?
GiganticTurtle0 That is correct. ATM, you can store some things on the model (I think you can hack it by editing the network configuration and storing whatever you want there).
Hi Alejandro, could you elaborate on the use case? Do you want to basically save models and some "info" on them, but remove all experiments? You remove experiments to remove clutter? Or any other reason?
Will you later use the models for something (retraining / deployment)?
GiganticTurtle0 So 🙂 had a short chat with one of our R&D guys. ATM, what you're looking for isn't there. What you can do is use OutputModel().update_weights_package(folder_here) and a folder will be saved with EVERYTHING in it. Now, I don't think it would work for you (I assume you want to download the model all the time, but artifacts just sometimes, and don't want to download everything all the time), but it's a hack.
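A rough sketch of that folder hack, in case it helps. The helper names and the staging folder layout are made up for illustration; the only ClearML call is `OutputModel.update_weights_package` with its `weights_path` argument, and `clearml` is imported lazily so the packaging helper runs on its own.

```python
import shutil
from pathlib import Path


def build_package_dir(staging_dir, model_file, extra_paths):
    """Copy the weights file plus extra files/folders into one folder (hypothetical helper)."""
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    shutil.copy2(model_file, staging / Path(model_file).name)
    for extra in map(Path, extra_paths):
        target = staging / extra.name
        if extra.is_dir():
            shutil.copytree(extra, target, dirs_exist_ok=True)  # Python 3.8+
        else:
            shutil.copy2(extra, target)
    return staging


def upload_package(task, package_dir):
    """Attach the whole folder to an output model of `task` (needs clearml installed)."""
    from clearml import OutputModel  # lazy import: only needed for the upload itself

    output_model = OutputModel(task=task)
    output_model.update_weights_package(weights_path=str(package_dir))
    return output_model
```

The trade-off is the one mentioned above: whoever fetches the model gets the whole package, scalers and all, every time.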
Another option is to use the model design field to save links to artifacts.
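Something along these lines, as a sketch only. It assumes the links are stored as JSON text in the design field via `OutputModel.update_design(config_text=...)` and read back from the model's `config_text` property; the helper names and the idea of a name-to-URL mapping are illustrative, not an official pattern.

```python
import json


def encode_artifact_links(links):
    """Serialize a {name: url} mapping to text for the model design field."""
    return json.dumps(links, indent=2, sort_keys=True)


def decode_artifact_links(config_text):
    """Recover the {name: url} mapping from the stored text (empty if unset)."""
    return json.loads(config_text) if config_text else {}


def store_links(output_model, links):
    """`output_model` is expected to behave like a clearml.OutputModel."""
    output_model.update_design(config_text=encode_artifact_links(links))


def load_links(model):
    """`model` is expected to behave like a clearml.InputModel or Model."""
    return decode_artifact_links(model.config_text)
```

Keep in mind this takes over the design field, so it is no longer available for an actual network configuration.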
As for decluttering, why not have a subproject with all the important experiments? This way you don't see it all the time but it's there when you need it.
As for linking artifacts and models: our idea was to add key:value fields and maybe metrics to models (that way you can query them based on those). But once you add more stuff like artifacts, the information you save on the model is almost like saving the entire experiment, and then I'm not sure why you wouldn't just save the experiments 🙂
Any thoughts on these?
I'll check with R&D whether this is the plan or whether we have something else we planned to introduce, and update you.
AnxiousSeal95 I see. That's why I was thinking of storing the model inside a task, just like with the Dataset class. That way, you can either use just the model via InputModel, or the model and all its artifacts via Task.get_task, by using the ID of the task where the model is located.
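To make the two access paths concrete, here is how they look with the API as it stands today (this is not the proposed feature, just a sketch). `InputModel(model_id=...)`, `get_local_copy()` and `Task.get_task(task_id=...)` are existing calls; the function names and the way missing artifacts are skipped are my own assumptions.

```python
def load_weights_only(model_id):
    """Fetch just the model weights by model ID (needs clearml installed)."""
    from clearml import InputModel  # lazy import so the helper below stays standalone

    return InputModel(model_id=model_id).get_local_copy()


def collect_artifacts(task, names):
    """Return {name: local_path} for the requested artifacts found on `task`.

    `task` is expected to behave like a clearml.Task; names that are not
    present are skipped rather than raising (an illustrative choice).
    """
    return {
        name: task.artifacts[name].get_local_copy()
        for name in names
        if name in task.artifacts
    }


def load_task_artifacts(task_id, names):
    """Fetch the task that holds the model and pull the listed artifacts."""
    from clearml import Task

    return collect_artifacts(Task.get_task(task_id=task_id), names)
```

The second path only works while the training task still exists, which is exactly the dependency this thread is about.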
I would like my cleanup service to remove all tasks older than two weeks, but not the models. Right now, if I delete all tasks, the model does not work (as it needs the training tasks). For now, I will keep the training tasks as a workaround. Even if it is not available now, would you consider including similar functionality in the near future? 🙂
And yes, we are going to revisit our assumptions for the model object, adding more stuff to it. Our goal is for it to have just enough info so you can have actionable information (i.e., how accurate is it? How fast? How much power does it use? How big is it? And other information), but not as comprehensive as a task. Something like a lightweight task 🙂 This is one thing we are considering though.
GiganticTurtle0 What about modifying the cleanup service to put all experiments with a specific tag into a subfolder? Then you'll have a subfolder for published experiments (or production models or whatever criteria you want to have 🙂 ). That would declutter your workspace automatically and still retain everything.
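A minimal sketch of that selection logic, assuming the cleanup service can produce `(task_id, created, tags)` records for each experiment. The record shape, the function names and the tag and project names are placeholders; `Task.move_to_project` is the only ClearML call and is imported lazily.

```python
from datetime import datetime, timedelta, timezone


def split_for_cleanup(tasks, keep_tag, max_age=timedelta(weeks=2), now=None):
    """Split old tasks into (to_archive, to_delete) lists of task IDs.

    `tasks` is an iterable of (task_id, created, tags) records, where
    `created` is a timezone-aware datetime. Recent tasks are left alone.
    """
    now = now or datetime.now(timezone.utc)
    to_archive, to_delete = [], []
    for task_id, created, tags in tasks:
        if now - created <= max_age:
            continue  # younger than the cutoff: untouched
        (to_archive if keep_tag in tags else to_delete).append(task_id)
    return to_archive, to_delete


def move_to_subproject(task_ids, project_name):
    """Move the kept tasks into a subproject (needs clearml installed)."""
    from clearml import Task

    for task_id in task_ids:
        Task.get_task(task_id=task_id).move_to_project(new_project_name=project_name)
```

The `to_delete` list can then go through whatever deletion the cleanup service already does, while the tagged experiments stay available in their own subfolder.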