AgitatedDove14 Not sure the pipeline decorator is what I need.
Here’s a very simplified example to illustrate my question.
Say I want to train my model on some data.
Before adding http://clear.ml, the code looks something like:

    def train(data_dir, ...): ...
Now I want to leverage the data versioning capability in http://clear.ml
So now the code needs to fetch the dataset by ID, save it locally, and let the model train on it as before:

    from clearml import Dataset

    def train_clearml(dataset_id):
        ds = Dataset.get(dataset_id)
        base_dir = ds.get_local_copy()  # local path to a cached copy of the dataset
        train(base_dir)
Another example would be to report artifacts like the output model from my training code.
Here, the http://clear.ml code wraps the call to train() and needs to receive and report the output model.
I want to achieve the following:
1. Be able to trigger the “pure” function (e.g. train()) locally, without any http://clear.ml code running, while driving it from a configuration, e.g. the path to the data.
2. Be able to trigger the “http://clear.ml decorator” (e.g. train_clearml()) while driving it from a configuration, e.g. dataset_id.
3. Keep the code of (2) completely separate from the code of (1): write them in separate modules, where (2) delegates to (1) but not the other way around.
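As a rough sketch of that separation (not a recommended pattern, just an illustration of the three goals): the module names `training.py` and `training_clearml.py`, the `output_dir` parameter, the `fetch` hook, and the placeholder training body are all hypothetical. The only ClearML calls used are `Dataset.get` and `get_local_copy`, imported lazily so the pure path never touches ClearML.

```python
from pathlib import Path
from typing import Callable


# --- would live in a pure module, e.g. a hypothetical training.py ---
def train(data_dir: str, output_dir: str) -> Path:
    """Placeholder training step: counts input files and writes a dummy model file."""
    n_files = sum(1 for p in Path(data_dir).iterdir() if p.is_file())
    model_path = Path(output_dir) / "model.txt"
    model_path.write_text(f"trained on {n_files} files\n")
    return model_path


# --- would live in a separate wrapper module, e.g. a hypothetical training_clearml.py ---
def fetch_dataset(dataset_id: str) -> str:
    """Resolve a dataset ID to a local directory using ClearML."""
    from clearml import Dataset  # lazy import: only needed on the ClearML path

    return Dataset.get(dataset_id=dataset_id).get_local_copy()


def train_clearml(
    dataset_id: str,
    output_dir: str,
    fetch: Callable[[str], str] = fetch_dataset,
) -> Path:
    """Wrapper: resolve the dataset, then delegate to the pure function."""
    return train(fetch(dataset_id), output_dir)
```

The dependency only points one way: the wrapper imports the pure function, and the pure function knows nothing about dataset IDs. The `fetch` hook is there so the wrapper can be exercised in tests without a ClearML server.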
While this is a simple example, it shows the following dilemmas:
- How to maintain a configuration variant with and without the http://clear.ml code
- How to globally select whether I am running with or without http://clear.ml
- How to design the interface between the http://clear.ml wrapper and the pure ML code when it needs to report artefacts
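One possible shape for the second and third dilemmas, sketched under assumptions: the `Reporter` interface, the `NullReporter` and `ClearMLReporter` classes, the `MYAPP_*` environment variables, and `run_training` are all hypothetical names. The ClearML side uses only `Task.init`, `Logger.report_scalar`, and `Task.upload_artifact`, and is imported only when the switch is on.

```python
import os
from typing import Mapping, Protocol


class Reporter(Protocol):
    """Interface the pure ML code reports through; it has no ClearML dependency."""

    def scalar(self, title: str, series: str, value: float, iteration: int) -> None: ...
    def artifact(self, name: str, path: str) -> None: ...


class NullReporter:
    """Local runs: records calls in memory and contacts no external service."""

    def __init__(self) -> None:
        self.events = []

    def scalar(self, title: str, series: str, value: float, iteration: int) -> None:
        self.events.append(("scalar", title, series, value, iteration))

    def artifact(self, name: str, path: str) -> None:
        self.events.append(("artifact", name, path))


class ClearMLReporter:
    """ClearML runs: forwards the same calls to a ClearML task."""

    def __init__(self, project: str, task_name: str) -> None:
        from clearml import Task  # lazy import: only needed when ClearML is enabled

        self._task = Task.init(project_name=project, task_name=task_name)

    def scalar(self, title: str, series: str, value: float, iteration: int) -> None:
        self._task.get_logger().report_scalar(
            title=title, series=series, value=value, iteration=iteration
        )

    def artifact(self, name: str, path: str) -> None:
        self._task.upload_artifact(name=name, artifact_object=path)


def make_reporter(env: Mapping[str, str] = os.environ) -> Reporter:
    """Global switch: MYAPP_USE_CLEARML=1 (a made-up variable) enables ClearML."""
    if env.get("MYAPP_USE_CLEARML", "0") == "1":
        return ClearMLReporter(
            project=env.get("MYAPP_PROJECT", "examples"),
            task_name=env.get("MYAPP_TASK", "train"),
        )
    return NullReporter()


def run_training(epochs: int, reporter: Reporter) -> None:
    """Pure training loop that reports through whatever reporter it is given."""
    for epoch in range(epochs):
        loss = 1.0 / (epoch + 1)  # stand-in for a real loss value
        reporter.scalar("loss", "train", loss, epoch)
    reporter.artifact("model", "model.pt")  # placeholder path; no file is created here
```

With this layout the pure code depends only on the small `Reporter` interface, and the choice between the two variants is made once, at the entry point, rather than scattered through the training code.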
This gets more complex when the workflow includes inputs, outputs, scalars, and other goodies that are reported to and fetched from http://clear.ml all along the workflow…
I was wondering whether the community has encountered these concerns, and whether there are recommended best practices for achieving the above separation. Or alternatively, do people feel that this is an anti-requirement, and that it’s better to keep writing http://clear.ml code mixed in with their pure ML code?
Hope I managed to clarify my question 🙂