How do I best use ClearML in this scenario so that any coworker of mine can reproduce my work with the same pipeline?
Basically this sounds to me like proper software development design (i.e. classes vs. stages).
By "make sure anyone can reproduce it", do you mean anyone can rerun the "pipeline"? If that is the case, just add Task.init (maybe with a specific task type) and the agents will make sure the run is fully reproducible.
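A minimal sketch of what that could look like; the project and task names below are placeholders, not anything from this thread:

```python
def init_task(project: str, name: str):
    # Imported lazily so the module can be imported without ClearML installed.
    from clearml import Task

    # Task.init records the git commit, uncommitted diff, installed packages
    # and CLI arguments -- this is what lets an agent re-run it reproducibly.
    return Task.init(
        project_name=project,
        task_name=name,
        task_type=Task.TaskTypes.training,  # a specific task type, as suggested
    )

if __name__ == "__main__":
    task = init_task("my-project", "train-baseline")
```

When an agent picks the task up from a queue, it recreates the recorded environment and reruns the script, so a coworker only needs the task ID.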
If you mean the data itself should be stored, then you have to store the DataModule's data as a ClearML Dataset, and maybe add an argument to your code for whether to pull the latest data from the data source (i.e. a DB?) or use a stored dataset, in which case pass the dataset ID.
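For example (a sketch: the `--dataset-id` flag and the live-source stub are illustrative placeholders, not an existing API):

```python
import argparse

def resolve_source(dataset_id):
    """Pure decision logic: use the stored dataset if an ID was given, else the live source."""
    return "stored-dataset" if dataset_id else "live-source"

def get_data_path(dataset_id=None):
    if resolve_source(dataset_id) == "stored-dataset":
        from clearml import Dataset  # requires `pip install clearml`
        # Returns a cached local copy of the immutable, versioned dataset
        return Dataset.get(dataset_id=dataset_id).get_local_copy()
    # Placeholder: pull the latest data from your data source (e.g. the DB) here
    raise NotImplementedError("fetch latest data from the live source")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset-id", default=None,
                        help="ClearML dataset ID; omit to pull fresh data")
    args = parser.parse_args()
    print(get_data_path(args.dataset_id))
```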
wdyt?
a PyTorch Lightning Module with a ClearML task
No need to combine them in any special way. The moment you store the model in PyTorch Lightning, it is stored in the ClearML model repository, with a pointer to the generating Task (see above; by definition fully reproducible).
Am I missing something ?
Hi @<1547028031053238272:profile|MassiveGoldfish6>
What is the use case? The gist is that you want each component running on a different machine, and you want ClearML to route the data and logic between them.
How would that work in your use case?
I would like to apply MLOps best practices to my project.
So in my DataModule class, I would load the ClearML data and prep it into train and test sets. In the LightningModule class, I would create my model, and finally I would use the Trainer class to train it.
How do I best use ClearML in this scenario so that any coworker of mine can reproduce my work with the same pipeline?
In other words, how do you combine a PyTorch Lightning Module with a ClearML task?