How do I best utilize ClearML in this scenario such that any coworker of mine is able to reproduce my work with the same pipeline?
Basically this sounds to me like proper software development design (i.e. the class vs. stages).
To make sure anyone can reproduce it, do you mean anyone should be able to rerun the "pipeline"? If that's the case, just add Task.init (maybe with a specific Task type) and the agents will make sure the run is fully reproducible (see the sketch below).
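A minimal sketch of what that looks like; the project and task names are just placeholders:

```python
from clearml import Task

# Task.init records the git commit, uncommitted diff, installed packages,
# CLI arguments and console output. A clearml-agent can later clone and
# re-run this Task to reproduce the exact same run.
task = Task.init(
    project_name="my_project",          # placeholder project name
    task_name="train_model",            # placeholder task name
    task_type=Task.TaskTypes.training,  # optional: pick a specific Task type
)
```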
If you mean the data itself should be stored, then you have to store the DataModule as a Dataset, and maybe add an argument to your code for whether to pull the latest data from the data source (i.e. a DB?) or to use a stored dataset, in which case you pass the dataset ID (see the sketch after this message).
wdyt ?
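Roughly something like this; `pull_from_source()` is a hypothetical helper for your own data source, and the dataset/project names are placeholders:

```python
import argparse
from clearml import Dataset

parser = argparse.ArgumentParser()
parser.add_argument("--dataset-id", default=None,
                    help="reuse a stored ClearML Dataset instead of pulling fresh data")
args = parser.parse_args()

if args.dataset_id:
    # Reproduce: fetch the exact stored dataset version by its ID
    data_dir = Dataset.get(dataset_id=args.dataset_id).get_local_copy()
else:
    # Fresh run: pull from the original source (e.g. the DB) and register it
    # as a new Dataset version so coworkers can reuse it later
    data_dir = pull_from_source()  # hypothetical helper, not part of ClearML
    ds = Dataset.create(dataset_name="my_dataset", dataset_project="my_project")
    ds.add_files(data_dir)
    ds.upload()
    ds.finalize()
    print("New dataset id:", ds.id)
```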
a PyTorch Lightning Module with a ClearML task
No need to "specially" combine it. The moment you store the Module in pytorch lighting it is stored in the ClearML model repository, with a pointer to the generating Task (see above, by definition fully repdocubible)
Am I missing something ?
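For illustration, a toy example (module, project and task names are assumptions); the only ClearML-specific line is Task.init, and the checkpoint Lightning saves by default should then appear in the ClearML model repository linked to that Task:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl
from clearml import Task

# The only ClearML-specific line: everything Lightning stores afterwards
# (checkpoints via torch.save, logged scalars, etc.) is captured
# automatically and attached to this Task.
task = Task.init(project_name="my_project", task_name="lightning_train")

class LitRegressor(pl.LightningModule):  # toy module for illustration
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(8, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.mse_loss(self.layer(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

if __name__ == "__main__":
    x, y = torch.randn(256, 8), torch.randn(256, 1)
    loader = DataLoader(TensorDataset(x, y), batch_size=32)
    # Default checkpointing is enough: the resulting .ckpt is registered
    # as an output model of the Task created above.
    trainer = pl.Trainer(max_epochs=1, log_every_n_steps=5)
    trainer.fit(LitRegressor(), loader)
```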