e.g. pseudo for illustration only
` def get_list(dataset_id):
from clearml import Dataset
ds= Dataset.get(dataset_id=dataset_id)
ds_dir=ds.get_local_copy()
etc...
return list_of_objs # one for each file, for example
def pipeline(dataset_id):
list_of_obj = get_list(dataset_id)
list_of_results = []
for obj in list_of_obj:
list_of_results.append(step(obj))
combine(list_of_results) One benefit is being able to make use of the Pipeline caching so if new data were added, adding elements to the list_of_obj, we’d be able to use the cache of the step ` Task for the old objs. The caching is the main thing but even being able to use the Pipeline interface for this kind of job would be nice as the Pipeline has a lot of nice lineage features.
Where combine , get_list and step are Pipeline steps and pipeline is the controller