e.g. pseudocode, for illustration only:
```
def get_list(dataset_id):
    from clearml import Dataset
    ds = Dataset.get(dataset_id=dataset_id)
    ds_dir = ds.get_local_copy()
    # etc...
    return list_of_objs  # one for each file, for example


def pipeline(dataset_id):
    list_of_obj = get_list(dataset_id)
    list_of_results = []
    for obj in list_of_obj:
        list_of_results.append(step(obj))
    combine(list_of_results)
```
One benefit is being able to make use of the Pipeline caching, so if new data were added, adding elements to the `list_of_obj`, we’d be able to use the cache of the `step` Task for the old objs. The caching is the main thing, but even being able to use the Pipeline interface for this kind of job would be nice, as the Pipeline has a lot of nice lineage features.
Where `combine`, `get_list` and `step` are Pipeline steps and `pipeline` is the controller.