I'm not exactly sure what's going wrong without an exact error message or a reproducible example.
That said, passing the Dataset object around is not ideal: moving info from one pipeline step to another requires ClearML to pickle that object, and I'm not sure a Dataset object is picklable.
On top of that, calling get_local_copy() in the first step does not guarantee that the data is accessible from the other step, because the two steps might run in different Docker containers or even on different machines.
So for starters I would not pass the dataset object through, but the dataset_id, and only get a local copy of it inside step() itself (a minimal sketch below). The cache should still work with dataset_id as the argument too.
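Something along these lines is what I had in mind. This is a sketch, assuming a decorator-based pipeline; the component/function names and parameters are illustrative, not taken from your code:

```python
from clearml import Dataset
from clearml.automation.controller import PipelineDecorator


@PipelineDecorator.component(return_values=["local_path"], cache=True)
def step(dataset_id: str):
    # Resolve the dataset inside the step, on whatever machine/container it runs in
    dataset = Dataset.get(dataset_id=dataset_id)
    local_path = dataset.get_local_copy()
    return local_path


@PipelineDecorator.pipeline(name="dataset-pipeline", project="examples", version="1.0")
def pipeline(dataset_id: str):
    # Only the dataset_id (a plain string) crosses the step boundary
    local_path = step(dataset_id)
    print(local_path)
```

Since dataset_id is just a string, it pickles and caches cleanly, and each step fetches its own local copy wherever it happens to run.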
I also think there might be limitations to using a for-loop to build the DAG (the pattern I mean is sketched below). I think it might not work if you clone the pipeline and change the number of iterations, but I wouldn't expect an error, just a wrong DAG.
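For reference, this is the kind of for-loop DAG construction I'm thinking of, sketched with a PipelineController; the project, task, and parameter names are hypothetical:

```python
from clearml import PipelineController

N_ITERATIONS = 3  # baked into the DAG at build time

pipe = PipelineController(name="loop-pipeline", project="examples", version="1.0")

previous_step = None
for i in range(N_ITERATIONS):
    step_name = f"process_chunk_{i}"
    pipe.add_step(
        name=step_name,
        base_task_project="examples",
        base_task_name="process chunk",          # hypothetical template task
        parameter_override={"General/chunk": i},
        parents=[previous_step] if previous_step else None,
    )
    previous_step = step_name

pipe.start()
```

Because the loop runs when the pipeline is defined, the number of steps is fixed at build time; cloning the pipeline and only overriding a parameter wouldn't re-run the loop, so the DAG shape would stay the same.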