Could I Get Some Feedback From People With Experience Using Clearml Pipelines On The Best Way To Handle Caching? My Team Is Working On Configuring Clearml Pipelines For Our Data Processing Workflow. We Currently Have An Experimental Pipeline Configured F


It sounds like you understand the limitations correctly.

As far as I know, it'd be up to you to write your own code that computes the delta between the old and new inputs and re-processes only the new entries.
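A minimal sketch of that delta logic in plain Python, with no ClearML involved. The names `process_delta`, `get_id`, and `process` are made up for illustration, and it assumes every entry carries a stable id:

```python
from typing import Callable, Dict, Hashable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process_delta(
    entries: Iterable[T],
    prior_outputs: Dict[Hashable, R],
    get_id: Callable[[T], Hashable],
    process: Callable[[T], R],
) -> Dict[Hashable, R]:
    """Reuse prior outputs and run `process` only on entries with unseen ids."""
    outputs = dict(prior_outputs)  # copy so the caller's mapping is not mutated
    for entry in entries:
        entry_id = get_id(entry)
        if entry_id not in outputs:  # the "delta": ids with no prior output
            outputs[entry_id] = process(entry)
    return outputs


# Toy usage: record which ids actually get evaluated.
calls = []


def expensive(entry):
    calls.append(entry["id"])
    return entry["value"] * 2


prior = {"a": 2}
entries = [{"id": "a", "value": 1}, {"id": "b", "value": 5}]
result = process_delta(entries, prior, get_id=lambda e: e["id"], process=expensive)
```

Here only entry `b` is evaluated; the output for `a` is carried over from `prior` unchanged.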

The API would let you search through prior experimental results.

So you could load up the prior task, check the IDs that showed up in its output (maybe save these as a separate artifact for faster load times), and only evaluate the new inputs. Perhaps copy the old outputs over to the new task for completeness.

That's how I'd approach it: use "data-creation" tasks and artifacts to roll your own logic for "caching" (skipping evaluation) within the task itself.
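Roughly, the approach above could look like the sketch below. It's an untested outline, not an official ClearML caching mechanism. It uses `Task.get_tasks`, `Task.init`, `task.artifacts`, and `task.upload_artifact` from the ClearML SDK; the artifact name `outputs_by_id` and the helpers `load_prior_outputs` and `run_incremental` are made up for illustration. It assumes outputs are stored as a single dict artifact keyed by string ids, and that the `status` and `order_by` filter keys behave as used here.

```python
from typing import Any, Callable, Dict

ARTIFACT_NAME = "outputs_by_id"  # hypothetical artifact name, pick your own


def load_prior_outputs(project_name: str, task_name: str) -> Dict[str, Any]:
    """Return the outputs artifact of the newest completed matching task, or {}."""
    from clearml import Task  # imported lazily so this module loads without ClearML

    # Note: task_name is matched as a pattern, so similarly named tasks may match too.
    prior_tasks = Task.get_tasks(
        project_name=project_name,
        task_name=task_name,
        task_filter={"status": ["completed"], "order_by": ["-last_update"]},
    )
    for task in prior_tasks:
        if ARTIFACT_NAME in task.artifacts:
            # Artifact.get() downloads and deserializes the stored object.
            return dict(task.artifacts[ARTIFACT_NAME].get())
    return {}


def run_incremental(
    project_name: str,
    task_name: str,
    entries: Dict[str, Any],
    process: Callable[[Any], Any],
) -> Dict[str, Any]:
    """Evaluate only entries whose id has no prior output, then store everything."""
    from clearml import Task

    prior = load_prior_outputs(project_name, task_name)
    task = Task.init(project_name=project_name, task_name=task_name)

    outputs = dict(prior)  # copy the old outputs over for completeness
    for entry_id, entry in entries.items():
        if entry_id not in outputs:
            outputs[entry_id] = process(entry)

    # Store old + new outputs on the new task so the next run can reuse them.
    task.upload_artifact(name=ARTIFACT_NAME, artifact_object=outputs)
    return outputs
```

You'd call something like `run_incremental("my-project", "data-creation", entries, process_fn)` from the data-creation step, where those argument values are placeholders. Keeping the ids in their own small artifact instead of one big dict is a reasonable variation if the outputs are large.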

In the open-source version, you don't get a whole lot (in my opinion) from using datasets over basic artifacts in tasks (tasks scoped to just creating a dataset). I believe the real "power" of the datasets feature comes with some of the pro features.

  				
Posted 4 months ago

		
					SmallTurkey79
				
