Hey, I’m thinking of using a ClearML Pipeline to compile a dataset more efficiently.
My hope is that I won't have to run every step for every data point every time, as the dataset is large and some of the steps are compute-intensive.
I am at a stage where I will be switching out models and algorithms rapidly to try to find the best combinations, and adding/removing Tasks (e.g. to create new Features), so it's important to me that the process of compiling the dataset is as quick and traceable as possible.
How would I set up a ClearML Pipeline/Tasks (Pipeline components) such that:
1. If a Task has been run before with the same code, model, and input data, the Task is not run again; instead, its cached outputs (e.g. features) are passed on to the next Task(s) in the Pipeline.
2. If the code or model for a Task has been updated, all input data are reprocessed, with the results passed on to downstream Task(s).
3. If the code or model for a Task has not changed but some input data has changed, the Task runs only on the new input data, and the newly processed outputs are combined with the (correct) previously computed and cached outputs.
4. If new Tasks are added to the Pipeline (e.g. the Tasks required to create a new Feature in the final CSV), the existing Tasks still behave as in 1, 2 and 3.
Is there a good way to do this?
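For reference, this is roughly what I had in mind. It's only a sketch: the project/function names, dataset-version IDs and the chunking scheme are placeholders, and I'm assuming that `cache=True` on a decorator-based Pipeline component reuses stored outputs when the component code and its arguments are unchanged (covering 1 and 2), that splitting the data into per-chunk ClearML Dataset versions and calling the component once per chunk would give the incremental behaviour in 3, and that adding new component calls later wouldn't invalidate the cache of the existing ones (4).

```python
from clearml import PipelineDecorator


@PipelineDecorator.component(return_values=["features_df"], cache=True)
def extract_features(chunk_dataset_id: str, model_name: str):
    # Each call becomes its own Task. With cache=True, I'm assuming ClearML reuses the
    # stored output when this exact code has already run with the same arguments
    # (same chunk dataset version, same model name) instead of re-executing.
    from pathlib import Path
    import pandas as pd
    from clearml import Dataset

    local_dir = Dataset.get(dataset_id=chunk_dataset_id).get_local_copy()
    df = pd.concat(pd.read_csv(p) for p in sorted(Path(local_dir).glob("*.csv")))
    # ... the heavy feature computation using `model_name` would go here ...
    df["model_used"] = model_name  # placeholder for the real feature columns
    return df


@PipelineDecorator.component(return_values=["final_df"], cache=True)
def merge_features(df_a, df_b):
    # Combines freshly computed and cached chunk outputs into one table (point 3).
    import pandas as pd

    merged = pd.concat([df_a, df_b], ignore_index=True)
    merged.to_csv("final.csv", index=False)  # written locally; could be versioned as a ClearML Dataset
    return merged


@PipelineDecorator.pipeline(name="compile-dataset", project="dataset-pipelines", version="0.1.0")
def compile_dataset(chunk_a_id: str, chunk_b_id: str, model_name: str):
    # One component call per data chunk, so only chunks whose dataset version, code or
    # model changed should actually re-run; untouched chunks should come back from cache.
    feats_a = extract_features(chunk_dataset_id=chunk_a_id, model_name=model_name)
    feats_b = extract_features(chunk_dataset_id=chunk_b_id, model_name=model_name)
    return merge_features(df_a=feats_a, df_b=feats_b)


if __name__ == "__main__":
    PipelineDecorator.run_locally()  # run everything in-process while iterating
    compile_dataset(
        chunk_a_id="<dataset-version-id-1>",
        chunk_b_id="<dataset-version-id-2>",
        model_name="model-v1",
    )
```

Is this the right general shape, or is there a better pattern for the "only reprocess the changed data" part?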