
Hi all,

I currently have some data processing scripts, for example:

script_1 -> dataset_0, dataset_1
dataset_1 -> script_2 -> dataset_2
dataset_2 -> script_3 -> dataset_3
dataset_0 + dataset_3 -> script_4 -> dataset_4
As you can see, I have multiple datasets produced by multiple scripts, and some datasets have multiple datasets as parents.

What is the recommended way to use ClearML pipelines here? The docs contain an example of how to turn tasks into pipeline steps, but none that cover datasets.

  				
Posted one year ago by ExuberantBat52

3 Answers

Not exactly: the input dataset is loaded in the script with Dataset.get(), and the second dataset is an output dataset created with Dataset.create(), which means dataset_1 is a parent dataset of dataset_2.
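If the "read with Dataset.get(), write with Dataset.create()" relationship is treated as a parent link, the full ancestry of any dataset can be walked with a small helper. This is a plain-Python sketch, not ClearML's own lineage API; `PARENTS` and `ancestors` are hypothetical names encoding the datasets from the question.

```python
# Hypothetical parent links: dataset -> datasets its producing script read via Dataset.get()
PARENTS = {
    "dataset_0": [],
    "dataset_1": [],
    "dataset_2": ["dataset_1"],
    "dataset_3": ["dataset_2"],
    "dataset_4": ["dataset_0", "dataset_3"],
}


def ancestors(dataset, parents):
    """Return every dataset that `dataset` was (transitively) derived from."""
    seen = set()
    stack = list(parents.get(dataset, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            # Follow this parent's own parents as well
            stack.extend(parents.get(current, []))
    return seen


print(sorted(ancestors("dataset_4", PARENTS)))
# ['dataset_0', 'dataset_1', 'dataset_2', 'dataset_3']
```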

  				
Posted one year ago by ExuberantBat52

The way ClearML thinks about it, the execution graph would be something like:
script_1 -> script_2 -> script_3 ->

Each script would have its own inputs and outputs, so that you can trace the usage.
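To make the "each script has inputs and outputs" idea concrete, here is a minimal plain-Python sketch (not the ClearML API) that derives the script-only execution graph from the datasets each script reads and writes. The `SCRIPTS` mapping and helper names are hypothetical, encoding the four lines from the question; it uses the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical encoding of the question's lineage: datasets each script reads and writes
SCRIPTS = {
    "script_1": {"inputs": [], "outputs": ["dataset_0", "dataset_1"]},
    "script_2": {"inputs": ["dataset_1"], "outputs": ["dataset_2"]},
    "script_3": {"inputs": ["dataset_2"], "outputs": ["dataset_3"]},
    "script_4": {"inputs": ["dataset_0", "dataset_3"], "outputs": ["dataset_4"]},
}


def script_dependencies(scripts):
    """Map each script to the set of scripts that produce its input datasets."""
    producer = {ds: name for name, io in scripts.items() for ds in io["outputs"]}
    return {name: {producer[ds] for ds in io["inputs"]} for name, io in scripts.items()}


def execution_order(scripts):
    """Return one valid order in which the scripts can run (a topological sort)."""
    return list(TopologicalSorter(script_dependencies(scripts)).static_order())


print(execution_order(SCRIPTS))
# ['script_1', 'script_2', 'script_3', 'script_4']
```

For this lineage the order is forced: script_4 needs dataset_0 from script_1 and dataset_3 from script_3, so it can only run last, and the datasets themselves never appear as nodes in the execution graph.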

Trying to combine the two into a single "execution" graph might not represent the orchestration process.

That said, visualizing them could be done.
In theory, there is no reason why we couldn't add those "datasets" as another type of building block, for visualization purposes only.

(Of course this would only make sense if you are creating pipelines)
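As a rough illustration of datasets as another type of building block for visualization only, the sketch below renders scripts and datasets as one Graphviz DOT graph (scripts as boxes, datasets as ellipses), while the execution order would still depend on the scripts alone. It is plain Python producing DOT text, not a ClearML feature; `SCRIPTS` and `to_dot` are hypothetical names.

```python
# Hypothetical script -> inputs/outputs mapping mirroring the question's lineage
SCRIPTS = {
    "script_1": {"inputs": [], "outputs": ["dataset_0", "dataset_1"]},
    "script_2": {"inputs": ["dataset_1"], "outputs": ["dataset_2"]},
    "script_3": {"inputs": ["dataset_2"], "outputs": ["dataset_3"]},
    "script_4": {"inputs": ["dataset_0", "dataset_3"], "outputs": ["dataset_4"]},
}


def to_dot(scripts):
    """Render scripts (boxes) and datasets (ellipses) as one Graphviz DOT digraph."""
    lines = ["digraph lineage {"]
    datasets = sorted({ds for io in scripts.values() for ds in io["inputs"] + io["outputs"]})
    for ds in datasets:
        lines.append(f'  "{ds}" [shape=ellipse];')
    for name, io in scripts.items():
        lines.append(f'  "{name}" [shape=box];')
        # Dataset -> script edges for inputs, script -> dataset edges for outputs
        for ds in io["inputs"]:
            lines.append(f'  "{ds}" -> "{name}";')
        for ds in io["outputs"]:
            lines.append(f'  "{name}" -> "{ds}";')
    lines.append("}")
    return "\n".join(lines)


print(to_dot(SCRIPTS))
```

The resulting text can be rendered with any Graphviz-compatible viewer; keeping it separate from the execution graph matches the point that datasets are display-only nodes here.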

wdyt?

  				
Posted one year ago by AgitatedDove14

Hi @ExuberantBat52
What do you mean by:

dataset_1 -> script_2 -> dataset_2

A dataset creates a script?

  				
Posted one year ago by AgitatedDove14
