Hi, I Have This Use Case.

Answered

Hi, I have this use case.
1. I have a Dataset project. It stores the dataset and its lineage.
2. I have another project built using the Pipeline. The pipeline always loads the last committed dataset from the Dataset project above and runs a few other steps.
Currently, I run step 1 and step 2 manually. Is there a way to automate this, so that whenever the Dataset project gets updated, step 2 (the Pipeline project) runs as well?

  				
Posted 4 years ago by DeliciousBluewhale87


Answers 13

Yes, I am already using a Pipeline.

"2. I have another project built using the Pipeline. The pipeline always loads the last committed dataset from the Dataset project above and runs a few other steps."

I'm just not sure how to make the Pipeline listen for changes in the Dataset project.

  				
Posted 4 years ago by DeliciousBluewhale87

I mean PROJ_1 gets updated from an external source...

  				
Posted 4 years ago by DeliciousBluewhale87

Hi DeliciousBluewhale87,

How about using the ClearML Pipeline? https://allegro.ai/clearml/docs/docs/examples/pipeline/pipeline_controller.html
Can this do the trick?

  				
Posted 4 years ago by TimelyPenguin76 (Administrator)

I think so, but I'm not an expert here; I started using this a few weeks ago.
Take a look at the cleanup service for reference:
https://github.com/allegroai/clearml/blob/master/examples/services/cleanup/cleanup_service.py

  				
Posted 4 years ago by MagnificentSeaurchin79

MagnificentSeaurchin79, how can I do this? Can it be done via ClearML itself?

"sounds like you need to run a service to monitor for new commits in PROJ_1, to trigger the pipeline"

  				
Posted 4 years ago by DeliciousBluewhale87

TimelyPenguin76:
    from clearml import Dataset
    ds = Dataset.get(dataset_project="PROJ_1", dataset_name="dataset")

  				
Posted 4 years ago by DeliciousBluewhale87

TimelyPenguin76: Yup, that's what I do now. However, I should configure it to use some distributed storage later.

  				
Posted 4 years ago by DeliciousBluewhale87

I'm not sure I follow. If you are loading the latest dataset task in your experiment task code, it should pick up the most recent one.

  				
Posted 4 years ago by TimelyPenguin76 (Administrator)

Sounds like you need to run a service that monitors PROJ_1 for new commits and triggers the pipeline when one appears.
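One way to read this suggestion is a small polling loop that remembers the last dataset ID it saw and reruns the pipeline when that ID changes. The following is only a sketch under assumptions: the helper names (`watch`, `latest_dataset_id`, `trigger_pipeline`), the polling interval, the template task ID, and the queue name are all hypothetical. It also assumes that `Dataset.get` with a project and name returns the newest version, as discussed in this thread, and that cloning and enqueuing the pipeline controller task is an acceptable way to rerun it.

```python
import time
from typing import Callable, Optional


def latest_dataset_id(project: str, name: str) -> str:
    """Return the ID of the dataset that Dataset.get resolves to (needs clearml)."""
    # Imported lazily so the polling logic below can be tested without clearml.
    from clearml import Dataset
    return Dataset.get(dataset_project=project, dataset_name=name).id


def trigger_pipeline(template_task_id: str, queue: str) -> str:
    """Clone an existing pipeline controller task and enqueue the clone."""
    from clearml import Task
    template = Task.get_task(task_id=template_task_id)
    cloned = Task.clone(source_task=template)
    Task.enqueue(cloned, queue_name=queue)
    return cloned.id


def watch(
    get_latest: Callable[[], str],
    on_change: Callable[[str], object],
    poll_seconds: float = 300.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll get_latest() and call on_change(new_id) whenever the ID changes.

    Returns the number of times on_change was called. max_polls=None loops forever.
    """
    last_seen = get_latest()  # baseline: do not trigger on startup
    triggered = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        sleep(poll_seconds)
        current = get_latest()
        if current != last_seen:
            on_change(current)
            last_seen = current
            triggered += 1
        polls += 1
    return triggered


# Hypothetical usage (project/dataset names from this thread, IDs are placeholders):
# watch(
#     lambda: latest_dataset_id("PROJ_1", "dataset"),
#     lambda _new_id: trigger_pipeline("<pipeline_task_id>", "<queue_name>"),
# )
```

Passing the lookup and the trigger in as callables keeps the loop independent of ClearML, so it can be dry-run with fake functions before pointing it at a real server.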

  				
Posted 4 years ago by MagnificentSeaurchin79

OK. I have two different projects on the ClearML web server.
The first project (PROJ_1) stores datasets only, using clearml-data. The second project (PROJ_2) is a clearml-pipeline project, which pulls the latest committed dataset from PROJ_1 and runs a few other steps. Right now, I manually start PROJ_2 when I know the dataset in PROJ_1 has been updated.

  				
Posted 4 years ago by DeliciousBluewhale87

How do you load the data inside each task in PROJ_2?

  				
Posted 4 years ago by TimelyPenguin76 (Administrator)

In this scenario, your data should be up to date whenever the pipeline runs.

  				
Posted 4 years ago by TimelyPenguin76 (Administrator)

And after that, with get_local_copy()?
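For reference, the two calls mentioned in this thread (`Dataset.get` followed by `get_local_copy()`) could be combined inside a pipeline step roughly like this. It is a sketch, not the poster's actual code: the function name `fetch_latest_dataset_path` and the `dataset_cls` parameter are hypothetical, the latter added only so the function can be exercised with a stand-in class when ClearML is not installed.

```python
from typing import Any, Optional


def fetch_latest_dataset_path(
    project: str, name: str, dataset_cls: Optional[Any] = None
) -> str:
    """Resolve a dataset by project and name, then return a local folder with its files."""
    if dataset_cls is None:
        # Default to the real ClearML class; imported lazily on purpose.
        from clearml import Dataset
        dataset_cls = Dataset
    ds = dataset_cls.get(dataset_project=project, dataset_name=name)
    # get_local_copy() downloads the dataset contents (if needed) and returns the local path.
    return ds.get_local_copy()


# Hypothetical usage inside a PROJ_2 task:
# data_dir = fetch_latest_dataset_path("PROJ_1", "dataset")
```

Because the dataset is resolved each time the step runs, rerunning the pipeline should pick up whatever `Dataset.get` currently returns for PROJ_1, which matches the behaviour described above.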

  				
Posted 4 years ago by TimelyPenguin76 (Administrator)
