Okay.. I have two different projects under the ClearML web server.
The first project stores datasets only, using clearml-data (PROJ_1). The second project is a ClearML pipeline project (PROJ_2), which pulls the latest committed dataset from PROJ_1 and does a few other steps... Right now, I manually start PROJ_2 when I know the dataset has been updated in PROJ_1.
With this scenario, your data should be updated when running the pipeline
TimelyPenguin76:
from clearml import Dataset
ds = Dataset.get(dataset_project="PROJ_1", dataset_name="dataset")
I mean PROJ_1 gets updated from an external source...
TimelyPenguin76: Yup, that's what I do now.. However, I should configure it to use some distributed storage later
Sounds like you need to run a service that monitors for new commits in PROJ_1 and triggers the pipeline
MagnificentSeaurchin79 How to do this? Can it be done via ClearML itself?
how do you load the data inside each task in PROJ_2?
I think so, but I'm not an expert here, I started using this a few weeks ago
take a look at the cleanup service for reference:
https://github.com/allegroai/clearml/blob/master/examples/services/cleanup/cleanup_service.py
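A minimal sketch of such a monitor service (assuming the pipeline controller already exists as a task you can clone; the template task id, dataset name and queue name below are placeholders):

import time
from clearml import Dataset, Task

PIPELINE_TEMPLATE_TASK_ID = "<pipeline-controller-task-id>"  # placeholder

# register this script as a long-running service task
Task.init(project_name="DevOps", task_name="PROJ_1 dataset monitor",
          task_type=Task.TaskTypes.service)

last_seen_id = None
while True:
    # latest committed dataset version in PROJ_1
    ds = Dataset.get(dataset_project="PROJ_1", dataset_name="dataset")
    if ds.id != last_seen_id:
        last_seen_id = ds.id
        # clone the pipeline controller task and enqueue it for execution
        pipeline = Task.clone(source_task=PIPELINE_TEMPLATE_TASK_ID,
                              name="PROJ_2 pipeline (auto-trigger)")
        Task.enqueue(pipeline, queue_name="services")
    time.sleep(600)  # poll every 10 minutes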
Yes, I am already using a Pipeline.
2. I have another project built using the Pipeline. The pipeline always loads the last committed dataset from the above Dataset project and runs a few other steps.
Just not sure how to make the Pipeline listen to changes in the Dataset project.
Hi DeliciousBluewhale87,
How about using the ClearML Pipeline? https://allegro.ai/clearml/docs/docs/examples/pipeline/pipeline_controller.html
Can this do the trick?
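The example at that link is roughly the following (just a sketch using the older SDK signature; the step names and base task names are placeholders for your own tasks):

from clearml.automation.controller import PipelineController

pipe = PipelineController(default_execution_queue="default", add_pipeline_tags=False)
pipe.add_step(name="stage_data",
              base_task_project="PROJ_2",
              base_task_name="pull latest dataset from PROJ_1")
pipe.add_step(name="process",
              parents=["stage_data"],
              base_task_project="PROJ_2",
              base_task_name="process dataset")
pipe.start()
pipe.wait()
pipe.stop()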
Not sure I'm getting that; if you are loading the last dataset task in your experiment task code, it should take the most updated one.
and after that with get_local_copy()?
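i.e. something like this inside each PROJ_2 task (sketch; the dataset name is a placeholder):

from clearml import Dataset

# grab the newest committed version and download (or reuse the cached) local copy
ds = Dataset.get(dataset_project="PROJ_1", dataset_name="dataset")
local_path = ds.get_local_copy()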