Okie.. I have two different projects under the ClearML web server.
The first project stores datasets only, using clearml-data (PROJ_1). The second project is a clearml-pipeline project (PROJ_2), which pulls the latest committed dataset from PROJ_1 and does a few other steps... Right now, I manually start PROJ_2 when I know the dataset has been updated in PROJ_1.
I think so, but I'm not an expert here; I only started using this a few weeks ago
take a look at the cleanup service for reference:
https://github.com/allegroai/clearml/blob/master/examples/services/cleanup/cleanup_service.py
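A rough polling sketch along those lines, assuming the pipeline controller task lives in PROJ_2 and there is a "default" queue (the task name, queue and 60-second interval are placeholders, not something from this thread):

import time
from clearml import Dataset, Task

last_seen_id = None
while True:
    # latest finalized dataset version in PROJ_1 ("dataset" is a placeholder name)
    ds = Dataset.get(dataset_project="PROJ_1", dataset_name="dataset")
    if ds.id != last_seen_id:
        last_seen_id = ds.id
        # new dataset version detected: clone the pipeline controller and enqueue it for an agent
        pipeline_task = Task.get_task(project_name="PROJ_2", task_name="pipeline controller")
        cloned = Task.clone(source_task=pipeline_task, name="pipeline (auto-triggered)")
        Task.enqueue(cloned, queue_name="default")
    time.sleep(60)  # arbitrary polling interval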
TimelyPenguin76 : Yup, that's what I do now.. However, I should configure it to use some distributed storage later
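For that storage part later, a minimal sketch of pushing a new dataset version to remote storage instead of the default files server (the S3 bucket and local folder below are placeholders):

from clearml import Dataset

# new dataset version in PROJ_1 (names as used in this thread)
ds = Dataset.create(dataset_name="dataset", dataset_project="PROJ_1")
ds.add_files(path="/path/to/local/data")  # placeholder folder

# upload the files to distributed storage (placeholder bucket), then close the version
ds.upload(output_url="s3://my-bucket/clearml-datasets")
ds.finalize()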
Hi DeliciousBluewhale87 ,
How about using the ClearML Pipeline? https://allegro.ai/clearml/docs/docs/examples/pipeline/pipeline_controller.html
Can this do the trick?
With this scenario, your data should be updated when running the pipeline
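A minimal controller sketch based on that example (the step names, base task names and queue are placeholders for whatever your PROJ_2 steps actually are):

from clearml import Task
from clearml.automation.controller import PipelineController

# the controller itself is just another task in PROJ_2
task = Task.init(project_name="PROJ_2", task_name="pipeline controller",
                 task_type=Task.TaskTypes.controller)

pipe = PipelineController(default_execution_queue="default", add_pipeline_tags=True)
# each step clones an existing base task and runs it on an agent
pipe.add_step(name="stage_process", base_task_project="PROJ_2",
              base_task_name="process dataset")
pipe.add_step(name="stage_train", parents=["stage_process"],
              base_task_project="PROJ_2", base_task_name="train model")

pipe.start()
pipe.wait()
pipe.stop()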
sounds like you need to run a service to monitor for new commits in PROJ_1, to trigger the pipeline
I mean PROJ_1 gets updated from an external source...
Yes, I am already using a Pipeline.
2. I have another project built using the Pipeline. The pipeline always loads the last committed dataset from the above Dataset project and runs a few other steps.
Just not sure how to make the Pipeline listen to changes in the Dataset project.
and after that, with get_local_copy()
?
Not sure I'm getting that. If you are loading the last dataset task in your experiment task code, it should take the most updated one.
how do you load the data inside each task in PROJ_2?
TimelyPenguin76 :
from clearml import Dataset
ds = Dataset.get(dataset_project="PROJ_1", dataset_name="dataset")
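So inside each PROJ_2 step it would look roughly like this (same dataset names as above; the rest is just the get_local_copy() part mentioned earlier):

from clearml import Dataset

# always resolves to the latest finalized dataset version in PROJ_1
ds = Dataset.get(dataset_project="PROJ_1", dataset_name="dataset")

# download (or reuse a cached) read-only copy of the dataset files
local_path = ds.get_local_copy()
print(local_path)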
MagnificentSeaurchin79 How to do this? Can it be done via ClearML itself?