Answered
Hi, I Have This Use Case

Hi, I have this use case.
1. I have a Dataset project. It stores the dataset and its lineage.
2. I have another project built using the Pipeline. The pipeline always loads the last committed dataset from the Dataset project above and runs a few other steps.
Currently, I run step 1 and step 2 manually. Is there a way to automate this, so that whenever the Dataset project gets updated, step 2 (the Pipeline project) runs as well?

  
  
Posted 3 years ago

Answers 13


Yes, I am already using a Pipeline.

2. I have another project built using the Pipeline. The pipeline always loads the last committed dataset from the Dataset project above and runs a few other steps.

I'm just not sure how to make the Pipeline listen for changes in the Dataset project.

  
  
Posted 3 years ago

Hi DeliciousBluewhale87 ,

How about using the ClearML Pipeline? https://allegro.ai/clearml/docs/docs/examples/pipeline/pipeline_controller.html
Can this do the trick?

  
  
Posted 3 years ago

With this scenario, your data should be updated when running the pipeline

  
  
Posted 3 years ago

MagnificentSeaurchin79, how can I do this? Can it be done via ClearML itself?

sounds like you need to run a service to monitor for new commits in PROJ_1, to trigger the pipeline

  
  
Posted 3 years ago

TimelyPenguin76: Yup, that's what I do now. However, I should configure it to use some distributed storage later.

  
  
Posted 3 years ago

sounds like you need to run a service to monitor for new commits in PROJ_1, to trigger the pipeline
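
A minimal sketch of such a monitoring service, assuming simple polling is acceptable. The watcher takes two callables so the loop itself has no ClearML dependency. The two helper functions below it show one possible way to wire it to ClearML with `Dataset.get`, `Task.clone` and `Task.enqueue`. The names `watch_for_new_dataset`, `latest_dataset_id`, `enqueue_pipeline_clone`, the `"PIPELINE_TASK_ID"` placeholder, the `"services"` queue name and the poll interval are all assumptions for illustration, not something from this thread.

```python
import time
from typing import Callable, Optional


def watch_for_new_dataset(
    get_latest_id: Callable[[], str],
    trigger_pipeline: Callable[[str], None],
    poll_seconds: float = 300.0,
    max_polls: Optional[int] = None,
) -> int:
    """Call trigger_pipeline(new_id) each time the latest dataset ID changes.

    Returns how many times the pipeline was triggered. With max_polls=None
    the loop runs until the process is stopped.
    """
    # Baseline: do not trigger for the version that already exists at startup.
    last_seen = get_latest_id()
    triggered = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(poll_seconds)
        polls += 1
        latest = get_latest_id()
        if latest != last_seen:
            trigger_pipeline(latest)
            last_seen = latest
            triggered += 1
    return triggered


def latest_dataset_id() -> str:
    # Imported lazily so the watcher above can be used without ClearML installed.
    from clearml import Dataset

    return Dataset.get(dataset_project="PROJ_1", dataset_name="dataset").id


def enqueue_pipeline_clone(dataset_id: str) -> None:
    from clearml import Task

    # "PIPELINE_TASK_ID" and the queue name are placeholders (assumptions).
    template = Task.get_task(task_id="PIPELINE_TASK_ID")
    cloned = Task.clone(source_task=template, name=f"pipeline for dataset {dataset_id}")
    Task.enqueue(cloned, queue_name="services")
```

To run it against a real server, something like `watch_for_new_dataset(latest_dataset_id, enqueue_pipeline_clone)` could be started as a long-running task, much like the cleanup service example.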

  
  
Posted 3 years ago

TimelyPenguin76:
from clearml import Dataset
ds = Dataset.get(dataset_project="PROJ_1", dataset_name="dataset")

  
  
Posted 3 years ago

I mean PROJ_1 gets updated from an external source...

  
  
Posted 3 years ago

how do you load the data inside each task in PROJ_2?

  
  
Posted 3 years ago

And afterwards with get_local_copy()?
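
For reference, a rough sketch of that loading pattern inside a PROJ_2 task: fetch the dataset with `Dataset.get`, then call `get_local_copy()` to obtain a local folder containing its files. The helper name `load_latest_dataset` and the `dataset_cls` parameter are hypothetical; the parameter exists only so the function can be exercised with a stand-in class when ClearML is not installed.

```python
from pathlib import Path
from typing import Any, List, Optional


def load_latest_dataset(
    project: str, name: str, dataset_cls: Optional[Any] = None
) -> List[Path]:
    """Return the files of the dataset resolved by Dataset.get(project, name)."""
    if dataset_cls is None:
        # Default to the real ClearML class; imported lazily.
        from clearml import Dataset as dataset_cls

    ds = dataset_cls.get(dataset_project=project, dataset_name=name)
    # get_local_copy() returns the path of a local, cached copy of the dataset.
    local_dir = Path(ds.get_local_copy())
    return sorted(p for p in local_dir.rglob("*") if p.is_file())
```

Inside a pipeline step this would be called as `load_latest_dataset("PROJ_1", "dataset")`.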

  
  
Posted 3 years ago

I think so, but I'm not an expert here; I only started using this a few weeks ago.
Take a look at the cleanup service for reference:
https://github.com/allegroai/clearml/blob/master/examples/services/cleanup/cleanup_service.py

  
  
Posted 3 years ago

Okay. I have two different projects on the ClearML web server.
The first project (PROJ_1) stores datasets only, using clearml-data. The second project (PROJ_2) is a clearml-pipeline project that pulls the latest committed dataset from PROJ_1 and runs a few other steps. Right now, I manually start PROJ_2 when I know the dataset has been updated in PROJ_1.

  
  
Posted 3 years ago

Not sure I follow. If you are loading the latest dataset task in your experiment task code, it should pick up the most recent one.

  
  
Posted 3 years ago