Hi, We Are Using Clearml For Our Experiment Tracking But Now Investigating Using The Pipeline Functionality As Well For Scheduling. We Also Want To Be Able To Trigger A Pipeline Run When There Is New Data In An External Database. Is This Possible? From Wh

Answered

Hi, we are using ClearML for our experiment tracking but are now investigating using the pipeline functionality as well for scheduling. We also want to be able to trigger a pipeline run when there is new data in an external database. Is this possible? From what I can see, the trigger works for changes in ClearML datasets, for example. Is there a way to trigger a pipeline based on some external change that we can monitor?

  				
Posted one year ago by GorgeousShrimp11


Answers 14

Hi GorgeousShrimp11 , long story short - you can.

Now to delve into it a bit - You can trigger entire pipeline runs via the API.

I can think of two options off the top of my head. The first is some sort of "service" task that runs constantly, listens for something, and then triggers pipeline runs.

The second is some external source sending a POST request via the API to trigger a pipeline.
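A minimal sketch of the first option, assuming the "new data" check can be reduced to comparing a marker value between polls. The names `watch_for_new_data`, `fetch_latest_marker` and `on_new_data` are hypothetical and not part of ClearML; the loop only shows the polling logic and leaves the actual trigger to the callback.

```python
import time
from typing import Any, Callable, Optional


def watch_for_new_data(
    fetch_latest_marker: Callable[[], Any],
    on_new_data: Callable[[], Any],
    poll_interval_sec: float = 60.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll an external source and call on_new_data whenever the marker changes.

    fetch_latest_marker is a hypothetical callable, e.g. a query returning
    the newest row timestamp in the external database.
    Returns the number of times on_new_data was called.
    """
    last_seen = fetch_latest_marker()  # baseline, so nothing is triggered on startup
    triggered = 0
    polls = 0
    # max_polls=None means "run forever", which is what a service task would do.
    while max_polls is None or polls < max_polls:
        sleep(poll_interval_sec)
        current = fetch_latest_marker()
        if current != last_seen:
            on_new_data()
            last_seen = current
            triggered += 1
        polls += 1
    return triggered
```

In a real service task, `on_new_data` could clone and enqueue the pipeline task (for example with `Task.clone` and `Task.enqueue`), and the script itself would run under an agent that serves the services queue.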

What do you think?

  				
Posted one year ago by CostlyOstrich36

Hi GorgeousShrimp11

can you run a pipeline on a schedule or are schedules only for Tasks?

I think one tiny detail got lost here: Pipelines (the logic driving them) are a type of Task, which means you can clone and enqueue them like any other task
(Task.enqueue / Task.clone)
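As a rough sketch of that idea, the helper below clones a previous pipeline run by its task ID and enqueues the clone. `rerun_pipeline` and its `task_api` argument are hypothetical (the argument exists only so the logic can be tried without a ClearML server); `Task.clone`, `Task.enqueue` and `set_parameter` are the ClearML SDK calls it relies on. The `Args/config_path` key mentioned in the usage note is an assumption about how the pipeline arguments are stored on the controller task.

```python
from typing import Any, Dict, Optional


def rerun_pipeline(
    pipeline_task_id: str,
    queue_name: str = "services",
    parameters: Optional[Dict[str, Any]] = None,
    task_api: Any = None,
) -> str:
    """Clone a previous pipeline run (a controller Task) and enqueue the clone.

    Returns the ID of the newly created task.
    """
    if task_api is None:
        # Imported lazily so the function can be exercised with a stand-in object.
        from clearml import Task as task_api

    cloned = task_api.clone(source_task=pipeline_task_id)
    # Optionally override parameters on the clone before it is launched.
    for name, value in (parameters or {}).items():
        cloned.set_parameter(name, value)
    task_api.enqueue(cloned, queue_name=queue_name)
    return cloned.id
```

A call might look like `rerun_pipeline("<pipeline task id>", parameters={"Args/config_path": "clearml_pipeline/config/ml_config.yaml"})`, with the ID taken from a previous run of the pipeline.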
Other than that, it looks good to me. Did I miss anything?

  				
Posted one year ago by AgitatedDove14

Looking at this example here, it looks like it only works with tasks:

Aha! Pipeline is a Task 🙂 (a specific type of Task, nonetheless a Task)
Just use the pipeline ID, and make sure you push it into the services queue, voila
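A minimal sketch of what that could look like with `TaskScheduler`, assuming a daily run is wanted. The helper names `daily_pipeline_schedule` and `start_scheduler` are hypothetical; the `minute` / `hour` / `day=1` combination follows the "every day at a given time" pattern described in the `add_task` docstring, and the pipeline task ID is a placeholder you would take from a previous run.

```python
from typing import Any, Dict


def daily_pipeline_schedule(
    pipeline_task_id: str, queue: str = "services", hour: int = 7, minute: int = 0
) -> Dict[str, Any]:
    """Build keyword arguments for TaskScheduler.add_task for a once-a-day run."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    return {
        "schedule_task_id": pipeline_task_id,  # the pipeline controller's task ID
        "queue": queue,                        # queue the pipeline controller is pushed into
        "minute": minute,
        "hour": hour,
        "day": 1,                              # every 1 day, at hour:minute
        "name": "scheduled pipeline run",
    }


def start_scheduler(schedule: Dict[str, Any]) -> None:
    """Register the schedule and run the scheduler itself as a service task."""
    # Imported lazily so the schedule can be built and inspected without clearml.
    from clearml.automation import TaskScheduler

    scheduler = TaskScheduler()
    scheduler.add_task(**schedule)
    # The scheduler is a long-running, lightweight task, so it also goes to
    # the services queue.
    scheduler.start_remotely(queue="services")
```

For example, `start_scheduler(daily_pipeline_schedule("<pipeline task id>", hour=22, minute=30))` would ask for a run every day at 22:30. The queues used by the pipeline's components still need their own agents.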

  				
Posted one year ago by AgitatedDove14

I am using the pipeline ID from the last time I ran the pipeline, which I got through the ClearML UI.

  				
Posted one year ago by GorgeousShrimp11

CostlyOstrich36, a quick follow-up: I've been looking at the ClearML API documentation to see how to trigger a pipeline via the API. Do you use queues and add_task, as specified in the documentation?

Here is an example of the pipeline code, simplified:

"""Forecasting Pipeline"""

from clearml.automation.controller import PipelineDecorator
from clearml import TaskTypes

@PipelineDecorator.component(cache=True, task_type=TaskTypes.data_processing)
def project_pipeline(config_path: str):
    """
    Pipeline steps

    Args:
        config_path (str): Path to config file
    """

    from clearml_pipeline.modeling_utils import generate_predictions
    from loguru import logger

    try:
        results = generate_predictions(config_path)

    except Exception as e:
        logger.error(f"{e}")


@PipelineDecorator.pipeline(
    name="pipeline", project="project_name", version="0.0.1"
)
def executing_pipeline(config_path: str):
    """Decorator for executing the pipeline"""

    project_pipeline(config_path)


if __name__ == "__main__":

    PipelineDecorator.run_locally()

    executing_pipeline("clearml_pipeline/config/ml_config.yaml")

  				
Posted one year ago by GorgeousShrimp11

Just use the pipeline ID, and make sure you push it into the services queue, voila

AgitatedDove14 A somewhat related question - why is pushing into the services queue required as opposed to just pushing it into other queues? I have had experience where triggering a pipeline would not show up under the Pipelines tab in the web UI - it just shows up in Projects. Wondering if the queue matters for this.

  				
Posted one year ago by GreasyKitten62

why is pushing into the services queue required ...

The services queue is usually connected to an agent running in "services mode", which means the agent executes multiple tasks in parallel (as opposed to a regular agent, which launches only one Task at a time). The assumption is that "service" Tasks are usually not heavy on CPU/RAM, so running multiple instances makes sense.

  				
Posted one year ago by AgitatedDove14

Oh, the pipeline logic itself holds one "job" on the worker, which is why there are no spare workers left to run the components of the pipeline.
Run your worker with --services-mode; it will launch multiple Tasks at the same time, which should solve the issue.

  				
Posted one year ago by AgitatedDove14

Hi AgitatedDove14, thanks for your reply. We want to use the Scheduler, but to run a pipeline. Looking at the scheduler example, it looks like it only works with tasks.

So, in my code example above, where I have executing_pipeline as the pipeline function created with the decorator, can this be scheduled to run with the TaskScheduler, i.e. used as the function passed to the scheduler? At the moment, we can't get this to work or figure out how to use the Scheduler with pipelines.

  				
Posted one year ago by GorgeousShrimp11

AgitatedDove14 Hi! Could you give any feedback on the above? We are trying to figure out if/how we can run pipelines on a schedule and also trigger them with an external event.

  				
Posted one year ago by GorgeousShrimp11

Thanks CostlyOstrich36 , that does sound like an option. Can you point me to the documentation for this API call as I haven't been able to find anything?

  				
Posted one year ago by GorgeousShrimp11

Hi CostlyOstrich36 , another quick question, can you run a pipeline on a schedule or are schedules only for Tasks? We are battling to figure out how to automate the pipelines.

  				
Posted one year ago by GorgeousShrimp11

Hi AgitatedDove14, I'm still having issues with this setup; see my latest comment.

I created a new queue, megan-testing, and have an agent running on my machine that I assigned to it. It works when I schedule a simple task, but when I try to run the pipeline, it says it can't find the queue.

  				
Posted one year ago by GorgeousShrimp11

Following up, I found the relevant code on GitHub.

A task ID is required, though. To get this ID, would we run the pipeline manually so that it shows up in the Web UI, and then use the ID of the task, which we can get from the UI by clicking on the pipeline run info?

  				
Posted one year ago by GorgeousShrimp11
