

Hey, I have a question regarding pipelines. Let's say I have 2 scripts: train.py and evaluate.py. Each of them creates a task using Task.init and logs some information. These scripts are run independently (in my case they are run by DVC). I would like both of them to be logged under a single pipeline. Is there a way to do this without having to define a PipelineController, add tasks to it and execute them through it? Something like "continue with this already created pipeline and add the currently run task to it".

I'm trying to connect my existing DVC pipeline to ClearML (to use its tracking functionality). If anyone has experience with this, let me know 🙂

  				
Posted 3 years ago by ScaryLeopard77


Answers 8

We just call task.close() and then start a new Task.init() manually, so our "pipelines" are self-controlled.
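A minimal sketch of that close-and-reinit pattern, assuming both steps run one after the other in a single Python process. The `run_steps` helper and the step functions are hypothetical; `Task.init(project_name=..., task_name=...)`, `task.close()` and `task.id` are the ClearML pieces it relies on. The Task class is passed in as a parameter so the sketch can be exercised without a ClearML server.

```python
from typing import Callable, Iterable, List, Tuple

# Each step is a (task_name, step_function) pair; the function receives the
# running Task so it can log whatever it needs.
Step = Tuple[str, Callable[[object], None]]


def run_steps(task_cls, project_name: str, steps: Iterable[Step]) -> List[str]:
    """Run each step under its own Task, closing it before the next one starts.

    `task_cls` is assumed to behave like clearml.Task: it provides
    `init(project_name=..., task_name=...)`, and the returned Task provides
    `close()` and an `id` attribute.
    """
    task_ids = []
    for task_name, step_fn in steps:
        task = task_cls.init(project_name=project_name, task_name=task_name)
        try:
            step_fn(task)
            task_ids.append(task.id)
        finally:
            # Close the current Task before the next init() call, so each
            # step ends up logged as its own Task.
            task.close()
    return task_ids
```

With ClearML installed, the call would look something like `run_steps(Task, "my_project", [("train", train), ("evaluate", evaluate)])`, where `train` and `evaluate` are your own (hypothetical) step functions. Note that this keeps the steps as separate Tasks but does not group them under a pipeline by itself.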

  				
Posted 3 years ago by UnevenDolphin73

Hi ScaryLeopard77! Is there a specific reason for the aversion to pipelines? What is the use case?

"continue with this already created pipeline and add the currently run task to it"

I'm not sure I understand; can you please elaborate? (I'm pretty sure this is a pipelines feature.)

  				
Posted 3 years ago by CostlyOstrich36
					0

The idea is that I first run the script start_new_pipeline.py, which should just create the pipeline, and then I run the scripts train_pipeline.py and evaluate_pipeline.py, which contain the tasks that should belong to the pipeline. However, I don't know what start_new_pipeline.py should look like so that the subsequent tasks belong to the created pipeline.

  				
Posted 3 years ago by ScaryLeopard77

Hi ScaryLeopard77,
You can probably do:
```python
Task.init(..., continue_last_task='task_id_here')
```
This will continue a previously executed Task and log both steps in the same place.
Does that help?
BTW: you can of course also manually report to any Task while it is still running:
```python
aux_task = Task.get_task(task_id_here)
aux_task.get_logger().report_scalar(...)
```
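A small sketch of the "report to another Task" idea, assuming a step's metrics arrive as a dict. The helper name and the title/series layout (one title per step, one series per metric) are hypothetical choices; `Task.get_task(task_id=...)`, `get_logger()` and `Logger.report_scalar(title, series, value, iteration)` are existing ClearML calls. The Task class is injected so the sketch can run without a server.

```python
from typing import Mapping


def report_metrics_to_task(task_cls, task_id: str, title: str,
                           metrics: Mapping[str, float],
                           iteration: int = 0) -> int:
    """Report each metric as a scalar series on another (running) Task.

    `task_cls` is assumed to behave like clearml.Task. Returns the number
    of scalars reported.
    """
    target = task_cls.get_task(task_id=task_id)
    logger = target.get_logger()
    for series, value in metrics.items():
        # One series per metric, all grouped under the same title (graph)
        logger.report_scalar(title=title, series=series, value=value,
                             iteration=iteration)
    return len(metrics)
```

With ClearML this might be called as `report_metrics_to_task(Task, task_id_here, "evaluate", {"accuracy": acc})`, where `acc` comes from your own evaluation code.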

  				
Posted 3 years ago by AgitatedDove14

Hi AgitatedDove14.
That way I lose some execution information: only the execution information from the last Task stays logged. That's why I want to keep them as separate tasks under a single pipeline.

  				
Posted 3 years ago by ScaryLeopard77

That's why I want to keep it as separate tasks under a single pipeline.

Hmm, yes: if that is the case, then you definitely need two Tasks (each with its own execution info).
So you could just create a "draft" pipeline Task and report everything to it. Does that make sense?
(By design, a pipeline is in charge of spinning up the Tasks and pulling data/metrics from them if needed. In your case it sounds like you need the Tasks to push their data/metrics onto the pipeline Task, which is actually doable.)
So the flow can be:
1. Create the pipeline Task (draft).
2. Pass the pipeline Task ID to the "steps".
3. Have the steps report to the "pipeline" Task.

Does that make sense?
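For passing the pipeline Task ID into each step, one option that fits DVC-launched scripts is a command-line flag with an environment-variable fallback. This is a minimal stdlib sketch; the `--pipeline-id` flag and the `MY_MAIN_PIPELINE` variable name are hypothetical choices, not ClearML settings.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def resolve_pipeline_id(
    argv: Optional[Sequence[str]] = None,
    environ: Optional[Mapping[str, str]] = None,
    env_var: str = "MY_MAIN_PIPELINE",
) -> str:
    """Return the pipeline Task ID from --pipeline-id, else from env_var."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--pipeline-id", default=None)
    # parse_known_args ignores any other flags the step script accepts;
    # argv=None means sys.argv[1:] is parsed.
    args, _ = parser.parse_known_args(argv)
    pipeline_id = args.pipeline_id or environ.get(env_var)
    if not pipeline_id:
        raise RuntimeError(
            f"No pipeline Task ID: pass --pipeline-id or set {env_var}"
        )
    return pipeline_id
```

In a step script, the returned ID would then be handed to `Task.get_task(task_id=...)` so the step can report to the pipeline Task. The command-line flag takes precedence, which makes it easy to override the environment for a single run.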

  				
Posted 3 years ago by AgitatedDove14

I think I understand the first step: create the pipeline task, keep it in draft state and save its ID. But how do you pass the ID to the following tasks and have them report to the pipeline (parent) task?

  				
Posted 3 years ago by ScaryLeopard77

Pseudo-ish code:
1. Create the pipeline:
```python
from clearml import Task

# Create the pipeline Task (controller type) and mark it as running
pipeline = Task.create(..., task_type="controller")
pipeline.mark_started()
print(pipeline.id)  # this ID has to be passed on to the steps
```
2. Launch step A (pass arguments via command-line argument / OS environment):
```python
import os

from clearml import Task

task = Task.init(...)
# The pipeline Task ID is passed in through an environment variable
pipeline_id = os.environ['MY_MAIN_PIPELINE']
pipeline_task = Task.get_task(task_id=pipeline_id)

# send some metrics / reports etc. to the pipeline Task
pipeline_task.get_logger().report_scalar(...)
pipeline_task.get_logger().report_text(...)
```
wdyt? (Obviously you need to somehow pass the pipeline Task ID to the steps. I'm not sure I understand how you actually launch these steps, but I'm assuming this is doable.)
BTW: why not just use clearml-agent for launching the steps?
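Applied to the DVC setup from the question, start_new_pipeline.py could look roughly like the sketch below. It follows step 1 of the pseudo-code above (`Task.create(..., task_type="controller")`, then `mark_started()`) and then writes the ID to a file, so later DVC stages can find it even when they run in separate shells. The file-based handoff, the `pipeline_id.txt` path and both helper names are assumptions, not a ClearML convention; the Task class is injected so the sketch runs without a server.

```python
from pathlib import Path


def start_pipeline(task_cls, project_name: str, pipeline_name: str,
                   id_file: str = "pipeline_id.txt") -> str:
    """Create a controller-type Task, mark it started, and persist its ID.

    `task_cls` is assumed to behave like clearml.Task: `create()` accepts
    project_name, task_name and task_type, and the returned Task provides
    `mark_started()` and an `id` attribute.
    """
    pipeline = task_cls.create(
        project_name=project_name,
        task_name=pipeline_name,
        task_type="controller",
    )
    pipeline.mark_started()
    # Hypothetical handoff: later stages read this file to find the pipeline
    Path(id_file).write_text(pipeline.id + "\n", encoding="utf-8")
    return pipeline.id


def read_pipeline_id(id_file: str = "pipeline_id.txt") -> str:
    """Read the pipeline Task ID written by start_pipeline()."""
    return Path(id_file).read_text(encoding="utf-8").strip()
```

train_pipeline.py and evaluate_pipeline.py would then call `Task.get_task(task_id=read_pipeline_id())` and report to that Task's logger, as in step 2 of the pseudo-code above.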

  				
Posted 3 years ago by AgitatedDove14
