Answered
Hey team, I had a question about executing pipelines using the ClearML agent setup in K8s. When we define a pipeline script to be executed remotely and submit it to the queue for execution, we see the following things happening

Hey team, I had a question about executing pipelines using the ClearML agent setup in K8s. When we define a pipeline script to be executed remotely and submit it to the queue for execution, we see the following things happening:

  • The user runs the pipeline script with stepA and stepB locally, where it creates the pipeline and the controller task with a task uuid, say uuidA, and submits it to the queue (see the sketch after this list).
  • ClearML also captures the pipeline arguments through some inspection (we use click mostly) and submits them along with the pipeline.
  • The ClearML agent running in a K8s pod picks up the submitted pipeline script from the services queue and starts a worker pod, which runs this same controller task uuidA and executes the pipeline script remotely.
  • It starts the worker pods that execute the two individual steps in the pipeline, and these have their own tasks and task uuids, i.e. uuidStepA and uuidStepB.

So I guess the question is, why does ClearML need to run the pipeline script twice, i.e. once when we submit it locally from our machines and once remotely in K8s? Is there any specific reason for this design?
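For reference, a minimal sketch of a pipeline script shaped like the one described above, using ClearML's function-step API. The step names, project, queue, and argument values are illustrative assumptions, not taken from the thread:

    from clearml import PipelineController


    def step_a(seed: int = 42):
        # First pipeline step; runs as its own task (the "uuidStepA" in the description)
        return seed * 2


    def step_b(value: int):
        # Second pipeline step; runs as its own task (the "uuidStepB" in the description)
        return value + 1


    if __name__ == "__main__":
        # Running the script locally creates the pipeline controller task ("uuidA")
        pipe = PipelineController(name="demo pipeline", project="examples", version="1.0.0")

        pipe.add_function_step(name="stepA", function=step_a,
                               function_kwargs=dict(seed=42),
                               function_return=["value"])
        pipe.add_function_step(name="stepB", function=step_b,
                               parents=["stepA"],
                               function_kwargs=dict(value="${stepA.value}"),
                               function_return=["result"])

        # Enqueue the controller on the services queue; the agent running in K8s
        # picks it up and launches the worker pods for the steps.
        pipe.start(queue="services")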
  
  
Posted 7 days ago

Answers 2


From what we have seen, in order to submit the pipeline to ClearML,
ClearML processes the pipeline script locally and submits it to the queue. What happens locally is that it creates a controller task (the task that orchestrates the pipeline, I guess) and records the arguments of the script that needs to execute as part of this task, i.e. the pipeline script.

Now once it is submitted to the queue, a new worker pod is spun up that continues this controller task (the one that was created in ClearML when the script was processed locally on my machine) and processes the entire script again with the same captured arguments.
Is there a particular reason for it to process the script twice, once locally on my machine and once remotely in the worker pod? When it processes it locally, it can identify the pipeline and the steps, so why does it need to do that remotely again? Is this purely for orchestration reasons?
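To make the two modes concrete, a sketch of where the controller logic runs depending on how it is started (this assumes the same PipelineController object as in the script above; the queue name is an assumption):

    # Controller task is enqueued; the remote agent re-executes the script
    # as the controller (the "second run" described above).
    pipe.start(queue="services")

    # Alternative: keep the controller logic on the local machine and only
    # enqueue the individual steps for remote execution.
    # pipe.start_locally()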

  
  
Posted 6 days ago

Hi EmbarrassedBlackbird33, I'm not sure I understand. You don't need to run the script twice. You simply need to create the pipeline via code once, and then you can run it as many times as you want remotely.
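As a sketch of the "create once, run many times" flow (project, task, and queue names here are assumptions for illustration): clone the existing pipeline controller task and enqueue the clone, without re-running the local script:

    from clearml import Task

    # Look up the pipeline controller task that was created on the first run
    template = Task.get_task(project_name="examples", task_name="demo pipeline")

    # Clone it and enqueue the clone; the agent executes it remotely
    new_run = Task.clone(source_task=template, name="demo pipeline (re-run)")
    Task.enqueue(new_run, queue_name="services")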

  
  
Posted 6 days ago