Hi there, does anyone have suggestions for best practice for deploying a pipeline so that it can run remotely on a ClearML server using a Docker image? I am finding the ClearML docs and videos insufficient to get the pipeline to actually run to completion

Answered

Hi there, does anyone have suggestions for best practice for deploying a pipeline so that it can run remotely on a ClearML server using a Docker image? I am finding the ClearML docs and videos insufficient to get the pipeline to actually run to completion remotely.

Steps I have done:

- Written a pipeline with PipelineDecorator with one component and one pipeline function, executing_pipeline().
- In the ClearML UI, created two queues: queue-services and queue-forecasting.
  - In the pipeline code, assigned queue-services to the controller using pipeline_execution_queue.
  - Assigned the other queue, queue-forecasting, to the pipeline component using the execution_queue parameter.
- Added a main.py which calls executing_pipeline().
- Built a Docker image with the packages installed (as we use internal packages on CodeArtifact) and specified main.py as the entry point.
- On the ClearML server, started two agents (workers), both in --docker mode using the above Docker image:
  - One called worker-services, in --services-mode and attached to queue queue-services.
  - One called worker-forecasting, attached to queue queue-forecasting.
- Back in my IDE, I run python main.py; it starts and then switches to remote execution.

BUT, in the ClearML UI:

- Under Pipelines, I can see the pipeline says running up until it gets to "Launching step [step_name]" and then just hangs there.
- If I go to the Experiments tab, I can see the task for this step just stays in "Pending" mode.
- Under the Workers and queues tab, I can see the queue queue-forecasting with the worker assigned, and under "Next experiment" is the step name. But nothing happens.

Any tips or ideas?!
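
For readers reproducing this setup, the two agents described in the steps above could be started with commands along these lines. This is only a sketch: the image tag is a hypothetical placeholder, the helper function is illustrative and not part of ClearML, and the exact flags used in the original setup are not shown in the post. It only builds and prints the `clearml-agent daemon` command lines, it does not launch anything.

```python
import shlex

# Hypothetical image tag; substitute the image built in the steps above.
DOCKER_IMAGE = "my-registry/forecasting:latest"


def agent_command(queue: str, image: str, services_mode: bool = False) -> list[str]:
    """Build a `clearml-agent daemon` command line for one worker."""
    cmd = ["clearml-agent", "daemon", "--queue", queue, "--docker", image]
    if services_mode:
        # Services mode lets a single agent run several long-lived tasks,
        # such as pipeline controllers, at the same time.
        cmd.append("--services-mode")
    return cmd


if __name__ == "__main__":
    # One agent for the pipeline controller, one for the pipeline step.
    for queue, services in (("queue-services", True), ("queue-forecasting", False)):
        print(shlex.join(agent_command(queue, DOCKER_IMAGE, services)))
```

Here the image passed to --docker acts as the default execution environment for tasks pulled from that queue.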

  				
Posted one year ago by GorgeousShrimp11

Answers 3

OK, thanks! Going to try this now. I included an entry point after reading some other messages on Slack here while trying to figure out how to use Docker for running remotely.

  				
Posted one year ago by GorgeousShrimp11

Hey GorgeousShrimp11, can you abort all pending experiments that are waiting to be fetched from this queue and try again? Off the top of my head, it could be that the clearml-agent can't pull the custom Docker image. In general, you should treat Docker images not as step definitions but only as the environment, so setting the entrypoint is not necessary.

  				
Posted one year ago by EnthusiasticShrimp49

Which gives me an idea. Could you please remove the entrypoint from the Docker image altogether and try again?

Overriding the entrypoint in the image can cause docker run/docker exec to stop working properly, because instead of a shell it will use your entrypoint to run everything.
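
One way to confirm whether an image still defines an entrypoint is to read it back with docker inspect. The sketch below assumes the Docker CLI is installed on the machine running it; the function names are illustrative and not part of ClearML or Docker.

```python
import json
import subprocess


def parse_entrypoint(inspect_output: str) -> list[str]:
    """Parse the output of `docker inspect --format '{{json .Config.Entrypoint}}'`."""
    value = json.loads(inspect_output.strip() or "null")
    # Docker prints `null` when the image defines no entrypoint.
    return value or []


def image_entrypoint(image: str) -> list[str]:
    """Return the entrypoint configured in a local image (empty list if none)."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{json .Config.Entrypoint}}", image],
        capture_output=True,
        text=True,
        check=True,  # raise if the image is missing or docker fails
    )
    return parse_entrypoint(result.stdout)
```

Calling image_entrypoint() with the image tag should return an empty list once the ENTRYPOINT line has been removed from the Dockerfile and the image rebuilt.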

  				
Posted one year ago by EnthusiasticShrimp49

785 Views
