Hi there, does anyone have suggestions for best practice for deploying a pipeline so that it can run remotely on a ClearML server using a Docker image? I am finding the ClearML docs and videos insufficient to get the pipeline to actually run to completion remotely.

Steps I have done:

- Wrote a pipeline with PipelineDecorator containing one component and one pipeline function, executing_pipeline().
- In the ClearML UI, created two queues: queue-services and queue-forecasting.
- In the pipeline code, set the controller's pipeline_execution_queue to queue-services.
- Assigned the other queue, queue-forecasting, to the pipeline component using the execution_queue parameter.
- Added a main.py which calls executing_pipeline().
- Built a Docker image with the packages installed (as we use internal packages on CodeArtifact) and specified main.py as its entry point.
- On the ClearML server, started two agents (workers), both in --docker mode using the above Docker image:
  - One called worker-services, running in --services-mode and attached to queue-services.
  - One called worker-forecasting, attached to queue-forecasting.
- Back in my IDE, I run python main.py; it starts and then switches to remote execution.
But in the ClearML UI:

- Under Pipelines, the pipeline shows as running until it reaches "Launching step [step_name]" and then just hangs there.
- In the Experiments tab, the task for this step stays in "Pending".
- Under Workers and Queues, the queue queue-forecasting shows the assigned worker, and under "Next experiment" is the step name. But nothing happens.

Any tips or ideas?!

Posted 8 months ago by GorgeousShrimp11


3 Answers

Hey @GorgeousShrimp11, can you abort all pending experiments that are waiting to be fetched from this queue and try again? Off the top of my head, it could be that the clearml-agent can't pull the custom Docker image. In general, you should treat Docker images not as step definitions but only as the environment, so setting the entrypoint is not necessary.

Posted 8 months ago by EnthusiasticShrimp49

Which gives me an idea. Could you please remove the entrypoint from the Docker image altogether and try again?

Overriding the entrypoint in the image can cause docker run/docker exec to fail, because instead of a shell, your entrypoint will be used to run everything.
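To illustrate the entrypoint problem, the following sketch models Docker's documented rule for exec-form ENTRYPOINT and CMD: arguments given to docker run replace the image's CMD and are appended to the ENTRYPOINT rather than run on their own. The function effective_argv and the bash command are hypothetical stand-ins; the exact command clearml-agent runs in the container is not shown in this thread.

```python
from typing import List, Optional


def effective_argv(
    entrypoint: Optional[List[str]],
    image_cmd: Optional[List[str]],
    run_args: Optional[List[str]] = None,
) -> List[str]:
    """Return the argv a container starts with, for exec-form ENTRYPOINT/CMD.

    - Arguments given to `docker run IMAGE ...` replace the image's CMD.
    - If the image has an ENTRYPOINT, the (possibly replaced) CMD is appended
      to it as extra arguments instead of being executed by itself.
    Shell-form ENTRYPOINT (which ignores CMD and run arguments) is not modeled.
    """
    cmd = run_args if run_args else (image_cmd or [])
    return list(entrypoint or []) + list(cmd)


# Hypothetical stand-in for a setup command passed to `docker run`.
shell_cmd = ["bash", "-c", "echo preparing environment"]

# Image built with ENTRYPOINT ["python", "main.py"]: the shell command
# becomes arguments to main.py instead of being executed by bash.
with_entrypoint = effective_argv(["python", "main.py"], None, shell_cmd)

# Same image without the ENTRYPOINT: bash runs the command as intended.
without_entrypoint = effective_argv(None, None, shell_cmd)

print(with_entrypoint)
print(without_entrypoint)
```

With the entrypoint in place, the container starts main.py with bash -c ... as stray arguments, so the intended shell command never runs.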

Posted 8 months ago by EnthusiasticShrimp49

OK, thanks! Going to try this now. I included an entry point after reading some other messages here on Slack while trying to figure out how to use Docker for running remotely.

Posted 8 months ago by GorgeousShrimp11

