Hello! I Was Hoping I Could Get Some Debug Help. I'Ve Set Up A Clearml Pipeline Using The Pipelinecontroller, And When Running Through

Answered

Hello! I was hoping I could get some debug help. I've set up a ClearML pipeline using the PipelineController, and when running through pipeline.start_locally() (but run_pipeline_steps_locally=False ) everything works perfectly with tasks being queued and run successfully. When I run remotely ( pipeline.start() ), I can see the pipeline task created and the environment set up within that task. It seems to proceed normally, and the output ends with
Environment setup completed successfully Starting Task Execution:However, it seems to be entirely hanging here in the "Running" state. All stages of the pipeline are "Pending", none of them are actually enqueued anywhere, and it seems like nothing is happening.

Does this sound like some obvious error that I'm doing / any advice for where to look to see what's going on? Using clearml==1.7.3rc1, clearml-agent==1.4.1 , and I have a worker listening to the services queue.

  				
Posted 
	2 years ago

					More
				  		
  Report
		
					SteadySeagull18
				
					0
					 × 1

Votes Newest

Answers 6

Yup! Have two queues: services with one worker spun up in --services-mode , and another queue (say foo ) that has a bunch of GPU workers on them. When I start the pipeline locally, jobs get sent off to foo and executed exactly how I'd expect. If I keep everything exactly the same, and just change pipeline.start_locally() -> pipeline.start() , the pipeline task itself is picked up by the worker in the services queue, sets up the venv correctly, prints Starting Task Execution: then does nothing 😕

  				
Posted 
	2 years ago

					More
				  		
  Report
		
					SteadySeagull18
				
					0
					 × 1

Yup, code/git reference is there. Will private message you the log

  				
Posted 
	2 years ago

					More
				  		
  Report
		
					SteadySeagull18
				
					0
					 × 1

sets up the venv correctly, prints

Starting Task Execution:

then does nothing

Can you provide a log?
Do you see the code/git reference in the Pipeline Task details - Execution Tab ?

  				
Posted 
	2 years ago

					More
				  		
  Report
		
					AgitatedDove14
				
					0
					 × 1

It just seems frozen at the place where it should be spinning up the tasks within the pipeline

And is there an agent for those ? usually there is one agent for running logic tasks (like pipelines) running with --services-mode which means multiple Tasks can be executed by the same agent. And other agents for compute Tasks that are a signle Task per agent (but you can run multiple agents on the same machine)

  				
Posted 
	2 years ago

					More
				  		
  Report
		
					AgitatedDove14
				
					0
					 × 1

Yup, there was an agent listening to the services queue, it picked up the pipeline job and started to execute it. It just seems frozen at the place where it should be spinning up the tasks within the pipeline

  				
Posted 
	2 years ago

					More
				  		
  Report
		
					SteadySeagull18
				
					0
					 × 1

Hi SteadySeagull18

However, it seems to be entirely hanging here in the "Running" state.

Did you set a an agent to listen to the "services" queue ?
Someone needs to run the pipeline logic itself, it is sometimes part of the clearml-server deployment but not a mist

  				
Posted 
	2 years ago

					More
				  		
  Report
		
					AgitatedDove14
				
					0
					 × 1

Write your answer

2K Views

6 Answers

2 years ago