ClearML FAQ | Hi Dear Community, My Name Is Christoph And We Try To Use Clearml Free Tier With Agents. However, We Have The Problem That The Agent Gets Stuck On Execution (V1.8.1) - No Matter If Using Virtualenv Or Docker As Virtualization, And Aarch Or Amd64 Architec

Answered


Hi dear community,
my name is Christoph and we are trying to use the ClearML free tier with agents.

However, the agent gets stuck on execution (v1.8.1), regardless of whether we use virtualenv or Docker as virtualization, and on both aarch and amd64 architectures.
It also makes no difference whether we use PipelineDecorator or PipelineController.

It starts the pipeline, logs that the first step is started, and then...does nothing anymore. I use the examples given by ClearML itself. They all seem fine.

Any ideas?

Thank you very much!

  				
Posted one year ago by CumbersomeSealion22


Answers 11

I have one agent running on the machine. I also have only one task running. This only happens to us when we use pipelines, not single tasks. It does not depend on parameters like cache. There are no other tasks running in the meantime. I can boil it down even to "Hello World" tasks.

Notably, the example given here

None

also causes the observed behavior.

  				
Posted one year ago by CumbersomeSealion22

Update:

It does seem to work sometimes, but it takes an unreasonably long time. Even just running print("Hello World") takes about a minute (after the environment has been fully set up).
I needed to trigger the pipeline twice; the first time, the pipeline did not even start.

  				
Posted one year ago by CumbersomeSealion22

@CumbersomeSealion22 in the pipeline definition, I assume you use the same queue to enqueue the controller and the steps?

  				
Posted one year ago by SuccessfulKoala55

Well, rather, it takes a minute to complete.

  				
Posted one year ago by CumbersomeSealion22

Container environment setup overhead?

  				
Posted one year ago by AgitatedDove14

Yes, you are right, thanks. Now, I am using two agents with one using a queue dedicated only to the pipeline, and one dedicated to the single tasks. It works. However, still, it sometimes takes a strangely long time for the agent to pick up the next task (or process it), even if it is only "Hello World".
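
The two-queue setup described above could look roughly like the sketch below. It is an illustration only, not the poster's actual code: the queue name "pipelines", the project and pipeline names, and the function names are hypothetical, and it assumes one agent listens on each queue. The ClearML import sits inside the function so that the step body can be tried without ClearML installed.

```python
def greet(name: str) -> str:
    """Trivial step body, standing in for the "Hello World" task."""
    message = f"Hello {name}"
    print(message)
    return message


def run_with_dedicated_queues(controller_queue: str = "pipelines",
                              step_queue: str = "default") -> None:
    # Imported lazily; requires the clearml package and a configured server.
    from clearml import PipelineController

    pipe = PipelineController(
        name="hello-pipeline",  # hypothetical pipeline name
        project="examples",     # hypothetical project name
        version="1.0.0",
    )
    # Steps go to one queue ...
    pipe.set_default_execution_queue(step_queue)
    pipe.add_function_step(
        name="hello_step",
        function=greet,
        function_kwargs={"name": "World"},
        function_return=["message"],
    )
    # ... and the controller task goes to a different queue, so the agent
    # running the controller never competes with the steps for a slot.
    pipe.start(queue=controller_queue)
```

Calling run_with_dedicated_queues() enqueues the controller on one queue and its step on the other; both queues need an agent for the pipeline to make progress.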

  				
Posted one year ago by CumbersomeSealion22

Just noting that it also does not work with two agents listening to the same queue. I tried this because I thought the controller task of the pipeline might be blocking the execution of the actual tasks.

  				
Posted one year ago by CumbersomeSealion22

This is true, yes. I do

pipe.set_default_execution_queue("default") and also
pipe.start(queue="default"), where the single steps do not specify queues. Also, my GUI tells me that this is so.

  				
Posted one year ago by CumbersomeSealion22

It works. However, still, it sometimes takes a strangely long time for the agent to pick up the next task (or process it), even if it is only "Hello World".

The agent checks every 2/5 seconds whether there is a new Task to be launched; could that be it?

  				
Posted one year ago by AgitatedDove14

Hi @CumbersomeSealion22

It starts the pipeline, logs that the first step is started, and then...does nothing anymore.

How many agents do you have running? By default, an agent runs one Task at a time (unless it is started with --services-mode, which allows it to run an unlimited number of tasks in parallel).
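
To make the "one Task at a time" point concrete, here is a toy simulation in plain Python. It is not ClearML's scheduler, only a simplified model under stated assumptions: a single shared FIFO queue, agents that each hold exactly one task, and a controller task that keeps its agent busy until its step has finished. The function name and the tick-based loop are made up for illustration.

```python
from collections import deque


def simulate(num_agents: int, max_ticks: int = 10) -> bool:
    """Return True if the toy pipeline finishes, False if it stalls."""
    queue = deque(["controller"])  # the controller is enqueued first
    running = []                   # tasks currently occupying an agent
    step_done = False

    for _ in range(max_ticks):
        # Idle agents pull the next queued task, one task per agent.
        while queue and len(running) < num_agents:
            task = queue.popleft()
            running.append(task)
            if task == "controller":
                # The controller enqueues its step on the same queue.
                queue.append("step")

        # A running step finishes within the tick and frees its agent.
        if "step" in running:
            running.remove("step")
            step_done = True

        # The controller only exits once its step has completed.
        if step_done and "controller" in running:
            running.remove("controller")
            return True

    return False


print(simulate(num_agents=1))
print(simulate(num_agents=2))
```

In this model a single agent never gets past the controller, because the step stays in the queue with no free agent to pull it, while a second agent lets the step run. Real deployments have more moving parts (environment setup, polling intervals), so this only illustrates the blocking idea.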

  				
Posted one year ago by AgitatedDove14

I have one agent running on the machine. I also have only one task running. This only happens to us when we use pipelines

@CumbersomeSealion22 notice that when you launch a pipeline you are actually running two tasks: one is the "pipeline" itself (i.e. the logic) and one is the component in the pipeline (i.e. the step).
If you have one agent, I'm assuming the pipeline itself (the one that you launch on your machine) is stopped and relaunched on the agent. It then launches the step, which waits in the same queue to be executed, but there is no free agent to pull and execute it.
If you want to test this theory, run the pipeline logic "locally" (i.e. no agent) by doing:

pipe.start_locally(run_pipeline_steps_locally=False)
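
For context, a fuller sketch of that test might look like the following. The pipeline name, project name, and function names are hypothetical, and the "default" queue is taken from the setup described elsewhere in the thread; treat it as an illustration under those assumptions. The ClearML import is placed inside the function so that the step body can be tried on its own.

```python
def say_hello(name: str) -> str:
    """Trivial step body, standing in for the "Hello World" task."""
    message = f"Hello {name}"
    print(message)
    return message


def run_controller_locally() -> None:
    # Imported lazily; requires the clearml package and a configured server.
    from clearml import PipelineController

    pipe = PipelineController(
        name="hello-pipeline",  # hypothetical pipeline name
        project="examples",     # hypothetical project name
        version="1.0.0",
    )
    # Steps are still sent to the "default" queue for the agent to execute.
    pipe.set_default_execution_queue("default")
    pipe.add_function_step(
        name="hello_step",
        function=say_hello,
        function_kwargs={"name": "World"},
        function_return=["message"],
    )
    # The controller logic stays in this process and only the step is
    # enqueued, so a single agent on "default" is free to pick it up.
    pipe.start_locally(run_pipeline_steps_locally=False)
```

If the step now runs promptly with a single agent, that supports the theory that the controller was occupying the only available agent.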

  				
Posted one year ago by AgitatedDove14
