Hi dear community,
my name is Christoph, and we are trying to use the ClearML free tier with agents.

However, the agent (v1.8.1) gets stuck during execution, regardless of whether we use virtualenv or Docker, on both aarch64 and amd64 architectures.
It also makes no difference whether we use PipelineDecorator or PipelineController.

It starts the pipeline, logs that the first step is started, and then...does nothing anymore. I am using the examples provided by ClearML itself, and they all look fine.

Any ideas?

Thank you very much!

  
  
Posted 3 months ago

Answers 11


It works. However, still, it sometimes takes a strangely long time for the agent to pick up the next task (or process it), even if it is only "Hello World".

The agent checks every 2/5 seconds whether there is a new Task to launch. Could that be it?
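To gauge how much of the delay polling alone could explain, here is a back-of-the-envelope sketch. It assumes a strictly sequential pipeline in which the controller and each step are enqueued once, and an idle agent notices a new task only on its next poll. The interval is a parameter because the figure given here is only "2/5 seconds"; the function name is hypothetical and not part of ClearML.

```python
def worst_case_pickup_delay(num_steps: int, polling_interval_s: float) -> float:
    """Upper bound on the time tasks wait for an idle agent's next poll.

    Simplified model (an assumption): the controller and each step of a
    strictly sequential pipeline are enqueued once, and an idle agent notices
    a newly enqueued task only on its next poll.
    """
    if num_steps < 0 or polling_interval_s < 0:
        raise ValueError("num_steps and polling_interval_s must be non-negative")
    # One poll wait for the controller task, plus one per sequential step.
    return (num_steps + 1) * polling_interval_s


# Single "Hello World" step, with the two intervals mentioned in this reply.
for interval in (2.0, 5.0):
    print(f"{interval:.0f} s polling -> up to "
          f"{worst_case_pickup_delay(1, interval):.0f} s of pickup delay")
```

Under this model, polling adds roughly 4 to 10 seconds to a one-step pipeline, which on its own would not account for a full minute.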

  
  
Posted 3 months ago

Just noting that it also does not work with two agents listening to the same queue. I tried that because I thought the pipeline's controller task might be blocking the execution of the actual step tasks.

  
  
Posted 3 months ago

This is true, yes. I call

pipe.set_default_execution_queue("default")
pipe.start(queue="default")

and the individual steps do not specify a queue. The GUI confirms this as well.

  
  
Posted 3 months ago

Yes, you are right, thanks. I am now using two agents: one listening on a queue dedicated to the pipeline controller, and one on a queue dedicated to the individual steps. It works. However, still, it sometimes takes a strangely long time for the agent to pick up the next task (or process it), even if it is only "Hello World".

  
  
Posted 3 months ago

Update:

  • It does seem to work sometimes, but it takes an unreasonably long time. Even a step that only runs print("Hello World") takes about a minute (after the environment has been fully set up).
  • I had to trigger the pipeline twice; the first time, the pipeline did not even start.
  
  
Posted 3 months ago

Well, rather, it takes a minute to complete.

  
  
Posted 3 months ago

@CumbersomeSealion22 In the pipeline definition, I assume you use the same queue for both the controller and the steps?

  
  
Posted 3 months ago

I have one agent running on the machine. I also have only one task running. This only happens to us when we use pipelines, not single tasks. It does not depend on parameters such as caching. No other tasks are running in the meantime. I can reproduce it even with plain "Hello World" tasks.

Notably, the example given here also causes the observed behavior.

  
  
Posted 3 months ago

Hi @CumbersomeSealion22

It starts the pipeline, logs that the first step is started, and then...does nothing anymore.

How many agents do you have running? By default, an agent runs one Task at a time (unless it is started with --services-mode, which allows it to run an unlimited number of tasks in parallel).
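A simplified capacity check makes this concrete. It is an assumption-laden sketch, not ClearML's scheduler: a controller that shares a queue with its steps holds one execution slot for the whole run, so the queue needs at least one more free slot before a step can start. Passing tasks_per_agent=None stands in for an agent started with --services-mode; the function name is hypothetical.

```python
from typing import Optional


def pipeline_can_progress(agents_on_queue: int,
                          tasks_per_agent: Optional[int] = 1) -> bool:
    """Return True if a pipeline whose controller and steps share one queue
    can get a step executed.

    Simplified model (an assumption): the controller keeps one execution
    slot busy while it waits for its step, so at least one more slot must
    be free. tasks_per_agent=1 models a default agent (one Task at a time);
    tasks_per_agent=None models --services-mode (no fixed limit).
    """
    if agents_on_queue <= 0:
        return False  # nobody pulls from the queue at all
    if tasks_per_agent is None:
        return True  # services mode: the step gets its own slot
    # One slot for the controller, at least one more for the step.
    return agents_on_queue * tasks_per_agent >= 2


print(pipeline_can_progress(1))                        # False: the step waits forever
print(pipeline_can_progress(2))                        # True
print(pipeline_can_progress(1, tasks_per_agent=None))  # True
```

In this model two agents on one queue are enough, yet this thread reports that this alone did not help the original poster, so treat it only as a first sanity check.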

  
  
Posted 3 months ago

Container environment setup overhead?
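One way to check that guess is to split each step's latency into queue wait, environment setup and execution. The sketch below assumes you can obtain four timestamps per step on a common clock (for example from the task's status changes and console log); the dataclass, its field names and the example numbers are hypothetical and made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class StepTimestamps:
    """Hypothetical per-step timestamps, in seconds on a common clock."""
    enqueued: float   # step task placed in its queue
    started: float    # an agent pulled the task and began setup
    env_ready: float  # virtualenv / container setup finished
    finished: float   # the step's own code returned


def latency_breakdown(ts: StepTimestamps) -> Dict[str, float]:
    """Split a step's total latency into queue wait, setup and execution."""
    if not ts.enqueued <= ts.started <= ts.env_ready <= ts.finished:
        raise ValueError("timestamps must be non-decreasing")
    return {
        "queue_wait": ts.started - ts.enqueued,
        "env_setup": ts.env_ready - ts.started,
        "execution": ts.finished - ts.env_ready,
        "total": ts.finished - ts.enqueued,
    }


# Made-up numbers for illustration only, not measurements from this thread.
print(latency_breakdown(StepTimestamps(0.0, 4.0, 50.0, 51.0)))
# {'queue_wait': 4.0, 'env_setup': 46.0, 'execution': 1.0, 'total': 51.0}
```

Since the original poster reports that the minute elapses after the environment is ready, a breakdown like this would show whether the time goes to execution itself or to waiting for the next task to be picked up.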

  
  
Posted 3 months ago

I have one agent running on the machine. I also have only one task running. This only happens to us when we use pipelines

@CumbersomeSealion22 Note that when you launch a pipeline, you are actually running two tasks: the "pipeline" itself (i.e., the controller logic) and the component in the pipeline (i.e., the step).
If you have only one agent, my guess is that the following happens: the pipeline itself (the one you launch on your machine) stops and is relaunched on the agent; it then launches the step, which waits in the same queue to be executed, but there is no free agent to pull and execute it.
If you want to test this theory, run the pipeline logic "locally" (i.e., without an agent) by calling:

pipe.start_locally(run_pipeline_steps_locally=False)   
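The theory can also be played through in a toy simulation. It is a deliberately simplified model (each agent runs one task at a time and pulls from a single queue; the controller enqueues one step and then blocks), not ClearML's actual scheduler. The function and the queue names "pipeline" and "steps" are hypothetical. Passing controller_queue=None mimics the start_locally call: the controller runs on the local machine and occupies no agent.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


def simulate(agent_queues: List[str],
             controller_queue: Optional[str],
             step_queue: str,
             max_rounds: int = 10) -> str:
    """Toy model of a one-step pipeline; returns "completed" or "stuck".

    Assumptions (not ClearML's real scheduler): agent_queues[i] is the single
    queue agent i pulls from, each agent runs one task at a time, and the
    controller enqueues its step once and then blocks until the step is done.
    controller_queue=None models a controller running on the local machine,
    as with pipe.start_locally(run_pipeline_steps_locally=False).
    """
    queues: Dict[str, Deque[str]] = {q: deque() for q in agent_queues}
    queues.setdefault(step_queue, deque())
    busy: List[Optional[str]] = [None] * len(agent_queues)
    step_enqueued = False
    if controller_queue is None:
        queues[step_queue].append("step")  # local controller enqueues at once
        step_enqueued = True
    else:
        queues.setdefault(controller_queue, deque()).append("controller")

    for _ in range(max_rounds):
        # Each idle agent polls its own queue and takes the oldest task.
        for i, q in enumerate(agent_queues):
            if busy[i] is None and queues[q]:
                busy[i] = queues[q].popleft()
        # A running controller enqueues its step, but keeps its agent busy.
        if "controller" in busy and not step_enqueued:
            queues[step_queue].append("step")
            step_enqueued = True
        # Once any agent is running the step, the pipeline can finish.
        if "step" in busy:
            return "completed"
    return "stuck"


print(simulate(["default"], "default", "default"))           # stuck
print(simulate(["pipeline", "steps"], "pipeline", "steps"))  # completed
print(simulate(["default"], None, "default"))                # completed
```

The second call corresponds to the two-queue setup described elsewhere in this thread (one agent for the controller, one for the steps); the third corresponds to running the controller locally.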
  
  
Posted 3 months ago