
Hi everyone,
I'm experiencing an issue with ClearML running on K8s. After upgrading the ClearML server Helm chart from version 7.11.5, I'm seeing the following errors:

In the agent:

[2025-02-16 17:26:45,889] [9] [WARNING] [clearml.service_repo] Returned 400 for tasks.enqueue in 4ms, msg=Validation error (Cannot skip setting execution queue for a task that is not enqueued or does not have execution queue set)

In the clearml-server-api-pod:

[2025-02-16 19:13:45,658] [9] [WARNING] [clearml.service_repo] Returned 400 for queues.remove_task in 3ms, msg=Invalid queue id or task not in queue: task=37c6ce31c53d449994f7c9096c26d6f7, id=99d6dd77e67a4f12bb5de901596e0e1e, company=d1bd92a3b039400cbafc60a7a5b1e52b

[2025-02-16 19:13:45,669] [9] [WARNING] [clearml.service_repo] Returned 400 for tasks.enqueue in 3ms, msg=Validation error (Cannot skip setting execution queue for a task that is not enqueued or does not have execution queue set)

I've tried several versions, including the latest 7.14.2, but the error persists. For testing, I'm using a simple pipeline:

import clearml
from clearml import PipelineController

pipe = PipelineController(
    name='simple-pipeline',
    project='hello-world-project',
    version='1.0.0',
)

pipe.set_default_execution_queue('default')

def say_hello():
    print("Hello World!")
    return {"message": "Hello World!"}

pipe.add_function_step(
    name='hello-step',
    function=say_hello,
    function_return=['hello_result']
)

pipe.start(queue='default')

I believe the issue lies with the clearml-apiserver. When I downgrade the clearml-apiserver image in the Helm chart back to version 1.16.2-502, the agent successfully picks up the job.

Additional information:

  • My Kubernetes version is 29.2.10
  • I've reproduced this issue on other K8s versions
  • The problem persists even when using the default values.yaml
  
  
Posted one month ago

Answers 8


Might be this None

  
  
Posted one month ago

Will do

  
  
Posted one month ago

Hi WorriedSwan6

On a different issue, have you any solution on how to make the agent listen to multiple queues?

Each agent is associated with one type of queue, which represents the kind of job that agent will create. You can connect multiple queues to it, and it will pull from all of them, creating the same "type" of job regardless of which queue the task came from. If you want a different kind of job to be created, just spin up another agent; there is no limit to the number of agents you can spin up in the cluster (they do not actually require a lot of resources, they sleep most of the time 🙂 )
Is this what you had in mind?

  
  
Posted one month ago

Hey WobblyFrog79, yes, testing this locally it does seem to solve the issue, thank you.
I will test it in our env.

On a different issue, do you have a solution for making the agent listen to multiple queues?
In the Helm chart it says:

  # -- ClearML queue this agent will consume. Multiple queues can be specified with the following format: queue1,queue2,queue3

But this does not work, as the agent reads them all as one queue.
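
To make the difference concrete, here is a minimal sketch of the two interpretations of that value. It is not the agent's or the chart's actual parsing code; `parse_queues` is a hypothetical helper written only for illustration.

```python
from typing import List


def parse_queues(raw: str) -> List[str]:
    """Split a comma-separated queue string into individual queue names.

    Hypothetical helper for illustration only, not taken from clearml-agent.
    """
    return [name.strip() for name in raw.split(",") if name.strip()]


raw_value = "queue1,queue2,queue3"

# Behaviour reported above: the whole string ends up as a single queue name.
as_single_queue = [raw_value]

# Behaviour the chart comment describes: one entry per queue.
as_separate_queues = parse_queues(raw_value)

print(as_single_queue)
print(as_separate_queues)
```

If the value really is being passed through unsplit, the agent would look for a single queue literally named "queue1,queue2,queue3", which would match the behaviour described here.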

  
  
Posted one month ago

This hasn’t worked for me either; I use multiple queues instead. Another reason I use multiple queues is that I need to specify different resource requirements for the pods launched by each queue (CPU-only vs GPU).

  
  
Posted one month ago

Hey Martin, do you know how to connect the agent to multiple queues?

  
  
Posted one month ago

AgitatedDove14, for me it hasn’t worked when I specified agentk8sglue.queue: "queue1,queue2" in the Helm chart options, which should be possible according to the documentation. The flag for creating a queue if it doesn’t exist ( agentk8sglue.createQueueIfNotExists ) hasn’t worked either. Both failed parsing at runtime, so those are 2 bugs I’d say.

  
  
Posted one month ago

Hmm, yes, it should create the queue if it's missing (btw you could work around that and create it in the UI). Any chance you can open a GitHub issue in the clearml helm chart repo so we do not forget?
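
As an alternative to the UI, the queue could probably also be created from a script. The sketch below is an assumption, not something confirmed in this thread: `ensure_queue` is a hypothetical helper, and it expects a client object exposing `queues.get_all(name=...)` and `queues.create(name=...)`, which is how I understand the ClearML `APIClient` to behave.

```python
import re


def ensure_queue(client, name):
    """Return the id of the queue called `name`, creating it if it is missing.

    `client` is assumed to expose queues.get_all(name=...) and
    queues.create(name=...), as the ClearML APIClient is believed to.
    """
    # Assumption: get_all treats `name` as a regular expression,
    # so escape and anchor it, then double-check for an exact match.
    pattern = "^{}$".format(re.escape(name))
    for queue in client.queues.get_all(name=pattern):
        if queue.name == name:
            return queue.id
    # No queue with that exact name was found, so create one.
    return client.queues.create(name=name).id
```

With the clearml package installed and configured, this would presumably be called as `ensure_queue(APIClient(), "queue1")`, with `APIClient` imported from `clearml.backend_api.session.client`. Verify against your server version before relying on it.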

  
  
Posted one month ago