Answered
Hi All, What Is The Best Way To Monitor A Failing ClearML Agent That Kills All Tasks In The Queue?

Hi all,
What is the best way to monitor a failing ClearML agent that kills all the tasks in its queue?

  
  
Posted one year ago

Answers 4


Hi @GaudyPig83, I'm not sure I understand. What do you mean by a failed ClearML agent?

  
  
Posted one year ago

I think you should monitor your tasks and see what's going on. Also, an agent should be set up in a way that you know it will work and that it has all the required drivers, etc.
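
As a rough illustration of checking for "required drivers" on the agent machine, here is a minimal sketch of a pre-flight check for the NVIDIA driver case raised in this thread. It assumes that `nvidia-smi` is installed with the driver and exits with a non-zero status when the driver is unusable; the function name `nvidia_driver_available` and its injectable `which`/`run` parameters are hypothetical and exist only to make the sketch testable.

```python
import shutil
import subprocess
from typing import Callable, Optional


def nvidia_driver_available(
    which: Callable[[str], Optional[str]] = shutil.which,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True if nvidia-smi is on PATH and exits with status 0.

    Assumption: a missing or broken NVIDIA driver shows up either as a
    missing nvidia-smi binary or as a non-zero exit status.
    """
    path = which("nvidia-smi")
    if path is None:
        # The binary is not installed, so the driver is most likely missing.
        return False
    try:
        result = run([path], capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        # The binary could not be executed or hung.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("NVIDIA driver available:", nvidia_driver_available())
```

Running something like this on the machine before starting the agent (or as the first step of a task) would surface a missing driver early instead of after tasks start failing.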

  
  
Posted one year ago

The thing is, the agent does not fail; it's the task setup that fails. One approach is to monitor all tasks handled by that agent (although I'm not sure what rule you would use to decide). Another is to periodically send very short "test" tasks that check a specific (or all) setup prerequisites, and monitor their status.
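
One possible rule for the first approach, sketched below, is to flag a worker once its most recent tasks have all failed in a row. This is only an illustration: the function name `find_suspect_workers`, the threshold of 3, and the `(worker, status)` event format are assumptions, and collecting those events from ClearML (through the SDK or the REST API) is deliberately left out. The status string `"failed"` is assumed to match the status ClearML reports for a failed task.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_suspect_workers(
    events: Iterable[Tuple[str, str]], threshold: int = 3
) -> List[str]:
    """Return workers whose latest `threshold` or more tasks all failed.

    `events` must be in chronological order; each item is a
    (worker_name, task_status) pair. Any non-"failed" status resets
    that worker's failure streak.
    """
    streak: Dict[str, int] = defaultdict(int)
    for worker, status in events:
        if status == "failed":
            streak[worker] += 1
        else:
            streak[worker] = 0
    return sorted(w for w, n in streak.items() if n >= threshold)


# Hypothetical history: one machine fails every task it picks up.
history = [
    ("yotam-machine", "failed"),
    ("other-machine", "completed"),
    ("yotam-machine", "failed"),
    ("yotam-machine", "failed"),
]
print(find_suspect_workers(history))  # ['yotam-machine']
```

A worker flagged this way could then be stopped or reported to an operator before it drains the rest of the queue. A streak-based rule is used here rather than a failure percentage so that a single healthy run clears the flag.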

  
  
Posted one year ago

Hi, for example, there is a machine, "yotam-machine", without the NVIDIA driver,
and "yotam-machine" serves queue "a".
There are 200 tasks in this queue.
So "yotam-machine" will start a task, and it will fail.
Then it will pull the next task, which will also fail.
In this way it will kill all the tasks in the queue.

  
  
Posted one year ago
873 Views