Examples: query, "exact match", wildcard*, wild?ard, wild*rd
Fuzzy search: cake~ (finds cakes, bake)
Term boost: "red velvet"^4, chocolate^2
Field grouping: tags:(+work -"fun-stuff")
Escaping: Escape characters +-&|!(){}[]^"~*?:\ with \, e.g. \+
Range search: properties.timestamp:[1587729413488 TO *] (inclusive), properties.title:{A TO Z}(excluding A and Z)
Combinations: chocolate AND vanilla, chocolate OR vanilla, (chocolate OR vanilla) NOT "vanilla pudding"
Field search: properties.title:"The Title" AND text
Unanswered
Hello. I Have An Issue In Regards To A Task That I Run As A Service ( Should Always Run). I Run The Clearml Server And Agents In Kubernetes. I Think This Is A Design Problem With The Way Clearml Agents Run On Kubernetes. The K8S Glue Will Launch A Worker


This means that if something happens with the k8s node the pod runs on,

Actually if the pod crashed (the pod not the Task) k8s should re spin it, no?

I also experience that if a worker pod running a task is terminated, clearml does not fail/abort the task.

From the k8s perspective, if the task ended (failed/completed) it always return with exit code 0, i.e. success. Because the agent was able to spin the Task. We do not want Tasks with exception to litter the k8s with endless retries ...
Does that make sense ?

Now for example the pod was killed because I had to replace the node. The task is stuck in "Running".

There is a "timer" in the backend (default 2 hours) that if a Task is marked running but does not "ping" (i.e. a live) i will set it to aborted.

  
  
Posted one year ago
134 Views
0 Answers
one year ago
one year ago
Tags