Answered
Hi, I've noticed that a few experiments got stuck with no indication that something went wrong and kept on "running" until I manually aborted them. Is there a way to create a timeout for when the server is stuck?

Hi,
I've noticed that a few experiments got stuck with no indication that something went wrong, and they kept on "running" until I manually aborted them. Is there a way to set a timeout for when the server is stuck?
For clarification, I'm using the ClearML server.

  
  
Posted 5 months ago

Answers 8


Hi @CluelessFlamingo93, part of the server is a service that kills such tasks. I think this is what you're looking for.

  
  
Posted 5 months ago

Hi @CostlyOstrich36,
but how do I configure this if I'm not hosting the ClearML server?
Where can I find the services.conf file?

  
  
Posted 5 months ago

We had a few experiments that were stuck for a few hours until we noticed, and we also had one that was stuck for 2 days (over the weekend). They weren't auto-aborted.

  
  
Posted 5 months ago

Oh, I misunderstood. You mean you're using app.clear.ml?

  
  
Posted 5 months ago

Can you provide a task id for such a task?

  
  
Posted 5 months ago

Yes, that's exactly my issue.

  
  
Posted 5 months ago

Sadly, the teammate who had the problem re-ran the experiments, so I don't have the task IDs, but I do have the CPU and GPU usage of the agent that ran the experiment:
[image]

  
  
Posted 5 months ago

Then these should be killed by default by the ClearML server after a few hours. How long was it stuck?

  
  
Posted 5 months ago
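
For anyone who cannot edit the server configuration (for example on the hosted service), a client-side timeout could be sketched roughly as below. This is only a sketch under assumptions, not the server's own watchdog: `find_stale` and `abort_stale_tasks` are hypothetical helper names, the 2-hour threshold is arbitrary, and it assumes that `Task.get_tasks`, `task.data.last_update` and `task.mark_stopped()` behave this way in your installed `clearml` version, so please verify before relying on it. It also assumes a stuck task stops updating; if the hung process keeps reporting resource metrics, `last_update` may keep advancing and this check would not catch it.

```python
from datetime import datetime, timedelta, timezone


def find_stale(last_updates, timeout, now=None):
    """Return the IDs whose last update is older than `timeout`.

    last_updates: dict mapping task ID -> timezone-aware datetime.
    """
    now = now or datetime.now(timezone.utc)
    return [tid for tid, ts in last_updates.items() if now - ts > timeout]


def abort_stale_tasks(project_name, timeout=timedelta(hours=2)):
    """Hypothetical helper: stop running tasks that have not updated recently."""
    # Imported lazily so find_stale can be used without clearml installed.
    from clearml import Task

    running = Task.get_tasks(
        project_name=project_name,
        task_filter={"status": ["in_progress"]},
    )

    tasks_by_id = {}
    last_updates = {}
    for task in running:
        ts = task.data.last_update
        if ts is None:
            continue  # no update recorded yet, skip rather than guess
        if ts.tzinfo is None:
            # Assumption: naive timestamps from the server are in UTC.
            ts = ts.replace(tzinfo=timezone.utc)
        tasks_by_id[task.id] = task
        last_updates[task.id] = ts

    stale_ids = find_stale(last_updates, timeout)
    for tid in stale_ids:
        tasks_by_id[tid].mark_stopped()
    return stale_ids
```

Something like `abort_stale_tasks("my_project")` (the project name is a placeholder) could then be run on a schedule, e.g. from cron, so a task that hangs over the weekend gets stopped without anyone watching it.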