Answered
Hi, I'm Using the Autoscaler and Getting the Error

Hi, I'm using the autoscaler and getting the error "Process terminated by user" even though I did not terminate anything. The error occurs randomly during training (in other words, training does start successfully). Does anyone know what may cause this error? Also, the instances are currently deleted after the error, so we can't access their error logs. Is there a way to get access to these logs? ( WonderfulArcticwolf3 )

  
  
Posted 2 years ago

Answers 7


Worker clearml-agent version: 1.1.2
Autoscaler instance clearml-agent version: 1.2.3
ClearML WebApp: 1.2.0-153, Server: 1.2.0-153, API: 2.16

  
  
Posted 2 years ago

CloudySwallow27 yes, this is what I wanted to know, can you try with the latest clearml version? pip install clearml==1.3.2 ?

  
  
Posted 2 years ago

TimelyPenguin76 not sure what you mean by "as a service or via the apps", but we are self-hosting it. Does that answer the question?

Also, not sure what you mean by which "clearml version". How do we check this? The clearml python package is 1.1.4. Is that what you wanted?
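
For anyone else unsure how to check this: a minimal sketch using only the Python standard library, assuming Python 3.8+ and that it is run in the same environment the training code uses. The helper name `installed_version` is made up for illustration and is not part of ClearML.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of a pip package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# "clearml" is the SDK package and "clearml-agent" is the agent package.
for name in ("clearml", "clearml-agent"):
    print(name, installed_version(name) or "not installed")
```

Running `pip show clearml` in the same environment should report the same version.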

  
  
Posted 2 years ago

WonderfulArcticwolf3 and CloudySwallow27 are you running it as a service or via the apps? What's the clearml version (not the agent)?

  
  
Posted 2 years ago

this worked, thank you!

  
  
Posted 2 years ago

We were able to find an error from the autoscaler agent:

Stuck spun instance dynamic_worker:clearml-agent-autoscale:p2.xlarge:i-015001a93e0910a09 of type clearml-agent-autoscale

2022-04-19 19:16:58,339 - clearml.auto_scaler - INFO - Spinning down stuck worker: 'dynamic_worker:clearml-agent-autoscale:p2.xlarge:i-015001a93e0910a09
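
In case it helps with looking up the instance on the cloud side before it is gone, here is a rough sketch that pulls the instance ID out of a log line like the one above. The field layout (prefix, resource name, instance type, instance ID, separated by colons) is an assumption inferred only from this one log line, and `parse_worker_id` is a hypothetical helper, not a ClearML function.

```python
import re

# Assumed layout, inferred only from the log line in this thread:
# <prefix>:<resource name>:<instance type>:<instance id>
WORKER_ID = re.compile(
    r"(?P<prefix>[\w-]+):"
    r"(?P<resource>[\w.-]+):"
    r"(?P<instance_type>[\w.-]+):"
    r"(?P<instance_id>i-[0-9a-f]+)"
)


def parse_worker_id(line):
    """Return the worker ID fields found in a log line, or None if there are none."""
    match = WORKER_ID.search(line)
    return match.groupdict() if match else None


line = (
    "2022-04-19 19:16:58,339 - clearml.auto_scaler - INFO - Spinning down stuck "
    "worker: 'dynamic_worker:clearml-agent-autoscale:p2.xlarge:i-015001a93e0910a09"
)
info = parse_worker_id(line)
print(info["instance_type"], info["instance_id"])
```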

  
  
Posted 2 years ago

Hi CloudySwallow27

This error occurs randomly during training (in other words training does successfully start).

What clearml-agent version are you using, and what clearml version?

  
  
Posted 2 years ago