GentleParrot65
Moderator
1 Question, 5 Answers
  Active since 22 May 2023
  Last activity 10 months ago

Reputation

0

Badges 1

2 × Eureka!
0 Votes
3 Answers
514 Views
10 months ago
0 Hello everyone, I’m currently facing an issue while using cloud ClearML with aws_autoscaler.py. Occasionally, some workers become unusable when an EC2 instance is terminated, either manually or by aws_autoscaler.py. These workers are displayed in the UI w

Yes. I’ve done some debugging and discovered that a process started from the user-data script doesn’t receive SIGTERM on instance termination, so the worker is unable to shut down gracefully and unregister.

10 months ago
0 Hello everyone, I’m currently facing an issue while using cloud ClearML with aws_autoscaler.py. Occasionally, some workers become unusable when an EC2 instance is terminated, either manually or by aws_autoscaler.py. These workers are displayed in the UI w

More investigation showed that the problem is with cloud-init. When I connect via SSH and start the process with “nohup python … &”, everything works: the process receives SIGTERM on instance termination. A process started by cloud-init (the user-data script) receives no signals on instance termination, although it does receive signals sent with kill <pid>. I’ve tried the following:

  • start with nohup python3 -m clearml-agent … &
  • start the agent with the --detached flag

Neither works, so it looks like a bug.
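
One way to check whether a process sees the termination signal is to run a small stand-in script in place of the agent, once from an SSH session and once from the user-data script, and compare the logs. The following is a minimal sketch of such a probe; the function names are made up for illustration and it is not part of clearml-agent.

```python
import os
import signal
import threading

# Set by the handler once a termination signal has been delivered.
stop_requested = threading.Event()


def handle_signal(signum, frame):
    # Log the signal name so the output shows what actually arrived.
    print(f"received {signal.Signals(signum).name}", flush=True)
    stop_requested.set()


def install_handlers():
    # SIGTERM is what a graceful shutdown normally sends; SIGINT is
    # included so the probe can also be stopped with Ctrl+C.
    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, handle_signal)


def wait_for_shutdown():
    install_handlers()
    print(f"pid {os.getpid()} waiting for signals", flush=True)
    # Poll with a timeout so the main thread keeps running handlers.
    while not stop_requested.wait(timeout=1.0):
        pass
    print("shutting down gracefully", flush=True)
```

Calling wait_for_shutdown() at the end of the script and redirecting its output to a file makes it possible to see after the fact whether "received SIGTERM" was ever logged for each way of starting the process.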
10 months ago
0 Hi, where can I find the server parameter to control when the server is unregistering an agent after not receiving updates? Currently it's quite long (30 mins) and this prevents the autoscaler from launching a new agent

Thank you for your answer.
aws_autoscaler.py works as follows (based on my experiments):

  • let’s assume that the instance and the worker are started
  • there are no tasks running on the worker for max_idle_time_min
  • the autoscaler terminates the instance
  • the worker stops sending updates to app.clear.ml
  • the worker is still shown in the UI with the message “Update Time a few minutes ago”
  • the autoscaler thinks that this worker is still idle because it is still returned by workers.get_all
  • when I enqueue task in t...
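
The stale-worker situation described in the steps above could, in principle, be handled on the client side by ignoring workers that have not reported for a while. The sketch below only illustrates that idea on plain dictionaries: the `id` and `last_report` keys and the `max_silence` threshold are assumptions made for the example, not the actual fields returned by workers.get_all or the behaviour of aws_autoscaler.py.

```python
from datetime import datetime, timedelta, timezone


def split_stale_workers(workers, now, max_silence):
    """Partition worker ids into (live, stale) by time since the last report.

    `workers` is a list of dicts with hypothetical keys "id" and
    "last_report" (a timezone-aware datetime).
    """
    live, stale = [], []
    for worker in workers:
        silent_for = now - worker["last_report"]
        if silent_for > max_silence:
            # No update for longer than the threshold: treat as gone.
            stale.append(worker["id"])
        else:
            live.append(worker["id"])
    return live, stale


now = datetime(2023, 6, 1, 12, 0, tzinfo=timezone.utc)
example_workers = [
    {"id": "worker-a", "last_report": now - timedelta(minutes=1)},
    {"id": "worker-b", "last_report": now - timedelta(minutes=20)},
]
live_ids, stale_ids = split_stale_workers(
    example_workers, now, timedelta(minutes=5)
)
```

With a five-minute threshold, worker-b would be counted as stale and excluded from the idle pool, while worker-a would still be treated as a live worker.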
11 months ago