Unanswered
Hello everyone, I'm currently facing an issue while using cloud ClearML with aws_autoscaler.py. Occasionally, some workers become unusable when an EC2 instance is terminated, either manually or by aws_autoscaler.py. These workers are displayed in the UI w
Further investigation showed that the problem is related to cloud-init. When I connect via SSH and start the process with "nohup python … &", everything works: the process receives SIGTERM on instance termination. A process started by cloud-init (the user data script) receives no signal on instance termination, although it does receive signals sent with kill <pid>. I've tried the following:
- start with nohup python3 -m clearml-agent … &
- start the agent with the --detached flag

Neither works, so it looks like a bug.
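
To compare the two launch paths without involving the agent itself, a small signal probe might help. The following is only a minimal sketch: the file name signal_probe.py, the SIGNAL_PROBE_LOG variable, the default log path and the function names are all hypothetical and are not part of clearml-agent or aws_autoscaler.py. It records every SIGTERM, SIGHUP or SIGINT the process receives, so the same script can be started once over SSH with nohup and once from the user data script.

```python
# signal_probe.py (hypothetical name): logs which signals this process receives.
import os
import signal
import time

# Assumed log location; override with the (hypothetical) SIGNAL_PROBE_LOG variable.
LOG_PATH = os.environ.get("SIGNAL_PROBE_LOG", "/tmp/signal_probe.log")

received = []  # names of the signals seen so far, in arrival order


def record(signum, frame):
    """Signal handler: remember the signal name and append it to the log file."""
    name = signal.Signals(signum).name
    received.append(name)
    with open(LOG_PATH, "a") as fh:
        fh.write(f"{int(time.time())} pid={os.getpid()} received {name}\n")


def install(signals=(signal.SIGTERM, signal.SIGHUP, signal.SIGINT)):
    """Register the recording handler for each signal of interest."""
    for sig in signals:
        signal.signal(sig, record)


def wait_forever():
    """Block until signals arrive; each one is handled by record()."""
    while True:
        signal.pause()  # Unix only: sleeps until a signal is delivered


# To use it as a probe, add these two calls at the bottom of the file:
#   install()
#   wait_forever()
```

If the log stays empty after termination only for the copy started from user data, that would support the observation that the signal never reaches processes launched that way, independently of the agent.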
one year ago