the issue was related to task.connect being called multiple times I guess.
This is odd?! how would that effect the crash?
Do notice that when you connect objects, each time you call connect you are basically deserializing the configuration from the backend into the code, maybe this somehow effected the object?
Any who I solved by commenting my code and created bunch of clone task and uncomment the code line by line, the issue was related to task.connect being called multiple times I guess.
Hi @<1570583227918192640:profile|FloppySwallow46> , can you please elaborate on what you were doing, what happened?
I cloned one of my task which was successfully completed, and I am trying to run it through the autoscaler by enqueing it. No idea why its been crashing again and again,
bash: line 1: 1031 Aborted (core dumped) NVIDIA_VISIBLE_DEVICES=all $LOCAL_PYTHON -u -m clearml_agent execute --disable-monitoring --id b7af4abc3c174a32a5612c5c27a7b523
2023-05-24 17:10:00
Process failed, exit code 134
it's odd to me as well, what's strange is that, the error was messaged was related to Nvidia and that wasn't the case as well. Anywho, its solved I am happy!!
I can see my process being crashed, I need to know why..if this is the clone of a completed task, then why it produces such error?
bash: line 1: 1031 Aborted (core dumped)
@<1570583227918192640:profile|FloppySwallow46> seems like the processes crashed,