Hi @<1523701842515595264:profile|PleasantOwl46> , what do you mean by more details about the state? Usually the INFO section of the task shows the full history of actions
to my understanding, failed means that the python job exited non-gracefully, with errors originating from python
what I'm missing is how to interpret aborted vs. stopped
did the user initiate the stop?
or did it come from the system running the job?
I did notice the STATUS MESSAGE: and STATUS REASON: fields. They are N/A in many cases, and some get a Signal None value or Forced stop (non-responsive), but I'm not sure how to interpret these fields or what I can learn from them
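For comparing these fields across several tasks, here is a minimal sketch of reading them from code instead of the UI. It assumes the `clearml` package is installed and configured, and that the task record returned by `Task.get_task` exposes `status_message` and `status_reason` through `task.data`. The helper names `summarize_status` and `describe_task` are hypothetical, and the sketch does not explain what the values mean, it only prints them side by side.

```python
from typing import Optional


def summarize_status(status: str, message: Optional[str], reason: Optional[str]) -> str:
    """Format the three status fields on one line, showing N/A for empty values."""
    def shown(value: Optional[str]) -> str:
        return value if value else "N/A"

    return f"status={status} | message={shown(message)} | reason={shown(reason)}"


def describe_task(task_id: str) -> str:
    """Fetch a task by ID and summarize its status fields.

    Needs a configured clearml installation and a reachable server.
    """
    # Imported lazily so summarize_status stays usable without clearml installed.
    from clearml import Task

    task = Task.get_task(task_id=task_id)
    # Assumption: task.data is the backend task record and carries
    # status_message and status_reason alongside the status itself.
    data = task.data
    return summarize_status(task.get_status(), data.status_message, data.status_reason)
```

Calling `describe_task("<task id>")` on one stopped and one aborted task would show whether the message and reason differ between the two.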
hey there @<1523701070390366208:profile|CostlyOstrich36>
any chance I can get more input on this? anywhere to look in the docs?
I hope you understood what I am looking for
@<1876800977114238976:profile|ShakyCrocodile77> perhaps you can elaborate
e.g.
what is the difference between jobs that are stopped vs. aborted?
is it a SIGKILL sent by the user?
how does clearml decide whether a job is stopped or aborted?