The first step would be to monitor the apiserver service log and look at the requests the server processes. We should be able to identify the call that creates the queue, and the surrounding calls might offer insight into who requested it 🙂
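To make that concrete, here is a minimal sketch of how one might pull the relevant lines out of a saved copy of the apiserver log. It assumes the log has been dumped to a text file and that request lines contain the endpoint name as plain text (for example `queues.create`); the exact log format, the search string, and the helper names `find_with_context` and `print_matches` are assumptions for illustration, not part of ClearML.

```python
from typing import Iterable, Iterator, List


def find_with_context(
    lines: Iterable[str], needle: str, context: int = 3
) -> Iterator[List[str]]:
    """Yield each line containing `needle` plus up to `context` lines on either side."""
    buffered = list(lines)
    for i, line in enumerate(buffered):
        if needle in line:
            # Slice bounds are clamped so matches near the start or end still work.
            yield buffered[max(0, i - context): i + context + 1]


def print_matches(path: str, needle: str = "queues.create", context: int = 3) -> None:
    """Print every match in the log file at `path` with its surrounding lines."""
    # `path` is a hypothetical location of a saved apiserver log dump.
    with open(path, encoding="utf-8", errors="replace") as fh:
        for block in find_with_context(fh, needle, context):
            print("".join(block), end="")
            print("-" * 40)
```

Calling `print_matches("apiserver.log")` would then show each matching request together with the calls logged just before and after it. If nothing matches, it may be worth trying a different search string such as `tasks.enqueue`, since the queue could be created as a side effect of another call.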
Hi folks, occasionally when I clone a job and enqueue it, instead of being processed by the expected queue, a new queue (with an ID that looks like a hash) is created, and the experiment hangs in a "Pending" state.
When this happens, if I Abort the task, reset it, and re-enqueue it, things often work. I haven't been able to work out when this happens, but I was wondering whether any of you have had the same experience?
I am using a self-hosted version of ClearML and the agents are spawned with the K8s Agent Glue helm chart.