Unanswered
Hi folks, occasionally when I clone a job and enqueue it, it is not processed by the expected queue. Instead, a new queue (with an ID that looks like a hash) is created, and the experiment hangs in a "Pending" state.
When This Happens, If I Abor
I have tried this several times and the behaviour is always the same. It looks like, when I modify a hyperparameter and then enqueue the experiment to a queue, things only work if I have first set the value of k8s-queue to the name of the queue that I want to use. If I don't modify the configuration (e.g. I abort or reset the job and enqueue it again, or clone and enqueue it without modifying the hyperparameters), then everything works as expected.
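For anyone who wants to script the workaround described above, here is a minimal sketch. It rests on several assumptions that the post does not confirm: that the platform is ClearML and its Python SDK is in use, that k8s-queue is exposed as an ordinary task parameter (if it is stored elsewhere, for example as a user property, the update call would differ), and that the helper names `with_queue_param` and `clone_and_enqueue` are purely hypothetical. It only pins k8s-queue to the target queue before enqueueing; it does not explain or fix the underlying behaviour.

```python
from typing import Any, Dict


def with_queue_param(
    params: Dict[str, Any], queue_name: str, key: str = "k8s-queue"
) -> Dict[str, Any]:
    """Return a copy of `params` with `key` set to the target queue name.

    The key name "k8s-queue" comes from the post; where it actually lives
    in the task configuration is an assumption.
    """
    updated = dict(params)
    updated[key] = queue_name
    return updated


def clone_and_enqueue(
    source_task_id: str, queue_name: str, overrides: Dict[str, Any]
) -> str:
    """Clone a task, apply overrides plus the queue parameter, then enqueue it."""
    # Imported lazily so the pure helper above works without ClearML installed.
    from clearml import Task

    source = Task.get_task(task_id=source_task_id)
    cloned = Task.clone(source_task=source, name=f"{source.name} (clone)")

    # Assumption: k8s-queue can be set like any other task parameter.
    cloned.update_parameters(with_queue_param(overrides, queue_name))

    Task.enqueue(cloned, queue_name=queue_name)
    return cloned.id
```

Usage would look like `clone_and_enqueue("<task id>", "<queue name>", {"<section>/<param>": value})`, with the task ID, queue name, and parameter names taken from your own setup.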
149 Views
0 Answers
2 years ago
one year ago