Now I go to the experiments and clone an experiment that I previously executed on my laptop. In the newly created experiment, I modify some parameters and enqueue it in the CPU queue.
Answered
Hi folks, occasionally when I clone a job and enqueue it, instead of being processed by the expected queue, a new queue (with an ID that looks like a hash) is created, and the experiment hangs in a "Pending" state.
When this happens, if I Abort the task, reset it and re-enqueue it, often things work. I couldn't properly understand when this happens, but I was wondering if any of you had the same experience?
I am using a self-hosted version of ClearML and the agents are spawned with the K8s Agent Glue helm chart.
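For reference, the clone, modify, and enqueue steps described above can also be done from the ClearML Python SDK, which makes the target queue explicit by name. This is only a minimal sketch and not a confirmed fix for the stray queue: the queue name `"cpu"`, the parameter key `"General/learning_rate"`, and the `task_cls` argument (a hook so the function can be tried without a ClearML server) are assumptions for illustration.

```python
def clone_and_enqueue(source_task_id, queue_name, overrides=None, task_cls=None):
    """Clone an existing task, optionally override parameters, and enqueue the clone.

    task_cls is a hypothetical injection point used for dry runs; when it is
    omitted, the real clearml.Task class is imported and used.
    """
    if task_cls is None:
        from clearml import Task as task_cls  # requires the clearml package

    # Create a draft copy of the source task (same as "Clone" in the UI).
    cloned = task_cls.clone(
        source_task=source_task_id,
        name="clone of " + source_task_id,
    )

    # Apply parameter overrides on the draft, e.g. {"General/learning_rate": 0.01}.
    if overrides:
        cloned.update_parameters(overrides)

    # Enqueue by queue name so the intended queue is stated explicitly.
    task_cls.enqueue(cloned, queue_name=queue_name)
    return cloned
```

Usage would look like `clone_and_enqueue("<source task id>", "cpu", {"General/learning_rate": 0.01})`, with the task ID copied from the UI.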
22K views · 31 answers · 2 years ago · 7 months ago