
Our jobs are now running on the online app.
Thank you
Unfortunately, the issue is only partially resolved: while some jobs are running on one instance, on another instance (default_gpu), our jobs are still pending…
Hello
Sorry for my late reply.
I'm running into an issue with my default_gpu queue: the ClearML auto-scaler detects the job and puts it into the "Pending" state, but it never actually runs. From the auto-scaler logs (see screenshot 1), this seems expected since it only checks the queue every 5 minutes. I've also attached the relevant log file.
However, I don't see anything in the logs that clearly explains the problem. Looking at AWS, I can see that the instance starts, stays in "Initializi...
I do not see any artifacts linked to the jobs in the default_gpu queue. We have not changed the configuration; as a debugging step, we simply restarted the instance.