Do you see any change in the URL if you click on your "test" queue?
Digest: sha256:407714e5459e82157f7c64e95bf2d6ececa751cca983fdc94cb797d9adccbb2f
Status: Downloaded newer image for nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.
Not sure why my Elasticsearch & MongoDB crashed. I had to remove and recreate all the Docker containers. After that, clearml-agent works fine too.
I'm not sure, but I suspect it might be an issue... perhaps AgitatedDove14 knows?
Now my problem is that clearml-agent picks up the job but fails to run the Docker container.
I am running on a Windows 10 machine, is this not compatible?
Hi EnviousStarfish54, did you use --foreground? By default, the agent will output its log to a log file, unless explicitly requested to do otherwise.
Yes, I did use --foreground.
I tested on an older "trains" server, and it shows log lines like this if no job is picked up, while my new "clearml-agent" shows nothing.
No tasks in queue bb1bb1673f224fc98bbc8f86779be802
No tasks in Queues, sleeping for 5.0 seconds
First thing to make sure is that this is indeed your default queue's ID - perhaps the agent configuration is incorrect and the agent is connecting to a different server?
Well, go to the Workers and Queues section in the WebApp, click on Queues, then click on your default queue - the queue ID should appear in the URL
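If it helps, here is a rough Python sketch for comparing the ID the agent prints with the one in the browser URL. It assumes a queue ID is a 32-character lowercase hex string, like the one in the agent log line quoted in this thread. The URL below is a made-up placeholder, not the actual WebApp URL format, so paste in whatever your browser shows.

```python
import re

# Assumption: queue IDs are 32-character lowercase hex strings,
# like the one in the agent log line quoted in this thread.
ID_PATTERN = re.compile(r"\b[0-9a-f]{32}\b")


def extract_ids(text):
    """Return every 32-character hex ID found in the text."""
    return ID_PATTERN.findall(text)


def same_queue(agent_log_line, webapp_url):
    """True if the log line and the URL share at least one ID."""
    return bool(set(extract_ids(agent_log_line)) & set(extract_ids(webapp_url)))


log_line = "No tasks in queue bb1bb1673f224fc98bbc8f86779be802"
# Hypothetical URL for illustration only; copy the real one from your browser.
url = "http://localhost:8080/queues?id=bb1bb1673f224fc98bbc8f86779be802"

print(same_queue(log_line, url))
```

If this prints False for your real URL, the agent is probably polling a different queue, or a different server, than the one you are looking at.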
hmmmm, maybe I missed some UI Element, I can't locate any ID
Hi EnviousStarfish54
Docker on Windows with NVIDIA runtime support is only available with WSL (I think)
https://docs.nvidia.com/cuda/wsl-user-guide/index.html#installing-wip
https://medium.com/@dalgibbard/docker-with-gpu-support-in-wsl2-ebbc94251cf5
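To check whether Docker can start a GPU container at all, independent of clearml-agent, something like the sketch below might help. It assumes Docker 19.03 or newer (for the `--gpus` flag) and reuses the image name from the error output in this thread. The helper names are made up for illustration, and the agent may request the GPU in a different way.

```python
import shutil
import subprocess

# Image name taken from the error output in this thread.
IMAGE = "nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04"


def build_gpu_check_command(image=IMAGE):
    """Build a docker command that runs nvidia-smi inside the image."""
    # Assumption: the GPU is requested with `--gpus all` (Docker 19.03+).
    return ["docker", "run", "--rm", "--gpus", "all", image, "nvidia-smi"]


def gpu_container_works(image=IMAGE):
    """Return True if the GPU container starts and nvidia-smi exits cleanly."""
    if shutil.which("docker") is None:
        print("docker executable not found on PATH")
        return False
    result = subprocess.run(
        build_gpu_check_command(image), capture_output=True, text=True
    )
    if result.returncode != 0:
        # Show the daemon error, e.g. the nvidia-container-cli message.
        print(result.stderr.strip())
    return result.returncode == 0
```

Calling `gpu_container_works()` from the same shell the agent runs in should show whether the nvidia-container-cli error comes from the Docker/driver setup rather than from the agent.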
Sorry, let me get back to you tomorrow. Maybe I did something wrong, now the entire UI has crashed.