I am running on a Windows 10 machine. Is this not compatible?
Hi EnviousStarfish54
Docker on Windows with NVIDIA runtime support only works through WSL (I think):
https://docs.nvidia.com/cuda/wsl-user-guide/index.html#installing-wip
https://medium.com/@dalgibbard/docker-with-gpu-support-in-wsl2-ebbc94251cf5
Digest: sha256:407714e5459e82157f7c64e95bf2d6ececa751cca983fdc94cb797d9adccbb2f
Status: Downloaded newer image for nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.
Now my problem is that clearml-agent picks up the job but fails to run the Docker container.
Hi EnviousStarfish54, did you use --foreground? By default, the agent writes its log to a log file unless explicitly told to do otherwise.
I'm not sure, but I suspect it might be an issue... perhaps AgitatedDove14 knows?
Do you see any change in the URL if you click on your "test" queue?
The first thing to check is that this is indeed your default queue's ID. Perhaps the agent configuration is incorrect and the agent is connecting to a different server?
Yes, I did use --foreground.
I tested on an older "trains" server, and it shows a log like this when no job is picked up, while my new "clearml-agent" shows nothing:
No tasks in queue bb1bb1673f224fc98bbc8f86779be802
No tasks in Queues, sleeping for 5.0 seconds
Not sure why my Elasticsearch and MongoDB containers crashed. I had to remove and recreate all the Docker containers, and then clearml-agent worked fine too.
Sorry, let me get back to you tomorrow. Maybe I did something wrong; now the entire UI has crashed.
Hmm, maybe I missed some UI element; I can't locate any ID.
Go to the Workers and Queues section in the WebApp, click Queues, then click your default queue. The queue ID should appear in the URL.
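If it helps, here is a rough sketch (Python, standard library only) for comparing the queue ID you see in the WebApp URL with the one the agent prints in its idle log ("No tasks in queue <id>"). It assumes queue IDs are 32 lowercase hex characters, like bb1bb1673f224fc98bbc8f86779be802 above, and that the ID appears somewhere in the URL path; the example URL and host are hypothetical, not the actual WebApp layout.

```python
import re
from typing import Optional

# Assumption: queue IDs are 32 lowercase hex characters, as in the agent log above.
QUEUE_ID_RE = re.compile(r"\b[0-9a-f]{32}\b")


def queue_id_from_url(url: str) -> Optional[str]:
    """Return the last 32-hex-character token in a WebApp URL, or None.

    The URL layout is an assumption; this only looks for an ID-shaped token.
    """
    matches = QUEUE_ID_RE.findall(url)
    return matches[-1] if matches else None


def queue_id_from_agent_log(line: str) -> Optional[str]:
    """Extract the queue ID from an idle-agent line like 'No tasks in queue <id>'."""
    m = re.search(r"No tasks in queue ([0-9a-f]{32})", line)
    return m.group(1) if m else None


def agent_polls_queue(url: str, log_line: str) -> bool:
    """True only if both IDs were found and they are the same queue."""
    url_id = queue_id_from_url(url)
    log_id = queue_id_from_agent_log(log_line)
    return url_id is not None and url_id == log_id


if __name__ == "__main__":
    # Hypothetical URL copied from the browser after clicking the queue.
    url = "https://app.example.com/workers-and-queues/queues/bb1bb1673f224fc98bbc8f86779be802"
    log_line = "No tasks in queue bb1bb1673f224fc98bbc8f86779be802"
    print(agent_polls_queue(url, log_line))  # prints True
```

If the two IDs differ, that points back at the agent configuration (for example, the agent talking to a different server or polling a different queue).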