Happened when cloning and running a task on an agent on a different machine. I
Sounds like a torch internal issue, can you send the full log of the remote Task?
Check the log, the container has torch 1.13.0 but the task requires torch==1.13.1
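If it helps, here is a quick way to confirm which torch the container actually has before the agent installs anything. This is only a sketch using the standard library; `installed_version` is a made-up helper name, and the `1.13.1` pin is just the value quoted from the log above.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


required = "1.13.1"  # assumption: the pin quoted from the Task requirements
found = installed_version("torch")

if found is None:
    print("torch is not installed in this environment")
elif found != required:
    print(f"mismatch: container has torch {found}, task requires {required}")
else:
    print(f"torch {found} matches the pin")
```

Running this inside the container (not on the machine that created the Task) shows what the agent sees.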
Now, the torch packages inside those Nvidia prepackaged containers are compiled a bit differently. What I suspect happens is that the torch wheel from pytorch is not compatible with this container. Easiest fix: change the task requirements to 1.13
Hi, I changed it to 1.13.0, but it still threw the same error. In the end I just changed to a bullseye container instead (since the nvidia container is not a must-have), and it works now, but for some reason it doesn't auto-detect all of my packages, so I had to explicitly add them. But yeah, thanks for the help, I should have dug a bit deeper into my issue.
The full log
Thanks @DeliciousKoala34 I think I know what the issue is!
The container has 1.3.0a and you need 1.3.0, which is why it is re-downloading (I'll make sure the agent can sort it out; because this is Nvidia's version, in reality it should be a perfect match)
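To illustrate the mismatch: a strict `==` comparison treats the suffixed build as a different version, while a check that ignores the suffix would accept it. This is a rough stdlib-only sketch; `base_release` and `satisfies_pin` are hypothetical helpers for illustration, not how clearml-agent actually resolves packages, and the version strings are the ones from this thread.

```python
import re


def base_release(version: str) -> str:
    """Return the leading numeric release part, e.g. '1.3.0a' -> '1.3.0'."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"not a version string: {version!r}")
    return match.group(0)


def satisfies_pin(installed: str, pinned: str, ignore_suffix: bool = False) -> bool:
    """Check an '==' style pin; optionally ignore pre-release/local suffixes."""
    if ignore_suffix:
        return base_release(installed) == base_release(pinned)
    return installed.strip() == pinned.strip()


# Strict comparison: the suffixed container build does not equal the pin,
# so an installer would go and fetch the pinned wheel.
print(satisfies_pin("1.3.0a", "1.3.0"))

# Lenient comparison: only the numeric release part is compared.
print(satisfies_pin("1.3.0a", "1.3.0", ignore_suffix=True))
```

Whether ignoring the suffix is safe depends on how the container's build differs from the upstream wheel, so treat the lenient mode as an assumption to verify.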
Hi, I changed it to 1.13.0, but it still threw the same error.
This is odd. Just so we can make the agent better, any chance you can send the Task log?
Jup, here it is.