@<1533620191232004096:profile|NuttyLobster9> it's a bit hard to say and the full log would be very helpful - can you perhaps remove all secrets and send it in a DM so it will not be public in the channel? I assume local paths etc. are less sensitive in a DM
Hi @<1523701205467926528:profile|AgitatedDove14> , sure. I just need to scrub them of any sensitive info, then I'll post to this thread. Thanks for your reply.
Sure thing. In any case, we will fix this bug so that in the next version there is no need for a workaround (but the workaround will still hold, so you won't need to change anything)
@<1533620191232004096:profile|NuttyLobster9> I think we found the issue: when you pass a direct link to the python venv, the agent fails to detect the python version, and since the python version is required for fetching the correct torch, it fails to install it. This is why passing CLEARML_AGENT_PACKAGE_PYTORCH_RESOLVE=none
worked: it skipped resolving the torch / cuda version (which requires parsing the python version)
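To make the dependency on the python version concrete, here is a minimal sketch of how one could check which version a given venv interpreter reports. This is not the agent's actual detection code; `detect_python_version` is a hypothetical helper for illustration, and the path you pass is whatever interpreter you pointed the agent at.

```python
import subprocess
import sys


def detect_python_version(python_bin):
    """Ask the interpreter itself for its major.minor version.

    Hypothetical helper, not part of clearml-agent. Returns a string such
    as "3.10", or None if the interpreter cannot be run.
    """
    try:
        result = subprocess.run(
            [python_bin, "-c", "import sys; print('%d.%d' % sys.version_info[:2])"],
            capture_output=True,
            text=True,
            check=True,
            timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        # Missing binary, non-zero exit code, or timeout
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    # Check the interpreter running this script; swap in the venv's python path
    print(detect_python_version(sys.executable))
```

If a check like this returns None for the venv interpreter, any step that needs the python version to pick a torch build has nothing to work with, which matches the failure described above.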
Okay so I discovered that setting -e CLEARML_AGENT_PACKAGE_PYTORCH_RESOLVE=none
solves the issue.
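For anyone landing here later, a small sketch of how that flag can be assembled programmatically. It assumes the `-e` is passed as a docker argument for the task's container; `docker_env_args` is a hypothetical helper, not a ClearML function, and only the variable name and value come from this thread.

```python
def docker_env_args(env_vars):
    """Render a dict of environment variables as `-e KEY=VALUE` docker arguments.

    Hypothetical helper for illustration only.
    """
    return " ".join("-e {}={}".format(key, value) for key, value in env_vars.items())


# The workaround from this thread: skip torch / cuda resolving in the agent
workaround = {"CLEARML_AGENT_PACKAGE_PYTORCH_RESOLVE": "none"}

if __name__ == "__main__":
    print(docker_env_args(workaround))
```

The resulting string can then be pasted wherever the container's docker arguments are configured.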
That said, if someone could explain to me why this error was occurring and why it only happens in the case of cloning, I'd love to understand. Thanks!
Unfortunately, it's turning out to be quite time-consuming to manually remove all of the private info in here. Is there a particular section of the log that would be useful to see? I can try to focus on just sharing that part.
Hi @<1533620191232004096:profile|NuttyLobster9>
First, nice workaround!
Second, could you send the full log? When the venv is skipped, pytorch resolving should be skipped as well, and no error should be raised...
And lastly, could you also send the log of the task that executed correctly (the one you cloned)? You are correct, it should have been the same
Hi Martin, I see. That makes sense, though I would have expected the behavior to be the same when running remotely the first time as well. In any case, this solved the issue for me. Thanks for looking at it