
Hey John,
thanks for 1!
Regarding 2, your detailed explanation answered my question, thanks!
Not yet, I tried making it work manually. Might give it a try, thanks!
Hi @<1523701070390366208:profile|CostlyOstrich36> ,
I tried setting requests & limits under the k8sGlue configuration in the clearml-agent helm chart values, in order to force the pods to pick up the GPU from the server. I also chose a GPU-capable pod image for the k8s jobs (we're using nvidia/cuda:12.4.1 for testing).
The job is created, but it simply can't detect a GPU. Attaching the value overrides I'm using for the chart:
agentk8sglue:
apiServer...
Does this provide any more context? @<1523701087100473344:profile|SuccessfulKoala55>
But again, the system denies my deletion request since it considers the venvs-builds dir to be in use
@<1523701087100473344:profile|SuccessfulKoala55> Thanks, I'll check around that!
Hi @<1523701087100473344:profile|SuccessfulKoala55> - Each worker uses its own venv-builds folder
Hey @<1523701087100473344:profile|SuccessfulKoala55> , 1.8.1
Hey!
I see in my agent debug logs that it's constantly dropping the connection with the ClearML Server. I also see my tasks being aborted as User aborted (3) - Just at the point where the (post requirements) venv is added into the local venv cache. Could there be any connection? And if not, does anyone have any clue as to where to continue my debugging?
 -
Repository cloning failed: [WinError 183] Cannot create a file when that file already exists: 'C:\\Users\\clearml-admin\\.clearml\\venvs-builds\\3.1\\task_repository\\<repo>.git'
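
For anyone debugging the same error: WinError 183 means the clone target already exists, so one thing to try is clearing the leftover clone directory before the agent runs again. Below is a minimal sketch of that idea, not ClearML's own cleanup logic. The helper name `remove_stale_clone` and the demo directory layout are hypothetical, and on Windows the removal will still fail if another process holds the directory open.

```python
import shutil
import tempfile
from pathlib import Path


def remove_stale_clone(repo_dir: Path) -> bool:
    """Delete a leftover clone directory. Returns True if something was removed."""
    if not repo_dir.exists():
        return False
    # No ignore_errors here: a locked ("in use") directory should raise
    # loudly instead of being skipped silently.
    shutil.rmtree(repo_dir)
    return True


# Demo on a throwaway directory (hypothetical layout, not the real agent cache).
base = Path(tempfile.mkdtemp())
stale = base / "task_repository" / "example.git"
stale.mkdir(parents=True)
removed = remove_stale_clone(stale)
```

If `shutil.rmtree` raises a permission error, that would match the "dir in use" symptom above, and finding the process that holds the folder open would be the next step.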