@<1523701087100473344:profile|SuccessfulKoala55> Thanks, I'll check around that!
Hey @<1523701087100473344:profile|SuccessfulKoala55>, 1.8.1
Not yet, I tried making it work manually. Might give it a try, thanks!
It would definitely seem that way - although when I look at the error logs, the failure is actually in creating the venvs-builds folder (under the task_repository subfolder) -
Repository cloning failed: [WinError 183] Cannot create a file when that file already exists: 'C:\\Users\\clearml-admin\\.clearml\\venvs-builds\\3.1\\task_repository\\<repo>.git'
Hi @<1523701087100473344:profile|SuccessfulKoala55> - each worker uses its own venvs-builds folder
but again - the system denies my deletion request since it deems the venvs-builds dir as in use
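In case it's useful context: to figure out what's actually holding the directory, I've been listing open handles with Sysinternals handle.exe - this is just my own debugging step, assuming the tool is installed, not something ClearML ships:

```
# List open handles matching the venvs-builds path, to find which process is
# keeping the directory "in use" before attempting deletion again.
handle.exe C:\Users\clearml-admin\.clearml\venvs-builds
```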
Hey John,
thanks on 1!
regarding 2 - your detailed answer covered my question, thanks!
Hi @<1523701070390366208:profile|CostlyOstrich36>,
I tried setting requests & limits under the k8sGlue configuration in the clearml-agent helm chart values, in order to force the pods to request the GPU from the server, while of course choosing a pod image for the k8s jobs that ships with GPU support (we're using nvidia/cuda:12.4.1 for testing).
The job is created - but it simply can't detect a GPU. Attaching the value overrides I'm using for the chart -
agentk8sglue:
  apiServer...
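For clarity, this is roughly the shape of what I'm setting - a paraphrased sketch, not my actual values; the basePodTemplate key and the exact image tag are my assumptions, so check the values.yaml of your chart version:

```
agentk8sglue:
  basePodTemplate:
    # Image for the spawned task pods; it needs the CUDA userspace libraries.
    image: nvidia/cuda:12.4.1-runtime-ubuntu22.04
    resources:
      requests:
        # Requesting the extended resource is what makes the scheduler place
        # the pod on a GPU node and expose a device to the container
        # (requires the NVIDIA device plugin running on that node).
        nvidia.com/gpu: 1
      limits:
        nvidia.com/gpu: 1
```

Without the nvidia.com/gpu resource request (and the device plugin on the node), the container starts fine but CUDA sees no device - which matches the symptom I'm describing.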
Huh, that is weird. Is there any way to force deletion of it? It seems it's still being held by some task, and the server has been restarted several times since.
Does this provide any more context? @<1523701087100473344:profile|SuccessfulKoala55>
Hey!
I see in my agent debug logs that it's constantly dropping the connection to the ClearML Server. I also see my tasks being aborted as User aborted (3) - just at the point where the (post-requirements) venv is added to the local venv cache. Could there be a connection between the two? And if not, does anyone have a clue as to where to continue my debugging?
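One way I'm thinking of isolating this: disabling the venv cache on the agent to see whether the abort still happens at the same point. A sketch of the relevant clearml.conf section as I understand it - the venvs_cache keys appear in the default agent config, but verify the names against your own clearml.conf:

```
agent {
    venvs_cache {
        # Leaving "path" commented out (the default) disables venv caching,
        # so the agent skips the cache-population step where the abort occurs.
        # path: ~/.clearml/venvs-cache
        max_entries: 10
        free_space_threshold_gb: 2.0
    }
}
```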