Does this provide any more context? @<1523701087100473344:profile|SuccessfulKoala55>
Huh, it is weird. Is there any way to force deletion of it? It seems it's still being held by some task, and the server has been restarted several times since.
Not yet, I tried making it work manually. Might give it a try, thanks!
Hey John,
thanks for 1!
regarding 2 - your detailing answered my question, thanks!
Hey @<1523701087100473344:profile|SuccessfulKoala55> , 1.8.1
Hi @<1523701070390366208:profile|CostlyOstrich36> ,
I tried setting requests & limits under the k8sGlue configuration in the clearml-agent helm chart values, in order to force the pods to pick up the GPU from the server, while of course choosing a pod image for the k8s jobs that supports GPU use (we're using nvidia/cuda:12.4.1 for testing).
The job is created, but it simply can't detect a GPU. Attaching the value overrides I'm using for the chart:
agentk8sglue:
apiServer...
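To narrow down where the GPU gets lost, a small check can be run inside the job's pod. This is only a minimal sketch, assuming the NVIDIA container runtime is what exposes the device: it reports the `NVIDIA_VISIBLE_DEVICES` environment variable and whatever `nvidia-smi -L` lists. The function name `gpu_visibility_report` is made up for illustration and is not part of ClearML.

```python
import os
import shutil
import subprocess


def gpu_visibility_report():
    """Collect basic hints about whether a GPU is visible in this container."""
    report = {
        # Set by the NVIDIA container runtime when devices are exposed.
        "nvidia_visible_devices": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
        # None means the nvidia-smi binary is not on PATH in this container.
        "nvidia_smi_path": shutil.which("nvidia-smi"),
        "gpus": [],
    }
    if report["nvidia_smi_path"] is None:
        return report
    try:
        result = subprocess.run(
            [report["nvidia_smi_path"], "-L"],  # -L lists the detected GPUs
            capture_output=True,
            text=True,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return report
    if result.returncode == 0:
        report["gpus"] = [
            line.strip()
            for line in result.stdout.splitlines()
            if line.strip().startswith("GPU ")
        ]
    return report


if __name__ == "__main__":
    print(gpu_visibility_report())
```

If `nvidia_smi_path` comes back as `None` or `gpus` is empty, the pod probably never got the device, which would point at the pod spec or the node setup rather than at the task code.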
@<1523701087100473344:profile|SuccessfulKoala55> Thanks, I'll check around that!
Hi @<1523701087100473344:profile|SuccessfulKoala55> - Each worker uses its own venv-builds folder
It would definitely seem that way, although when I look at the error logs, the failure is actually in creating the venv-build folder (under the task_repository subfolder):
Repository cloning failed: [WinError 183] Cannot create a file when that file already exists: 'C:\\Users\\clearml-admin\\.clearml\\venvs-builds\\3.1\\task_repository\\<repo>.git'
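One workaround for a leftover folder like this is to clear it before the next run. Below is a minimal sketch, assuming the stale `task_repository` directory is what triggers the error and that no process still holds a handle on it. The helper name `force_remove_dir` is hypothetical. It sets the write bit first because git marks some files read-only, which makes a plain delete fail on Windows.

```python
import os
import shutil
import stat
import time


def _make_writable(root):
    """Add the owner-write bit to everything under root (git files are often read-only)."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                os.chmod(path, os.stat(path).st_mode | stat.S_IWUSR)
            except OSError:
                pass  # best effort; rmtree below will report real failures


def force_remove_dir(path, attempts=3, delay=1.0):
    """Try to delete a directory tree, retrying a few times. Returns True if it is gone."""
    for _ in range(attempts):
        if not os.path.exists(path):
            return True
        _make_writable(path)
        try:
            shutil.rmtree(path)
        except OSError:
            # Possibly still locked by another process; wait and retry.
            time.sleep(delay)
    return not os.path.exists(path)
```

If this still returns `False`, something is most likely holding the directory open, and the agent (or whichever process owns the handle) would need to be stopped first.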
Hey!
I see in my agent debug logs that it's constantly dropping the connection with the ClearML Server. I also see my tasks being aborted as User aborted (3), just at the point where the (post-requirements) venv is added to the local venv cache. Could there be any connection? And if not, does anyone have any clue as to where to continue my debugging?
![image](https://clearml-web-assets.s3.am...
But again, the system denies my deletion request since it deems the venv-builds dir to be in use.