@<1523701070390366208:profile|CostlyOstrich36> I'm now running the agent with --docker, and I'm using task.create(docker="nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04")
Setting the agent.venvs_cache path back to ~/.clearml/venvs-cache seems to have done the trick!
Resetting and enqueuing a task that previously built successfully also fails 😞
"Original PIP" is empty as for this task we can rely on the docker image to provide the python packages
DEBUG Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [21 lines of output]
    Traceback (most recent call last):
    File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_i...
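Console logs captured from pip often carry terminal color and cursor-control sequences, which makes pasted errors hard to read. A minimal sketch for stripping them before sharing a log, assuming the raw log still contains the ESC byte (`\x1b`) in front of each sequence; `strip_ansi` is a hypothetical helper name, not part of ClearML or pip:

```python
import re

# Matches ANSI CSI sequences such as "\x1b[1;31m" (color) or "\x1b[?25l" (hide cursor).
# Assumption: the ESC byte is still present; plain bracketed text is left untouched.
ANSI_CSI = re.compile(r"\x1b\[[0-9;?]*[A-Za-z]")


def strip_ansi(text: str) -> str:
    """Return text with ANSI CSI escape sequences removed."""
    return ANSI_CSI.sub("", text)


if __name__ == "__main__":
    raw = "\x1b[1;31merror\x1b[0m: \x1b[1msubprocess-exited-with-error\x1b[0m"
    print(strip_ansi(raw))
```

Ordinary bracketed text such as "[21 lines of output]" is not affected, because the pattern only matches when the ESC byte comes first.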
Thank you so much for your help @<1523701205467926528:profile|AgitatedDove14> !
WebApp: 1.16.0-494 • Server: 1.16.0-494 • API: 2.30
Is this what you had in the original manual execution?
Yes, this installed packages list is what succeeded via manual submission to the agent
Thank you for your help @<1523701205467926528:profile|AgitatedDove14>
Hey, yes I can see machine statistics on the experiments themselves
@<1523701070390366208:profile|CostlyOstrich36> thank you for your help in advance
Hi @<1523701205467926528:profile|AgitatedDove14>
ClearML Agent 1.9.0
agent.package_manager.pip_version=""
Try save_safetensors=False in TrainingArguments. Not sure if ClearML supports safetensors
It was pointing to a network drive before to avoid the local directory filling up
Container nvcr.io/nvidia/pytorch:22.12-py3
WARNING:clearml_agent.helper.package.requirements:Local file not found [torch-tensorrt @ file:///opt/pytorch/torch_tensorrt/py/dist/torch_tensorrt-1.3.0a0-cp38-cp38-linux_x86_64.whl], references removed
Full log for the failed clone
Looks okay there
We are using allegroai/clearml:latest
API server
As I get a bunch of these warnings in both of the clones that failed
Thanks @<1523701205467926528:profile|AgitatedDove14> , will take a look
@<1523701205467926528:profile|AgitatedDove14> if we go with the ultralytics case:
INSTALLED PACKAGES for working manual execution
absl-py==2.1.0
albucore==0.0.13
albumentations==1.4.14
anaconda-anon-usage @ file:///croot/anaconda-anon-usage_1710965072196/work
annotated-types==0.7.0
anyio==4.4.0
archspec @ file:///croot/archspec_1709217642129/work
astor==0.8.1
asttokens @ file:///opt/conda/conda-bld/asttokens_1646925590279/work
astunparse==1.6.3
attrs @ file:///croot/attrs_169571782329...
Maybe it's related to this section?
WARNING:clearml_agent.helper.package.requirements:Local file not found [anaconda-anon-usage @ file:///croot/anaconda-anon-usage_1710965072196/work], references removed
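The warnings above come from installed-package entries of the form `name @ file:///...`, which point at build paths that only existed on the machine where the package list was captured. A rough sketch for spotting such entries in a frozen package list before cloning a task; `split_local_refs` is a hypothetical helper written for illustration and is not how the agent itself handles them:

```python
from typing import Iterable, List, Tuple


def split_local_refs(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split requirement lines into (portable lines, names pinned to local files).

    Assumption: local references use the "name @ file://..." form seen in the log.
    """
    kept: List[str] = []
    local: List[str] = []
    for line in lines:
        req = line.strip()
        if not req or req.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, url = req.partition(" @ ")
        if sep and url.strip().startswith("file://"):
            local.append(name.strip())  # path will not exist inside another container
        else:
            kept.append(req)
    return kept, local


if __name__ == "__main__":
    frozen = [
        "absl-py==2.1.0",
        "anaconda-anon-usage @ file:///croot/anaconda-anon-usage_1710965072196/work",
        "archspec @ file:///croot/archspec_1709217642129/work",
        "astor==0.8.1",
    ]
    kept, local = split_local_refs(frozen)
    print("portable:", kept)
    print("local file references:", local)
```

The names in the second list are the ones to check by hand: either the docker image already provides them, or they need a regular version pin.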