
Reputation
Badges 1
89 × Eureka!Although that's not ideal as it turns off CPU parallelisation
Code to enqueue
from clearml import Task
task = Task.create(
script="script.py",
docker="ultralytics/ultralytics:latest",
docker_args=["--network=host", "--ipc=host", "--shm_size=55G"],
)
task.enqueue(task, "default")
WebApp: 1.16.0-494 • Server: 1.16.0-494 • API: 2.30
@<1523701070390366208:profile|CostlyOstrich36> I don't think it's related to disk, I think it's related to shm
How to replicate on ClearML:
task = Task.create(
script="myscript.py",
packages=["opencv-python==4.6.*", "ultralytics"],
docker="nvcr.io/nvidia/pytorch:22.12-py3",
)
Contents of myscript.py:from ultralytics import YOLO
@<1717350332247314432:profile|WittySeal70> what's strange is I can import the package in the docker container when I run it outside of clearML
I think it might be related to the new run overwriting in this location
What does ClearML do differently that leads to a failure here?
Solved that by setting docker_args=["--privileged", "--network=host"]
It did work on clearml on prem with docker_args=["--network=host", "--ipc=host"]
We are using allegroai/clearml:latest
API server
@<1523701070390366208:profile|CostlyOstrich36> thank you for your help in advance
Hey yes it's self deployed
But that doesn't explain why the model JSON files are missing.
@<1523701070390366208:profile|CostlyOstrich36> do you have any ideas? Thank you
Setting agent.venvs_cache
path
back to ~/.clearml/venvs-cache
seems to have done the trick!
Full log for the failed clone
WARNING:clearml_agent.helper.package.requirements:Local file not found [torch-tensorrt @ file:///opt/pytorch/torch_tensorrt/py/dist/torch_tensorrt-1.3.0a0-cp38-cp38-linux_x86_64.whl], references removed
"Original PIP" is empty as for this task we can rely on the docker image to provide the python packages
is this what you had on the Original manual execution ?
Yes this installed packages list is what succeeded via manual submission to agent
It was pointing to a network drive before to avoid the local directory filling up
agent.package_manager.pip_version=""
Resetting and enqueuing task which has built successfully also fails 😞
Hi @<1523701205467926528:profile|AgitatedDove14>
ClearML Agent 1.9.0
@<1523701205467926528:profile|AgitatedDove14> if we go with the ultralytics case:
INSTALLED PACKAGES for working manual execution
absl-py==2.1.0
albucore==0.0.13
albumentations==1.4.14
anaconda-anon-usage @ file:///croot/anaconda-anon-usage_1710965072196/work
annotated-types==0.7.0
anyio==4.4.0
archspec @ file:///croot/archspec_1709217642129/work
astor==0.8.1
asttokens @ file:///opt/conda/conda-bld/asttokens_1646925590279/work
astunparse==1.6.3
attrs @ file:///croot/attrs_169571782329...
In a cloned run with new container ultralytics/ultralytics:latest
I get this error:
clearml_agent: ERROR: Could not install task requirements!
Command '['/root/.clearml/venvs-builds/3.10/bin/python', '-m', 'pip', '--disable-pip-version-check', 'install', '-r', '/tmp/cached-reqs7171xfem.txt', '--extra-index-url', '
', '--extra-index-url', '
returned non-zero exit status 1.