Resetting and enqueuing a task that has already built successfully also fails 😞
@<1523701205467926528:profile|AgitatedDove14> if we go with the ultralytics case:
INSTALLED PACKAGES for working manual execution
absl-py==2.1.0
albucore==0.0.13
albumentations==1.4.14
anaconda-anon-usage @ file:///croot/anaconda-anon-usage_1710965072196/work
annotated-types==0.7.0
anyio==4.4.0
archspec @ file:///croot/archspec_1709217642129/work
astor==0.8.1
asttokens @ file:///opt/conda/conda-bld/asttokens_1646925590279/work
astunparse==1.6.3
attrs @ file:///croot/attrs_1695717823297/work
Automat==24.8.1
beautifulsoup4 @ file:///croot/beautifulsoup4-split_1681493039619/work
boltons @ file:///croot/boltons_1677628692245/work
Brotli @ file:///croot/brotli-split_1714483155106/work
cattrs==23.2.3
certifi @ file:///croot/certifi_1707229174982/work/certifi
cffi @ file:///croot/cffi_1714483155441/work
chardet @ file:///home/builder/ci_310/chardet_1640804867535/work
charset-normalizer @ file:///tmp/build/80754af9/charset-normalizer_1630003229654/work
chex==0.1.86
clearml==1.16.4
click @ file:///croot/click_1698129812380/work
coloredlogs==15.0.1
Comet==3.1.0
conda @ file:///croot/conda_1689269889729/work
conda-build @ file:///croot/conda-build_1710789183177/work
conda-content-trust @ file:///croot/conda-content-trust_1714483159009/work
conda-libmamba-solver @ file:///croot/conda-libmamba-solver_1691418897561/work/src
conda-package-handling @ file:///croot/conda-package-handling_1714483155348/work
conda_index @ file:///croot/conda-index_1706633791028/work
conda_package_streaming @ file:///croot/conda-package-streaming_1690987966409/work
constantly==23.10.4
contourpy==1.3.0
coremltools==7.2
cryptography @ file:///croot/cryptography_1714660666131/work
cycler==0.12.1
Cython==3.0.11
decorator @ file:///opt/conda/conda-bld/decorator_1643638310831/work
distlib==0.3.8
distro @ file:///croot/distro_1714488253808/work
dnspython==2.6.1
etils==1.7.0
eval_type_backport==0.2.0
exceptiongroup @ file:///croot/exceptiongroup_1706031385326/work
executing @ file:///opt/conda/conda-bld/executing_1646925071911/work
expecttest==0.2.1
filelock @ file:///croot/filelock_1700591183607/work
flatbuffers==24.3.25
flax==0.9.0
fonttools==4.53.1
frozendict @ file:///croot/frozendict_1713194832637/work
fsspec==2024.6.0
furl==2.1.3
gast==0.6.0
gmpy2 @ file:///tmp/build/80754af9/gmpy2_1645455533097/work
google-pasta==0.2.0
grpcio==1.66.0
h11==0.14.0
h5py==3.11.0
httpcore==1.0.5
httpx==0.27.2
humanfriendly==10.0
humanize==4.10.0
hyperlink==21.0.0
hypothesis==6.103.0
idna @ file:///croot/idna_1714398848350/work
imageio==2.35.1
importlib_resources==6.4.4
incremental==24.7.2
ipython @ file:///croot/ipython_1704833016303/work
jax==0.4.31
jaxlib==0.4.31
jedi @ file:///tmp/build/80754af9/jedi_1644315229345/work
Jinja2 @ file:///croot/jinja2_1716993405101/work
jsonpatch @ file:///croot/jsonpatch_1714483231291/work
jsonpointer==2.1
jsonschema @ file:///croot/jsonschema_1699041609003/work
jsonschema-specifications @ file:///croot/jsonschema-specifications_1699032386549/work
keras==3.5.0
kiwisolver==1.4.5
lark==1.1.9
lazy_loader==0.4
libarchive-c @ file:///tmp/build/80754af9/python-libarchive-c_1617780486945/work
libclang==18.1.1
libmambapy @ file:///croot/mamba-split_1714483352891/work/libmambapy
lxml==5.3.0
Markdown==3.7
markdown-it-py==3.0.0
MarkupSafe @ file:///croot/markupsafe_1704205993651/work
matplotlib==3.9.2
matplotlib-inline @ file:///opt/conda/conda-bld/matplotlib-inline_1662014470464/work
mdurl==0.1.2
menuinst @ file:///croot/menuinst_1716404372721/work
mkl-fft @ file:///croot/mkl_fft_1695058164594/work
mkl-random @ file:///croot/mkl_random_1695059800811/work
mkl-service==2.4.0
ml-dtypes==0.4.0
more-itertools @ file:///croot/more-itertools_1700662129964/work
mpmath @ file:///croot/mpmath_1690848262763/work
msgpack==1.0.8
namex==0.0.8
ncnn==1.0.20240820
nest-asyncio==1.6.0
networkx @ file:///croot/networkx_1717597493534/work
numpy==1.23.5
nvidia-cuda-runtime-cu12==12.6.37
onnx==1.16.2
onnx-graphsurgeon==0.5.2
onnx2tf==1.22.3
onnxruntime==1.19.0
onnxslim==0.1.32
opencv-python==4.10.0.84
opencv-python-headless==4.10.0.84
openvino==2024.3.0
openvino-telemetry==2024.1.0
opt-einsum==3.3.0
optax==0.2.3
optree==0.11.0
orbax-checkpoint==0.6.1
orderedmultidict==1.0.1
packaging @ file:///croot/packaging_1710807400464/work
paddlepaddle==2.6.1
pandas==2.2.2
parso @ file:///opt/conda/conda-bld/parso_1641458642106/work
pathlib2==2.3.7.post1
pexpect @ file:///tmp/build/80754af9/pexpect_1605563209008/work
pillow @ file:///croot/pillow_1714398848491/work
pkginfo @ file:///croot/pkginfo_1715695984887/work
platformdirs @ file:///croot/platformdirs_1692205439124/work
pluggy @ file:///tmp/build/80754af9/pluggy_1648024709248/work
portalocker==2.10.1
prompt-toolkit @ file:///croot/prompt-toolkit_1704404351921/work
protobuf==3.20.3
psutil @ file:///opt/conda/conda-bld/psutil_1656431268089/work
ptyprocess @ file:///tmp/build/80754af9/ptyprocess_1609355006118/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl
pure-eval @ file:///opt/conda/conda-bld/pure_eval_1646925070566/work
py-cpuinfo==9.0.0
pyaml==24.7.0
pybind11==2.13.5
pycocotools==2.0.8
pycosat @ file:///croot/pycosat_1714510623388/work
pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
pydantic==2.8.2
pydantic_core==2.20.1
Pygments @ file:///croot/pygments_1684279966437/work
PyJWT==2.8.0
pyOpenSSL @ file:///croot/pyopenssl_1708380408460/work
pyparsing==3.1.4
PySocks @ file:///home/builder/ci_310/pysocks_1640793678128/work
python-dateutil==2.8.2
python-etcd==0.4.5
pytz @ file:///croot/pytz_1713974312559/work
PyYAML @ file:///croot/pyyaml_1698096049011/work
referencing @ file:///croot/referencing_1699012038513/work
requests==2.31.0
rich==13.8.0
rpds-py @ file:///croot/rpds-py_1698945930462/work
ruamel.yaml @ file:///croot/ruamel.yaml_1666304550667/work
ruamel.yaml.clib @ file:///croot/ruamel.yaml.clib_1666302247304/work
scikit-image==0.24.0
scipy==1.14.1
seaborn==0.13.2
six @ file:///tmp/build/80754af9/six_1644875935023/work
sng4onnx==1.0.4
sniffio==1.3.1
sortedcontainers==2.4.0
sounddevice==0.5.0
soupsieve @ file:///croot/soupsieve_1696347547217/work
stack-data @ file:///opt/conda/conda-bld/stack_data_1646927590127/work
sympy==1.12.1
tensorboard==2.17.1
tensorboard-data-server==0.7.2
tensorflow==2.17.0
tensorflow-hub==0.16.1
tensorflow-io-gcs-filesystem==0.37.1
tensorflow_decision_forests==1.10.0
tensorflowjs==4.20.0
tensorrt-cu12==10.1.0
tensorrt-cu12-bindings==10.1.0
tensorrt-cu12-libs==10.1.0
tensorstore==0.1.64
termcolor==2.4.0
tf_keras==2.17.0
tflite-support==0.4.4
tifffile==2024.8.28
tomli @ file:///opt/conda/conda-bld/tomli_1657175507142/work
toolz @ file:///croot/toolz_1667464077321/work
torch==2.3.1
torchaudio==2.3.1
torchelastic==0.2.2
torchvision==0.18.1
tqdm @ file:///croot/tqdm_1716395931952/work
traitlets @ file:///croot/traitlets_1671143879854/work
triton==2.3.1
truststore @ file:///croot/truststore_1695244293384/work
Twisted==24.7.0
types-dataclasses==0.6.6
typing_extensions @ file:///croot/typing_extensions_1715268824938/work
tzdata==2024.1
-e git+
ultralytics-thop==2.0.5
urllib3==1.26.19
virtualenv==20.26.3
wcwidth @ file:///Users/ktietz/demo/mc3/conda-bld/wcwidth_1629357192024/work
Werkzeug==3.0.4
wrapt==1.16.0
wurlitzer==3.1.1
x2paddle==1.5.0
ydf==0.7.0
zipp==3.20.1
zope.interface==7.0.3
zstandard @ file:///croot/zstandard_1714677652653/work
Setting agent.venvs_cache.path back to ~/.clearml/venvs-cache seems to have done the trick!
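For reference, a minimal sketch of where that setting lives, assuming the standard layout of the agent section in clearml.conf. Only the path value comes from this thread; any other keys in the section are assumed to stay at their defaults.

```
agent {
    venvs_cache: {
        # local folder for cached virtual environments (value taken from this thread)
        path: ~/.clearml/venvs-cache
    }
}
```

The agent most likely needs a restart to pick up the change.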
"Original PIP" is empty as for this task we can rely on the docker image to provide the python packages
How are you getting:
beautifulsoup4 @ file:///croot/beautifulsoup4-split_1681493039619/work
This comes with the docker image ultralytics/ultralytics:latest
Hi @<1523701205467926528:profile|AgitatedDove14>
ClearML Agent 1.9.0
Thank you for your help @<1523701205467926528:profile|AgitatedDove14>
Great to hear it got solved. BTW network drives are supported, but you have to make sure the mounted file system supports locks (NFS does)
Thanks @<1523701205467926528:profile|AgitatedDove14> , will take a look
Hi @<1734020162731905024:profile|RattyBluewhale45>
What's the clearml agent version? And could you verify with the latest RC?
Lastly, how are you running the agent, docker mode? What's the base container?
Maybe it's related to this section?
WARNING:clearml_agent.helper.package.requirements:Local file not found [anaconda-anon-usage @ file:///croot/anaconda-anon-usage_1710965072196/work], references removed
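To see which entries of a frozen package list would trigger this warning, here is a small sketch that separates `name @ file://...` references (which exist only inside the image that produced them) from ordinary pinned requirements. The function name `split_local_references` is made up for illustration; this is not the agent's own code.

```python
def split_local_references(requirements):
    """Split pip-freeze lines into portable requirements and local-file references.

    Returns (portable, local_only), where local_only holds the package names
    whose requirement points at a file:// path on the original machine.
    """
    portable, local_only = [], []
    for line in requirements:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, target = entry.partition(" @ ")
        if sep and target.startswith("file://"):
            local_only.append(name)
        else:
            portable.append(entry)
    return portable, local_only


frozen = [
    "absl-py==2.1.0",
    "anaconda-anon-usage @ file:///croot/anaconda-anon-usage_1710965072196/work",
    "clearml==1.16.4",
]
portable, local_only = split_local_references(frozen)
print("portable:", portable)
print("local only:", local_only)
```

Packages in the second list are only usable if the container already provides them, which matches the "references removed" behaviour in the warning.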
It was pointing to a network drive before to avoid the local directory filling up
In a cloned run with new container ultralytics/ultralytics:latest
I get this error:
clearml_agent: ERROR: Could not install task requirements!
Command '['/root/.clearml/venvs-builds/3.10/bin/python', '-m', 'pip', '--disable-pip-version-check', 'install', '-r', '/tmp/cached-reqs7171xfem.txt', '--extra-index-url', '
', '--extra-index-url', '
returned non-zero exit status 1.
is this what you had on the Original manual execution ?
Yes, this installed packages list is what succeeded via manual submission to the agent
How are you getting:
beautifulsoup4 @ file:///croot/beautifulsoup4-split_1681493039619/work
Is this what you had on the original manual execution (i.e. not the one executed by the agent)? You can also look under the "org _pip" dropdown in the "installed packages" of the failed Task
As I get a bunch of these warnings in both of the clones that failed
1724924574994 g-s:gpu1 DEBUG WARNING:root:Could not lock cache folder /root/.clearml/venvs-cache: [Errno 9] Bad file descriptor
You have an issue with your OS / mount. Specifically, "/mnt/clearml/" is the base folder for all the cached stuff, and it fails to create the lock files there. Either use a local folder, or try to understand what the issue is with the host machine's /mnt/ mounts (because it looks like a network mount)
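A quick way to check whether a folder accepts file locks, as a rough sketch: it assumes a Unix host and uses Python's `fcntl.lockf`, which is not necessarily the same locking call the agent makes. The helper name `supports_locks` is hypothetical.

```python
import fcntl
import os
import tempfile


def supports_locks(directory: str) -> bool:
    """Return True if an exclusive lock can be taken on a temp file in `directory`."""
    try:
        with tempfile.NamedTemporaryFile(dir=os.path.expanduser(directory)) as handle:
            # Non-blocking exclusive lock; raises OSError if the file system refuses it.
            fcntl.lockf(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
            fcntl.lockf(handle.fileno(), fcntl.LOCK_UN)
        return True
    except OSError:
        # Covers a missing folder, no write permission, or a mount without lock support.
        return False
```

Calling `supports_locks("/mnt/clearml")` inside the container (or on the host) and getting `False` would point at the mount rather than at the agent.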
The original run completes successfully, it's only the runs cloned from the GUI which fail
DEBUG Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [21 lines of output]
    Traceback (most recent call last):
      File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
        main()
      File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
        json_out['return_val'] = hook(**hook_input['kwargs'])
      File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
        return hook(config_settings)
      File "/tmp/pip-build-env-plkplfap/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 327, in get_requires_for_build_wheel
        return self._get_build_requires(config_settings, requirements=[])
      File "/tmp/pip-build-env-plkplfap/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 297, in _get_build_requires
        self.run_setup()
      File "/tmp/pip-build-env-plkplfap/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 497, in run_setup
        super().run_setup(setup_script=setup_script)
      File "/tmp/pip-build-env-plkplfap/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 313, in run_setup
        exec(code, locals())
      File "<string>", line 9, in <module>
      File "/tmp/pip-req-build-_u883rml/build_tools/setup_helpers/__init__.py", line 1, in <module>
        from .extension import * # noqa
      File "/tmp/pip-req-build-_u883rml/build_tools/setup_helpers/extension.py", line 6, in <module>
        from torch.utils.cpp_extension import BuildExtension as TorchBuildExtension, CppExtension
    ModuleNotFoundError: No module named 'torch'
    [end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
clearml_agent: ERROR: Could not install task requirements!
Command '['/root/.clearml/venvs-builds/3.8/bin/python', '-m', 'pip', '--disable-pip-version-check', 'install', '-r', '/tmp/cached-reqs18jcb6cz.txt', '--extra-index-url', '
', '--extra-index-url', '
returned non-zero exit status 1.
Thank you so much for your help @<1523701205467926528:profile|AgitatedDove14> !
WARNING:clearml_agent.helper.package.requirements:Local file not found [torch-tensorrt @ file:///opt/pytorch/torch_tensorrt/py/dist/torch_tensorrt-1.3.0a0-cp38-cp38-linux_x86_64.whl], references removed
Container nvcr.io/nvidia/pytorch:22.12-py3
@<1734020162731905024:profile|RattyBluewhale45> could you attach the full Task log? Also what do you have under "installed packages" in the original manual execution that works for you?
Notice the error:
Cannot install albucore==0.0.13 and numpy==1.23.5 because these package versions have conflicting dependencies
What pip version do you have configured in clearml.conf? Also, can you provide the full Task log (i.e. click Download in the web UI console tab)?