I'm running on a Dell XPS 15 7590 with Ubuntu 22.04.2 (not a Mac).
Processor: x86_64.
I did update, but I'm still getting the same error.
I tried adding
Task.add_requirements("cudatoolkit==12.2")#replacing pip install cudatoolkit==12.2
but then got
...
agent.package_manager.torch_nightly = false
agent.package_manager.poetry_files_from_repo_working_dir = false
agent.venvs_dir = /home/rakefet/.clearml/venvs-builds
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.venvs_cache.path = ~/.clearml/venvs-cache
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /home/rakefet/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /home/rakefet/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = /home/rakefet/.clearml/pip-cache
agent.docker_apt_cache = /home/rakefet/.clearml/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04
agent.enable_task_env = false
agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.parse_embedded_urls = true
agent.abort_callback_max_timeout = 1800
agent.docker_internal_mounts.sdk_cache = /clearml_agent_cache
agent.docker_internal_mounts.apt_cache = /var/cache/apt/archives
agent.docker_internal_mounts.ssh_folder = ~/.ssh
agent.docker_internal_mounts.ssh_ro_folder = /.ssh
agent.docker_internal_mounts.pip_cache = /root/.cache/pip
agent.docker_internal_mounts.poetry_cache = /root/.cache/pypoetry
agent.docker_internal_mounts.vcs_cache = /root/.clearml/vcs-cache
agent.docker_internal_mounts.venv_build = ~/.clearml/venvs-builds
agent.docker_internal_mounts.pip_download = /root/.clearml/pip-download-cache
agent.apply_environment = true
agent.apply_files = true
agent.custom_build_script =
agent.disable_task_docker_override = false
agent.default_python = 3.10
agent.cuda_version = 122
agent.cudnn_version = 0
Executing task id [2fcc77436db34a5192806636ef04918e]:
repository = None
branch = Rakefet
version_num = ed801b047eb6a0a269a1c0a7155db4057859b74c
tag =
docker_cmd =
entry_point = examples/emotion_conversion/speech-resynt/train.py
working_dir = .
Executing Conda: /home/rakefet/miniconda3/bin/conda env remove -p /home/rakefet/.clearml/venvs-builds/3.8 --quiet --json
Remove all packages in environment /home/rakefet/.clearml/venvs-builds/3.8:
Executing Conda: /home/rakefet/miniconda3/bin/conda create --yes --mkdir --prefix /home/rakefet/.clearml/venvs-builds/3.8 python=3.8
2023-08-22 14:52:22
clearml_agent: ERROR: Command '['/home/rakefet/miniconda3/bin/conda', 'create', '--yes', '--mkdir', '--prefix', '/home/rakefet/.clearml/venvs-builds/3.8', 'python=3.8']' returned non-zero exit status 1.
2023-08-22 14:52:23
Process failed, exit code 1
You do not need the cudatoolkit package; it is installed automatically if the agent is using conda as its package manager. See your clearml.conf for the exact configuration you are running:
https://github.com/allegroai/clearml-agent/blob/a56343ffc717c7ca45774b94f38bd83fe3ce1d1e/docs/clearml.conf#L79
Also, I am indeed using conda as package manager.
package_manager: {
# supported options: pip, conda, poetry
type: conda,
As you mentioned, the requirement for 'cudatoolkit=12.2' is internal to clearml-agent, so I have no way to fix it myself.
Can you send the full log as attachment?
You should manually remove cudatoolkit from the installed packages section in the UI, then try to send the Task to the agent and see if it works. The question is how it ended up there in the first place.
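If the installed packages list is long, the same clean-up can be done on a copy of the text before pasting it back into the UI. This is only a sketch: drop_package is a hypothetical helper, not part of clearml or clearml-agent, and it assumes one package per line in the "name == version" style shown in the installed packages section.

```python
import re


def drop_package(installed_packages: str, package: str) -> str:
    """Return the installed-packages text without lines for `package`.

    Hypothetical helper: assumes one requirement per line and compares
    package names case-insensitively.
    """
    kept = []
    for line in installed_packages.splitlines():
        # The package name is everything before the first space,
        # version operator, extras bracket, marker or URL separator.
        name = re.split(r"[\s=<>!~\[;@]", line.strip(), maxsplit=1)[0]
        if name.lower() != package.lower():
            kept.append(line)
    return "\n".join(kept)


# Example input: a few installed-packages lines plus the offending entry.
sample = "torch == 2.0.1\ncudatoolkit=12.2\ntqdm == 4.66.1"
cleaned = drop_package(sample, "cudatoolkit")
```

The cleaned text can then be pasted into the installed packages section of the cloned or reset Task.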
Hi @<1571308003204796416:profile|HollowPeacock58>
I'm assuming this is the ARM support fix (i.e., you are running on a new Mac) that we released in one of the latest clearml-agent versions. Could you update to the latest clearml-agent?
pip3 install clearml-agent==1.6.0rc2
But what about this error?
ERROR: Invalid requirement: 'cudatoolkit=12.2'
Hint: = is not a valid operator. Did you mean == ?
RequirementsManager handler
...
exception: Failed installing GIT/HTTPs package 'cudatoolkit=12.2'
Failed installing GIT/HTTPs package 'cudatoolkit=12.2'
clearml_agent: ERROR: Could not install task requirements!
What is the origin of cudatoolkit=12.2 ? How should I resolve it?
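A note on the error above, with some assumptions stated up front: the single '=' in 'cudatoolkit=12.2' is conda's version-match syntax, whereas pip only accepts operators such as '=='. Since the string is not in requirements.txt, it presumably comes from the agent's conda handling (the log shows agent.cuda_version = 122) and ends up being handed to pip. The sketch below is a simplified, stdlib-only check that tells the two styles apart. classify_spec is a hypothetical helper, not part of clearml-agent, and it ignores extras, markers and URLs.

```python
import re

# Version comparison operators that pip accepts in a requirement line.
PIP_OPERATORS = ("===", "==", "~=", "!=", "<=", ">=", "<", ">")


def classify_spec(spec: str) -> str:
    """Classify a 'name<op>version' string (hypothetical helper).

    Returns 'bare' for a name with no version, 'pip' for a pip-style
    operator, 'conda-only' for a single '=' (valid for conda, rejected
    by pip), and 'invalid' for anything else.
    """
    match = re.match(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*)$", spec)
    if not match:
        return "invalid"
    rest = match.group(2).strip()
    if not rest:
        return "bare"
    if rest.startswith(PIP_OPERATORS):
        return "pip"
    if rest.startswith("="):
        return "conda-only"
    return "invalid"


from_log = classify_spec("cudatoolkit=12.2")
pip_style = classify_spec("cudatoolkit==12.2")
```

Here from_log is 'conda-only' and pip_style is 'pip', which matches the hint pip prints in the log ("= is not a valid operator. Did you mean == ?").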
It may be related to the fact that I re-installed the CUDA drivers and did not re-create the virtual envs. However, on my PC it runs on the GPU with no errors.
Hi! After a deeper check I realized that my local PC also had a problem communicating with the Nvidia driver. I have now re-installed the driver and its dependencies, validated with the nvidia-smi command, and the local run looks OK.
I re-ran with clearml-agent and am now getting this error:
Successfully installed AMFM_decompy-1.0.11 MarkupSafe-2.1.3 Pillow-10.0.0 PyYAML-6.0.1 antlr4-python3-runtime-4.8 appdirs-1.4.4 attrs-23.1.0 audioread-3.0.0 bitarray-2.7.6 cffi-1.15.1 clearml-1.12.2 cmake-3.27.2 colorama-0.4.6 contourpy-1.1.0 cycler-0.11.0 decorator-5.1.1 einops-0.6.1 filelock-3.12.2 fonttools-4.42.1 furl-2.1.3 hydra_core-1.0.7 importlib-metadata-6.8.0 importlib-resources-6.0.1 jinja2-3.1.2 joblib-1.3.1 jsonschema-4.19.0 jsonschema-specifications-2023.7.1 lazy-loader-0.3 librosa-0.10.0.post2 lit-16.0.6 llvmlite-0.40.1 lxml-4.9.3 matplotlib-3.7.2 mpmath-1.3.0 msgpack-1.0.5 networkx-3.1 npy_append_array-0.9.16 numba-0.57.1 nvidia-cublas-cu11-11.10.3.66 nvidia-cuda-cupti-cu11-11.7.101 nvidia-cuda-nvrtc-cu11-11.7.99 nvidia-cuda-runtime-cu11-11.7.99 nvidia-cudnn-cu11-8.5.0.96 nvidia-cufft-cu11-10.9.0.58 nvidia-curand-cu11-10.2.10.91 nvidia-cusolver-cu11-11.4.0.1 nvidia-cusparse-cu11-11.7.4.91 nvidia-nccl-cu11-2.14.3 nvidia-nvtx-cu11-11.7.91 omegaconf-2.0.6 orderedmultidict-1.0.1 packaging-23.1 pathlib2-2.3.7.post1 pkgutil-resolve-name-1.3.10 pooch-1.6.0 portalocker-2.7.0 protobuf-4.24.1 psutil-5.9.5 pycparser-2.21 pyjwt-2.4.0 pyparsing-3.0.9 python-dateutil-2.8.2 referencing-0.30.2 regex-2023.6.3 rpds-py-0.9.2 sacrebleu-2.3.1 scikit_learn-1.3.0 scipy-1.10.1 soundfile-0.12.1 soxr-0.3.6 sympy-1.12 tabulate-0.9.0 tensorboardX-2.6.1 threadpoolctl-3.2.0 torch-2.0.1 torchaudio-2.0.2 torchvision-0.15.2 tqdm-4.65.0 triton-2.0.0 typing-extensions-4.7.1 zipp-3.16.2
Local file not found [Brotli @ file:///home/conda/feedstock_root/build_artifacts/brotli-split_1648883617327/work], references removed
Local file not found [charset-normalizer @ file:///home/conda/feedstock_root/build_artifacts/charset-normalizer_1688813409104/work], references removed
Local file not found [graphviz @ file:///home/conda/feedstock_root/build_artifacts/python-graphviz_1658658635601/work], references removed
Local file not found [idna @ file:///home/conda/feedstock_root/build_artifacts/idna_1663625384323/work], references removed
Local file not found [kiwisolver @ file:///home/conda/feedstock_root/build_artifacts/kiwisolver_1648854389294/work], references removed
Local file not found [PySocks @ file:///home/conda/feedstock_root/build_artifacts/pysocks_1661604839144/work], references removed
Local file not found [requests @ file:///home/conda/feedstock_root/build_artifacts/requests_1684774241324/work], references removed
Local file not found [six @ file:///home/conda/feedstock_root/build_artifacts/six_1620240208055/work], references removed
Local file not found [urllib3 @ file:///home/conda/feedstock_root/build_artifacts/urllib3_1689789803562/work], references removed
ERROR: Invalid requirement: 'cudatoolkit=12.2'
Hint: = is not a valid operator. Did you mean == ?
RequirementsManager handler <clearml_agent.helper.package.external_req.ExternalRequirements object at 0x7feeb28578e0> raised exception: Failed installing GIT/HTTPs package 'cudatoolkit=12.2'
Failed installing GIT/HTTPs package 'cudatoolkit=12.2'
clearml_agent: ERROR: Could not install task requirements!
Failed installing GIT/HTTPs package 'cudatoolkit=12.2'
2023-08-22 14:11:32
Process failed, exit code 1
I don't know where cudatoolkit=12.2 is taken from. It's not in requirements.txt.
The original Task was created by simply executing the code, not through the agent.
This has been holding me up for quite a long time. Perhaps we can meet virtually and solve it?
So the original looks good. Could it be that you cloned a Task that was executed by an agent using pip, and then pushed it to an agent running conda?
Locally the virtual env is created with conda, but inside it there are also packages installed with pip. Is that what you mean?
These are the packages installed in the virtual env:
Can you do the following:
Clone the Task whose installed packages you previously sent me, then enqueue the cloned Task to the queue served by the agent running conda.
Then send me the full log of the Task that the agent ran.
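The clone-and-enqueue step can be done in the UI, or from Python with the clearml SDK. A minimal sketch, assuming the SDK is installed and configured for your server: the function name, the cloned Task's name suffix, and the task ID and queue name in the usage comment are placeholders, not values from this thread.

```python
def clone_and_enqueue(source_task_id: str, queue_name: str) -> str:
    """Clone an existing Task and enqueue the clone; return the clone's ID."""
    # Imported here so that merely defining the helper does not require
    # a configured clearml installation.
    from clearml import Task

    source = Task.get_task(task_id=source_task_id)
    cloned = Task.clone(
        source_task=source,
        name=f"{source.name} (conda agent test)",
    )
    # Put the clone on the queue that the conda agent is listening to.
    Task.enqueue(task=cloned, queue_name=queue_name)
    return cloned.id


# Usage (placeholders, replace with your own values):
# new_id = clone_and_enqueue("<source task id>", "<queue served by the conda agent>")
```

The returned ID identifies the cloned Task, whose full console log can then be downloaded from the UI and attached here.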
What exactly do you mean by 'manually remove from installed packages in the UI'? Where on the UI?
Yes, in the UI: clone or reset the Task, then you can edit the installed packages section under the Execution tab.
Looking at the 'installed packages' section after the Task reset, I only see this (no cudatoolkit):
Python 3.8.17 (default, Jul 5 2023, 21:04:15) [GCC 11.2.0]
AMFM_decompy == 1.0.11
Cython == 3.0.2
Pillow == 10.0.0
PyYAML == 6.0.1
bitarray == 2.8.1
clearml == 1.12.2
einops == 0.6.1
hydra_core == 1.0.7
joblib == 1.3.2
librosa == 0.10.1
matplotlib == 3.7.2
numpy == 1.24.4
omegaconf == 2.0.6
packaging == 23.1
psutil == 5.9.5
regex == 2023.8.8
requests == 2.31.0
sacrebleu == 2.3.1
scikit_learn == 1.3.0
scipy == 1.10.1
six == 1.16.0
soundfile == 0.12.1
tabulate == 0.9.0
torch == 2.0.1
torchaudio == 2.0.2
tqdm == 4.66.1
I see,
@<1571308003204796416:profile|HollowPeacock58> can you please send the full log?
(The odd thing is that it is trying to install the Python 3.10 version of torch, when your command line suggests it is running Python 3.8.)
Hi. To be on the safe side, I recreated the virtual env, ran locally, and then ran through the locally installed agent.
I get the same error; see the log file.
In the agent config I use conda, as I previously shared.