Answered
Running a local agent: "not a supported wheel on this platform" error while the task is being set up. Any suggestions?

Hello,
I'm running a local agent. While it's running the task I get this error. Any suggestions?
Successfully installed numpy-1.24.4
Found PyTorch version torch==2.0.1 matching CUDA version 0
Found PyTorch version torchaudio==2.0.2 matching CUDA version 0
ERROR: torch-2.0.1+cpu-cp310-cp310-linux_x86_64.whl is not a supported wheel on this platform.
Command 'source /home/rakefet/miniconda3/etc/profile.d/conda.sh && conda activate /home/rakefet/.clearml/venvs-builds/3.8 && pip install -r /tmp/cached-reqs1ou8m6ac.txt' returned non-zero exit status 1.
clearml_agent: ERROR: Could not install task requirements!
Command 'source /home/rakefet/miniconda3/etc/profile.d/conda.sh && conda activate /home/rakefet/.clearml/venvs-builds/3.8 && pip install -r /tmp/cached-reqs1ou8m6ac.txt' returned non-zero exit status 1.

  
  
Posted 8 months ago

Answers 24


I tried adding
Task.add_requirements("cudatoolkit==12.2")  # replacing pip install cudatoolkit==12.2

but then got
...
agent.package_manager.torch_nightly = false
agent.package_manager.poetry_files_from_repo_working_dir = false
agent.venvs_dir = /home/rakefet/.clearml/venvs-builds
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.venvs_cache.path = ~/.clearml/venvs-cache
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /home/rakefet/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /home/rakefet/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = /home/rakefet/.clearml/pip-cache
agent.docker_apt_cache = /home/rakefet/.clearml/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04
agent.enable_task_env = false
agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.parse_embedded_urls = true
agent.abort_callback_max_timeout = 1800
agent.docker_internal_mounts.sdk_cache = /clearml_agent_cache
agent.docker_internal_mounts.apt_cache = /var/cache/apt/archives
agent.docker_internal_mounts.ssh_folder = ~/.ssh
agent.docker_internal_mounts.ssh_ro_folder = /.ssh
agent.docker_internal_mounts.pip_cache = /root/.cache/pip
agent.docker_internal_mounts.poetry_cache = /root/.cache/pypoetry
agent.docker_internal_mounts.vcs_cache = /root/.clearml/vcs-cache
agent.docker_internal_mounts.venv_build = ~/.clearml/venvs-builds
agent.docker_internal_mounts.pip_download = /root/.clearml/pip-download-cache
agent.apply_environment = true
agent.apply_files = true
agent.custom_build_script =
agent.disable_task_docker_override = false

agent.default_python = 3.10
agent.cuda_version = 122
agent.cudnn_version = 0
Executing task id [2fcc77436db34a5192806636ef04918e]:
repository = None
branch = Rakefet
version_num = ed801b047eb6a0a269a1c0a7155db4057859b74c
tag =
docker_cmd =
entry_point = examples/emotion_conversion/speech-resynt/train.py
working_dir = .
Executing Conda: /home/rakefet/miniconda3/bin/conda env remove -p /home/rakefet/.clearml/venvs-builds/3.8 --quiet --json
Remove all packages in environment /home/rakefet/.clearml/venvs-builds/3.8:
Executing Conda: /home/rakefet/miniconda3/bin/conda create --yes --mkdir --prefix /home/rakefet/.clearml/venvs-builds/3.8 python=3.8
2023-08-22 14:52:22
clearml_agent: ERROR: Command '['/home/rakefet/miniconda3/bin/conda', 'create', '--yes', '--mkdir', '--prefix', '/home/rakefet/.clearml/venvs-builds/3.8', 'python=3.8']' returned non-zero exit status 1.
2023-08-22 14:52:23
Process failed, exit code 1

  
  
Posted 8 months ago

You do not need the cudatoolkit package. It is installed automatically if the agent is using conda as its package manager. Check your clearml.conf for the exact configuration you are running:
https://github.com/allegroai/clearml-agent/blob/a56343ffc717c7ca45774b94f38bd83fe3ce1d1e/docs/clearml.conf#L79

  
  
Posted 8 months ago

I'm running on a Dell XPS 15 7590 with Ubuntu 22.04.2 (not a Mac).
Processor: x86_64.
I did update, but I'm still getting the same error.

  
  
Posted 8 months ago

But what about this error?
ERROR: Invalid requirement: 'cudatoolkit=12.2'
Hint: = is not a valid operator. Did you mean == ?
RequirementsManager handler
...
exception: Failed installing GIT/HTTPs package 'cudatoolkit=12.2'
Failed installing GIT/HTTPs package 'cudatoolkit=12.2'
clearml_agent: ERROR: Could not install task requirements!

Where does cudatoolkit=12.2 come from? How should I resolve it?

  
  
Posted 8 months ago

I don't know where cudatoolkit=12.2 is taken from. It's not in requirements.txt.

  
  
Posted 8 months ago

As you mentioned, the requirement for 'cudatoolkit=12.2' is internal to clearml-agent, so I have no way to fix it myself.

  
  
Posted 8 months ago

It may be related to the fact that I re-installed the CUDA drivers and did not re-create the virtual envs. However, on my PC it runs on the GPU with no errors.

  
  
Posted 8 months ago

You should manually remove cudatoolkit from the installed packages section in the UI, then send the Task to the agent and see if it works. The question is how it ended up there in the first place.
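The failing entry, `cudatoolkit=12.2`, uses a single `=`, which is conda's version syntax; pip expects `==`, hence the hint in the error. As a rough aid for spotting such entries before editing them out, here is a minimal sketch. It assumes the installed-packages text has been copied from the UI into a string; `find_conda_style_specs` and `sample` are hypothetical names for illustration, not part of clearml or clearml-agent.

```python
import re

# Matches "name=version" where the "=" is a single equals sign.
# "==" is excluded by the lookahead; ">=", "<=", "!=" and "~=" are excluded
# because those operator characters are not allowed in the name part.
CONDA_STYLE = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*=(?!=)\s*(\S+)\s*$")


def find_conda_style_specs(requirements_text):
    """Return (line_number, package, version) for each conda-style line."""
    hits = []
    for number, line in enumerate(requirements_text.splitlines(), start=1):
        spec = line.split("#", 1)[0]  # ignore trailing comments
        match = CONDA_STYLE.match(spec)
        if match:
            hits.append((number, match.group(1), match.group(2)))
    return hits


# Hypothetical input mixing pip-style pins with one conda-style entry.
sample = """torch == 2.0.1
torchaudio == 2.0.2
cudatoolkit=12.2
numpy>=1.24"""

print(find_conda_style_specs(sample))
```

Any line it reports is one that pip would reject as an invalid requirement, so it is a candidate for removal (or for conversion to `==`) in the installed packages section.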

  
  
Posted 8 months ago

What exactly do you mean by 'manually remove from installed packages in the UI'? Where on the UI?

  
  
Posted 8 months ago

Yes, in the UI: clone or reset the Task, then you can edit the installed packages section under the Execution tab.

  
  
Posted 8 months ago

Looking at the 'installed packages' section after resetting the Task, I only see this (no cudatoolkit):

Python 3.8.17 (default, Jul 5 2023, 21:04:15) [GCC 11.2.0]

AMFM_decompy == 1.0.11
Cython == 3.0.2
Pillow == 10.0.0
PyYAML == 6.0.1
bitarray == 2.8.1
clearml == 1.12.2
einops == 0.6.1
hydra_core == 1.0.7
joblib == 1.3.2
librosa == 0.10.1
matplotlib == 3.7.2
numpy == 1.24.4
omegaconf == 2.0.6
packaging == 23.1
psutil == 5.9.5
regex == 2023.8.8
requests == 2.31.0
sacrebleu == 2.3.1
scikit_learn == 1.3.0
scipy == 1.10.1
six == 1.16.0
soundfile == 0.12.1
tabulate == 0.9.0
torch == 2.0.1
torchaudio == 2.0.2
tqdm == 4.66.1

  
  
Posted 8 months ago

In the virtual env, these are the installed packages:

  
  
Posted 8 months ago

So the original looks good. Could it be that you cloned a Task that was executed by an agent using pip, and then pushed it to an agent running conda?

  
  
Posted 8 months ago

Hi! After a deeper check I realized that my local PC also had a problem communicating with the Nvidia driver. I have now re-installed the driver and dependencies, validated with the nvidia-smi command, and the local run looks OK.
I re-ran with clearml-agent and am now getting this error:
Successfully installed AMFM_decompy-1.0.11 MarkupSafe-2.1.3 Pillow-10.0.0 PyYAML-6.0.1 antlr4-python3-runtime-4.8 appdirs-1.4.4 attrs-23.1.0 audioread-3.0.0 bitarray-2.7.6 cffi-1.15.1 clearml-1.12.2 cmake-3.27.2 colorama-0.4.6 contourpy-1.1.0 cycler-0.11.0 decorator-5.1.1 einops-0.6.1 filelock-3.12.2 fonttools-4.42.1 furl-2.1.3 hydra_core-1.0.7 importlib-metadata-6.8.0 importlib-resources-6.0.1 jinja2-3.1.2 joblib-1.3.1 jsonschema-4.19.0 jsonschema-specifications-2023.7.1 lazy-loader-0.3 librosa-0.10.0.post2 lit-16.0.6 llvmlite-0.40.1 lxml-4.9.3 matplotlib-3.7.2 mpmath-1.3.0 msgpack-1.0.5 networkx-3.1 npy_append_array-0.9.16 numba-0.57.1 nvidia-cublas-cu11-11.10.3.66 nvidia-cuda-cupti-cu11-11.7.101 nvidia-cuda-nvrtc-cu11-11.7.99 nvidia-cuda-runtime-cu11-11.7.99 nvidia-cudnn-cu11-8.5.0.96 nvidia-cufft-cu11-10.9.0.58 nvidia-curand-cu11-10.2.10.91 nvidia-cusolver-cu11-11.4.0.1 nvidia-cusparse-cu11-11.7.4.91 nvidia-nccl-cu11-2.14.3 nvidia-nvtx-cu11-11.7.91 omegaconf-2.0.6 orderedmultidict-1.0.1 packaging-23.1 pathlib2-2.3.7.post1 pkgutil-resolve-name-1.3.10 pooch-1.6.0 portalocker-2.7.0 protobuf-4.24.1 psutil-5.9.5 pycparser-2.21 pyjwt-2.4.0 pyparsing-3.0.9 python-dateutil-2.8.2 referencing-0.30.2 regex-2023.6.3 rpds-py-0.9.2 sacrebleu-2.3.1 scikit_learn-1.3.0 scipy-1.10.1 soundfile-0.12.1 soxr-0.3.6 sympy-1.12 tabulate-0.9.0 tensorboardX-2.6.1 threadpoolctl-3.2.0 torch-2.0.1 torchaudio-2.0.2 torchvision-0.15.2 tqdm-4.65.0 triton-2.0.0 typing-extensions-4.7.1 zipp-3.16.2
Local file not found [Brotli @ file:///home/conda/feedstock_root/build_artifacts/brotli-split_1648883617327/work], references removed
Local file not found [charset-normalizer @ file:///home/conda/feedstock_root/build_artifacts/charset-normalizer_1688813409104/work], references removed
Local file not found [graphviz @ file:///home/conda/feedstock_root/build_artifacts/python-graphviz_1658658635601/work], references removed
Local file not found [idna @ file:///home/conda/feedstock_root/build_artifacts/idna_1663625384323/work], references removed
Local file not found [kiwisolver @ file:///home/conda/feedstock_root/build_artifacts/kiwisolver_1648854389294/work], references removed
Local file not found [PySocks @ file:///home/conda/feedstock_root/build_artifacts/pysocks_1661604839144/work], references removed
Local file not found [requests @ file:///home/conda/feedstock_root/build_artifacts/requests_1684774241324/work], references removed
Local file not found [six @ file:///home/conda/feedstock_root/build_artifacts/six_1620240208055/work], references removed
Local file not found [urllib3 @ file:///home/conda/feedstock_root/build_artifacts/urllib3_1689789803562/work], references removed
ERROR: Invalid requirement: 'cudatoolkit=12.2'
Hint: = is not a valid operator. Did you mean == ?
RequirementsManager handler <clearml_agent.helper.package.external_req.ExternalRequirements object at 0x7feeb28578e0> raised exception: Failed installing GIT/HTTPs package 'cudatoolkit=12.2'
Failed installing GIT/HTTPs package 'cudatoolkit=12.2'
clearml_agent: ERROR: Could not install task requirements!
Failed installing GIT/HTTPs package 'cudatoolkit=12.2'
2023-08-22 14:11:32
Process failed, exit code 1

  
  
Posted 8 months ago

I see,
@<1571308003204796416:profile|HollowPeacock58> can you please send the full log?
(The odd thing is that it is trying to install the Python 3.10 build of torch, while your command line suggests it is running Python 3.8.)
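The mismatch is visible in the wheel filename itself: the `cp310` tags in `torch-2.0.1+cpu-cp310-cp310-linux_x86_64.whl` mark a CPython 3.10 build, while the environment being activated is `venvs-builds/3.8`. A minimal sketch of that reading follows. It is a simplification that only looks at the Python tag; pip's real compatibility check also considers ABI and platform tags. The function names are hypothetical.

```python
import sys


def parse_wheel_tags(filename):
    """Split a wheel filename into (python_tag, abi_tag, platform_tag)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    # name-version[-build]-python-abi-platform: at least five fields.
    if len(parts) < 5:
        raise ValueError("unexpected wheel filename: %r" % filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


def wheel_targets_python(filename, version=None):
    """Rough check: does the wheel's Python tag cover (major, minor)?"""
    major, minor = version or sys.version_info[:2]
    python_tag, _, _ = parse_wheel_tags(filename)
    accepted = {f"cp{major}{minor}", f"py{major}{minor}", f"py{major}"}
    # A tag may be a dotted set such as "py2.py3".
    return any(tag in accepted for tag in python_tag.split("."))


wheel = "torch-2.0.1+cpu-cp310-cp310-linux_x86_64.whl"
print(parse_wheel_tags(wheel))
print(wheel_targets_python(wheel, (3, 8)))   # the agent's 3.8 environment
print(wheel_targets_python(wheel, (3, 10)))  # the interpreter the wheel was built for
```

For the wheel in the log, the check fails for Python 3.8 and passes for 3.10, which is consistent with pip reporting "not a supported wheel on this platform" inside the 3.8 environment.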

  
  
Posted 8 months ago

Also, I am indeed using conda as package manager.
package_manager: {
# supported options: pip, conda, poetry
type: conda,

  
  
Posted 8 months ago

Locally the virtual env is created with conda, but inside it there are also packages installed with pip. Is that what you mean?

  
  
Posted 8 months ago

The original Task was created by simply executing the code, not through an agent.

  
  
Posted 8 months ago

In the agent config I use conda, as I previously shared.

  
  
Posted 8 months ago

This has been blocking me for quite a while. Perhaps we can meet virtually and solve it?

  
  
Posted 8 months ago

Can you do the following:
Clone the Task whose installed packages you previously sent me, then enqueue the cloned Task to the queue served by the conda agent.
Then send me the full log of the Task that the agent ran.
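The same clone-and-enqueue step can also be done from the SDK instead of the UI. This is a minimal sketch, assuming the `clearml` package is installed and configured against your server; the helper name `clone_and_enqueue`, the default clone name and the queue name in the usage comment are assumptions for illustration.

```python
def clone_and_enqueue(source_task_id, queue_name, clone_name=None):
    """Clone an existing Task and push the clone to an agent queue."""
    # Imported here so this file can be loaded without clearml installed.
    from clearml import Task

    source = Task.get_task(task_id=source_task_id)
    cloned = Task.clone(
        source_task=source,
        name=clone_name or f"{source.name} (conda agent test)",
    )
    Task.enqueue(cloned, queue_name=queue_name)
    return cloned.id


# Example call (not executed here; replace the queue name with the one
# your conda agent is listening on):
# clone_and_enqueue("2fcc77436db34a5192806636ef04918e", "default")
```

The returned id identifies the cloned Task, whose console log is the one to attach.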

  
  
Posted 8 months ago

Can you send the full log as attachment?

  
  
Posted 8 months ago

Hi. To be on the safe side, I recreated the virtual env, ran locally, and then ran through the locally installed agent.
I get the same error - see log file.

  
  
Posted 8 months ago

Hi @<1571308003204796416:profile|HollowPeacock58>
I'm assuming this is the ARM support fix (i.e., you are running on a new Mac) that we released in one of the latest clearml-agent versions. Could you update to the latest clearml-agent?

pip3 install clearml-agent==1.6.0rc2
  
  
Posted 8 months ago