Answered


Hello,
I'm running a local agent. While it is running the task I get this error. Any suggestions?
Successfully installed numpy-1.24.4
Found PyTorch version torch==2.0.1 matching CUDA version 0
Found PyTorch version torchaudio==2.0.2 matching CUDA version 0
ERROR: torch-2.0.1+cpu-cp310-cp310-linux_x86_64.whl is not a supported wheel on this platform.
Command 'source /home/rakefet/miniconda3/etc/profile.d/conda.sh && conda activate /home/rakefet/.clearml/venvs-builds/3.8 && pip install -r /tmp/cached-reqs1ou8m6ac.txt' returned non-zero exit status 1.
clearml_agent: ERROR: Could not install task requirements!
Command 'source /home/rakefet/miniconda3/etc/profile.d/conda.sh && conda activate /home/rakefet/.clearml/venvs-builds/3.8 && pip install -r /tmp/cached-reqs1ou8m6ac.txt' returned non-zero exit status 1.

  				
Posted one year ago by HollowPeacock58


Answers 24

In the agent config I use conda, as I previously shared.

  				
Posted one year ago by HollowPeacock58

Also, I am indeed using conda as the package manager:
package_manager: {
# supported options: pip, conda, poetry
type: conda,

  				
Posted one year ago by HollowPeacock58

Hi HollowPeacock58
I'm assuming this is the ARM support fix (i.e., you are running on a new Mac) that we released in one of the last clearml-agent versions. Could you update to the latest clearml-agent?

pip3 install clearml-agent==1.6.0rc2

  				
Posted one year ago by AgitatedDove14

I tried adding
Task.add_requirements("cudatoolkit==12.2")  # replacing pip install cudatoolkit==12.2

but then got
...
agent.package_manager.torch_nightly = false
agent.package_manager.poetry_files_from_repo_working_dir = false
agent.venvs_dir = /home/rakefet/.clearml/venvs-builds
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.venvs_cache.path = ~/.clearml/venvs-cache
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /home/rakefet/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /home/rakefet/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = /home/rakefet/.clearml/pip-cache
agent.docker_apt_cache = /home/rakefet/.clearml/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04
agent.enable_task_env = false
agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.parse_embedded_urls = true
agent.abort_callback_max_timeout = 1800
agent.docker_internal_mounts.sdk_cache = /clearml_agent_cache
agent.docker_internal_mounts.apt_cache = /var/cache/apt/archives
agent.docker_internal_mounts.ssh_folder = ~/.ssh
agent.docker_internal_mounts.ssh_ro_folder = /.ssh
agent.docker_internal_mounts.pip_cache = /root/.cache/pip
agent.docker_internal_mounts.poetry_cache = /root/.cache/pypoetry
agent.docker_internal_mounts.vcs_cache = /root/.clearml/vcs-cache
agent.docker_internal_mounts.venv_build = ~/.clearml/venvs-builds
agent.docker_internal_mounts.pip_download = /root/.clearml/pip-download-cache
agent.apply_environment = true
agent.apply_files = true
agent.custom_build_script =
agent.disable_task_docker_override = false

agent.default_python = 3.10
agent.cuda_version = 122
agent.cudnn_version = 0
Executing task id [2fcc77436db34a5192806636ef04918e]:
repository = None
branch = Rakefet
version_num = ed801b047eb6a0a269a1c0a7155db4057859b74c
tag =
docker_cmd =
entry_point = examples/emotion_conversion/speech-resynt/train.py
working_dir = .
Executing Conda: /home/rakefet/miniconda3/bin/conda env remove -p /home/rakefet/.clearml/venvs-builds/3.8 --quiet --json
Remove all packages in environment /home/rakefet/.clearml/venvs-builds/3.8:
Executing Conda: /home/rakefet/miniconda3/bin/conda create --yes --mkdir --prefix /home/rakefet/.clearml/venvs-builds/3.8 python=3.8
2023-08-22 14:52:22
clearml_agent: ERROR: Command '['/home/rakefet/miniconda3/bin/conda', 'create', '--yes', '--mkdir', '--prefix', '/home/rakefet/.clearml/venvs-builds/3.8', 'python=3.8']' returned non-zero exit status 1.
2023-08-22 14:52:23
Process failed, exit code 1

  				
Posted one year ago by HollowPeacock58

Hi. To be on the safe side, I recreated the virtual env, ran the code locally, and then ran it through the locally installed agent.
I get the same error (see log file).

  				
Posted one year ago by HollowPeacock58

Hi! After a deeper check I realized that my local PC also had a problem communicating with the Nvidia driver. I have now re-installed the driver and its dependencies, validated them with the nvidia-smi command, and the local run looks OK.
I re-ran with clearml-agent and am now getting this error:
Successfully installed AMFM_decompy-1.0.11 MarkupSafe-2.1.3 Pillow-10.0.0 PyYAML-6.0.1 antlr4-python3-runtime-4.8 appdirs-1.4.4 attrs-23.1.0 audioread-3.0.0 bitarray-2.7.6 cffi-1.15.1 clearml-1.12.2 cmake-3.27.2 colorama-0.4.6 contourpy-1.1.0 cycler-0.11.0 decorator-5.1.1 einops-0.6.1 filelock-3.12.2 fonttools-4.42.1 furl-2.1.3 hydra_core-1.0.7 importlib-metadata-6.8.0 importlib-resources-6.0.1 jinja2-3.1.2 joblib-1.3.1 jsonschema-4.19.0 jsonschema-specifications-2023.7.1 lazy-loader-0.3 librosa-0.10.0.post2 lit-16.0.6 llvmlite-0.40.1 lxml-4.9.3 matplotlib-3.7.2 mpmath-1.3.0 msgpack-1.0.5 networkx-3.1 npy_append_array-0.9.16 numba-0.57.1 nvidia-cublas-cu11-11.10.3.66 nvidia-cuda-cupti-cu11-11.7.101 nvidia-cuda-nvrtc-cu11-11.7.99 nvidia-cuda-runtime-cu11-11.7.99 nvidia-cudnn-cu11-8.5.0.96 nvidia-cufft-cu11-10.9.0.58 nvidia-curand-cu11-10.2.10.91 nvidia-cusolver-cu11-11.4.0.1 nvidia-cusparse-cu11-11.7.4.91 nvidia-nccl-cu11-2.14.3 nvidia-nvtx-cu11-11.7.91 omegaconf-2.0.6 orderedmultidict-1.0.1 packaging-23.1 pathlib2-2.3.7.post1 pkgutil-resolve-name-1.3.10 pooch-1.6.0 portalocker-2.7.0 protobuf-4.24.1 psutil-5.9.5 pycparser-2.21 pyjwt-2.4.0 pyparsing-3.0.9 python-dateutil-2.8.2 referencing-0.30.2 regex-2023.6.3 rpds-py-0.9.2 sacrebleu-2.3.1 scikit_learn-1.3.0 scipy-1.10.1 soundfile-0.12.1 soxr-0.3.6 sympy-1.12 tabulate-0.9.0 tensorboardX-2.6.1 threadpoolctl-3.2.0 torch-2.0.1 torchaudio-2.0.2 torchvision-0.15.2 tqdm-4.65.0 triton-2.0.0 typing-extensions-4.7.1 zipp-3.16.2
Local file not found [Brotli @ file:///home/conda/feedstock_root/build_artifacts/brotli-split_1648883617327/work], references removed
Local file not found [charset-normalizer @ file:///home/conda/feedstock_root/build_artifacts/charset-normalizer_1688813409104/work], references removed
Local file not found [graphviz @ file:///home/conda/feedstock_root/build_artifacts/python-graphviz_1658658635601/work], references removed
Local file not found [idna @ file:///home/conda/feedstock_root/build_artifacts/idna_1663625384323/work], references removed
Local file not found [kiwisolver @ file:///home/conda/feedstock_root/build_artifacts/kiwisolver_1648854389294/work], references removed
Local file not found [PySocks @ file:///home/conda/feedstock_root/build_artifacts/pysocks_1661604839144/work], references removed
Local file not found [requests @ file:///home/conda/feedstock_root/build_artifacts/requests_1684774241324/work], references removed
Local file not found [six @ file:///home/conda/feedstock_root/build_artifacts/six_1620240208055/work], references removed
Local file not found [urllib3 @ file:///home/conda/feedstock_root/build_artifacts/urllib3_1689789803562/work], references removed
ERROR: Invalid requirement: 'cudatoolkit=12.2'
Hint: = is not a valid operator. Did you mean == ?
RequirementsManager handler <clearml_agent.helper.package.external_req.ExternalRequirements object at 0x7feeb28578e0> raised exception: Failed installing GIT/HTTPs package 'cudatoolkit=12.2'
Failed installing GIT/HTTPs package 'cudatoolkit=12.2'
clearml_agent: ERROR: Could not install task requirements!
Failed installing GIT/HTTPs package 'cudatoolkit=12.2'
2023-08-22 14:11:32
Process failed, exit code 1

  				
Posted one year ago by HollowPeacock58

It may be related to the fact that I re-installed the CUDA drivers and did not re-create the virtual envs. However, on my PC it runs on the GPU with no errors.

  				
Posted one year ago by HollowPeacock58

I'm running on a Dell XPS 15 7590 with Ubuntu 22.04.2 (not a Mac).
Processor: x86_64.
I did update but am still getting the same error.

  				
Posted one year ago by HollowPeacock58

I see.
HollowPeacock58, can you please send the full log?
(The odd thing is that it is trying to install the Python 3.10 version of torch, when your command line suggests it is running Python 3.8.)
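
To make the mismatch concrete, here is a minimal sketch that reads the Python tag out of a wheel filename and compares it to an interpreter version. It relies only on the standard wheel naming layout (name-version[-build]-python-abi-platform); the helper names `wheel_python_tag` and `matches_interpreter` are made up for this illustration, and pip's real compatibility check is more thorough (it also looks at the ABI and platform tags).

```python
import sys


def wheel_python_tag(filename: str) -> str:
    """Return the python tag (e.g. 'cp310') from a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # Layout: name-version[-build]-python-abi-platform,
    # so the python tag is always the third field from the end.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename!r}")
    return parts[-3]


def matches_interpreter(filename: str, version=None) -> bool:
    """Simplified check: does the wheel target this CPython major.minor?"""
    major, minor = tuple(sys.version_info[:2]) if version is None else version
    expected = f"cp{major}{minor}"
    # A python tag may be a dot-separated set, e.g. 'py2.py3'.
    return expected in wheel_python_tag(filename).split(".")


wheel = "torch-2.0.1+cpu-cp310-cp310-linux_x86_64.whl"
print(wheel_python_tag(wheel))
print(matches_interpreter(wheel, (3, 8)))   # the conda env in the log is 3.8
print(matches_interpreter(wheel, (3, 10)))
```

A cp310 wheel pushed into a Python 3.8 environment is consistent with the "is not a supported wheel on this platform" message in the original post, though the full log is still needed to see why that wheel was selected.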

  				
Posted one year ago by AgitatedDove14

As you mentioned, the requirement for 'cudatoolkit=12.2' is internal to clearml-agent, so I have no way to fix it myself.

  				
Posted one year ago by HollowPeacock58

Yes. In the UI, clone or reset the Task; then you can edit the installed packages section under the Execution tab.

  				
Posted one year ago by AgitatedDove14

Can you send the full log as an attachment?

  				
Posted one year ago by AgitatedDove14

What exactly do you mean by 'manually remove from installed packages in the UI'? Where on the UI?

  				
Posted one year ago by HollowPeacock58

So the original looks good. Could it be that you cloned a Task that was executed by an agent using pip, and then pushed it to an agent running conda?

  				
Posted one year ago by AgitatedDove14

In the virtual env, these are the installed packages:

  				
Posted one year ago by HollowPeacock58

I don't know where cudatoolkit=12.2 is taken from. It's not in requirements.txt.

  				
Posted one year ago by HollowPeacock58

Looking at the 'installed packages' section after the Task reset, I only see this (no cudatoolkit):

Python 3.8.17 (default, Jul 5 2023, 21:04:15) [GCC 11.2.0]

AMFM_decompy == 1.0.11
Cython == 3.0.2
Pillow == 10.0.0
PyYAML == 6.0.1
bitarray == 2.8.1
clearml == 1.12.2
einops == 0.6.1
hydra_core == 1.0.7
joblib == 1.3.2
librosa == 0.10.1
matplotlib == 3.7.2
numpy == 1.24.4
omegaconf == 2.0.6
packaging == 23.1
psutil == 5.9.5
regex == 2023.8.8
requests == 2.31.0
sacrebleu == 2.3.1
scikit_learn == 1.3.0
scipy == 1.10.1
six == 1.16.0
soundfile == 0.12.1
tabulate == 0.9.0
torch == 2.0.1
torchaudio == 2.0.2
tqdm == 4.66.1

  				
Posted one year ago by HollowPeacock58

You do not need the cudatoolkit package; it is installed automatically if the agent is using conda as its package manager. See your clearml.conf for the exact configuration you are running:
https://github.com/allegroai/clearml-agent/blob/a56343ffc717c7ca45774b94f38bd83fe3ce1d1e/docs/clearml.conf#L79

  				
Posted one year ago by AgitatedDove14

This has been holding me back for quite a while. Perhaps we can meet virtually and solve it?

  				
Posted one year ago by HollowPeacock58

But what about this error?
ERROR: Invalid requirement: 'cudatoolkit=12.2'
Hint: = is not a valid operator. Did you mean == ?
RequirementsManager handler
...
exception: Failed installing GIT/HTTPs package 'cudatoolkit=12.2'
Failed installing GIT/HTTPs package 'cudatoolkit=12.2'
clearml_agent: ERROR: Could not install task requirements!

What is the origin of cudatoolkit=12.2? How should I resolve it?

  				
Posted one year ago by HollowPeacock58

The original Task was created by simply executing the code, not through an agent.

  				
Posted one year ago by HollowPeacock58

Can you do the following?
Clone the Task whose installed packages you previously sent me, then enqueue the cloned Task to the queue served by the agent running conda.
Then send me the full log of the Task that the agent ran.

  				
Posted one year ago by AgitatedDove14

You should manually remove cudatoolkit from the installed packages section in the UI, then send the Task to the agent again and see if it works. The question is how it ended up there in the first place.
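
As a rough aid for spotting such entries, the sketch below separates conda-style pins (single `=`, as in `cudatoolkit=12.2`) from pip-style requirement lines (`==`, `>=`, and so on). It is a local illustration only and does not call any ClearML API; the function name `split_requirements` and the sample list are assumptions made for the example.

```python
import re

# Conda pins a version with a single '=' (optionally followed by a build
# string), while pip expects '=='. The lookahead rejects '=='.
_CONDA_PIN = re.compile(r"^\s*[A-Za-z0-9_.\-]+\s*=(?!=)\s*\S+\s*$")


def split_requirements(lines):
    """Return (pip_style, conda_style) lists from requirement lines."""
    pip_style, conda_style = [], []
    for line in lines:
        if _CONDA_PIN.match(line):
            conda_style.append(line)
        else:
            pip_style.append(line)
    return pip_style, conda_style


# Hypothetical sample mixing lines from the thread.
reqs = ["torch == 2.0.1", "torchaudio == 2.0.2", "numpy>=1.24", "cudatoolkit=12.2"]
pip_style, conda_style = split_requirements(reqs)
print("pip can parse:", pip_style)
print("conda-style, pip will reject:", conda_style)
```

Any line that lands in the second list is one pip would reject with the "= is not a valid operator" hint seen in the log.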

  				
Posted one year ago by AgitatedDove14

Locally, the virtual env is created with conda, but inside it there are also packages installed with pip. Is that what you mean?

  				
Posted one year ago by HollowPeacock58
