Looking at it now, I think it's probably because in my code I first read the parameters from the config file:
config = HpsYaml(paras.config)
After that, I put these parameters into a dictionary, which I connect to the task:
hparams_dict = {'batch_size': config.hparas.batch_size,
'valid_step': config.hparas.valid_step,
'max_step': config.hparas.max_step}
parameters = task.connect(hparams_dict, name='hyper_params')
So maybe the parameters from the config file override th...
from einops import rearrange, repeat
ModuleNotFoundError: No module named 'einops'
Training Translator ...
Traceback (most recent call last):
File "main.py", line 119, in <module>
from bin.train_module import Solver
File "/home/rakefet/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/clearml/binding/import_bind.py", line 54, in __patched_import3
mod = builtins.org_import(
File "/home/rakefet/.clearml/venvs-builds/3.8/task_repository/vq-bnf-translator-Rakefet.git/bin/train_module.py", line 4, in <module>
from src.solver import BaseSolver
File...
I don't see why the dictionary is special
OK, so I should generally avoid connecting complex objects? I guess I would create a 'mini dictionary' with a subset of the params and connect this instead.
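A minimal sketch of the 'mini dictionary' idea, assuming the config exposes its values as attributes the way config.hparas.batch_size does. SimpleNamespace stands in for the real HpsYaml object, and pick_params, the optimizer attribute, and the numeric values are hypothetical placeholders, not taken from the actual project:

```python
from types import SimpleNamespace

# Stand-in for the loaded config (hypothetical values; the real object comes from HpsYaml).
config = SimpleNamespace(
    hparas=SimpleNamespace(
        batch_size=16,
        valid_step=1000,
        max_step=100000,
        optimizer=object(),  # example of a complex value we do not want to connect
    )
)


def pick_params(section, keys):
    """Copy the named attributes into a plain dict, keeping only simple values."""
    picked = {}
    for key in keys:
        value = getattr(section, key)
        # Keep only types that serialize cleanly; skip anything complex.
        if value is None or isinstance(value, (bool, int, float, str)):
            picked[key] = value
    return picked


hparams_dict = pick_params(
    config.hparas, ["batch_size", "valid_step", "max_step", "optimizer"]
)
print(hparams_dict)

# The plain dict would then be connected as in the earlier snippet:
# parameters = task.connect(hparams_dict, name='hyper_params')
```

Only the plain dict reaches task.connect, so the config object itself is never handed to ClearML.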
Just to be clear, I have no solution for the task.connect issue. I can't keep these lines commented out. Thanks in advance.
But what about this error?
ERROR: Invalid requirement: 'cudatoolkit=12.2'
Hint: = is not a valid operator. Did you mean == ?
RequirementsManager handler
...
exception: Failed installing GIT/HTTPs package 'cudatoolkit=12.2'
Failed installing GIT/HTTPs package 'cudatoolkit=12.2'
clearml_agent: ERROR: Could not install task requirements!
Where does cudatoolkit=12.2 come from? How should I resolve it?
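For context on the hint in the log: a single `=` is the conda-style version pin, while pip only accepts the PEP 440 operators such as `==`. The sketch below is a rough heuristic for telling the two apart. classify_pin is a hypothetical helper written for illustration, and this is not how clearml-agent parses requirements:

```python
import re

# PEP 440 comparison operators that pip accepts in a requirement line.
PIP_OPERATORS = ("===", "==", "~=", "!=", "<=", ">=", "<", ">")


def classify_pin(spec):
    """Classify a 'name<op>version' string as 'pip', 'conda-style', 'unpinned' or 'unknown'."""
    match = re.match(r"^[A-Za-z0-9._-]+\s*(.*)$", spec.strip())
    rest = match.group(1) if match else ""
    if not rest:
        return "unpinned"
    if rest.startswith(PIP_OPERATORS):
        return "pip"
    if rest.startswith("="):
        # A lone '=' is valid for conda but rejected by pip.
        return "conda-style"
    return "unknown"


for spec in ("cudatoolkit=12.2", "cudatoolkit==12.2", "einops"):
    print(spec, "->", classify_pin(spec))
```

A requirement classified as conda-style here is one that pip would reject with the "= is not a valid operator" message.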
It may be related to the fact that I re-installed the CUDA drivers and did not re-create the virtual envs. However, on my PC it runs on the GPU with no errors.
This has been holding me back for quite a long time. Perhaps we can meet virtually and solve it?
Hi. To be on the safe side, I recreated the virtual env, ran the code locally, and then ran it through a locally installed agent.
I get the same error - see log file.
I don't know where cudatoolkit=12.2 is taken from. It's not in requirements.txt.
I tried adding
Task.add_requirements("cudatoolkit==12.2")#replacing pip install cudatoolkit==12.2
but then got
...
agent.package_manager.torch_nightly = false
agent.package_manager.poetry_files_from_repo_working_dir = false
agent.venvs_dir = /home/rakefet/.clearml/venvs-builds
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.venvs_cache.path = ~/.clearml/venvs-cache
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /home/rakefet/.clearml/vcs-cache
...
Also, I am indeed using conda as the package manager:
package_manager: {
# supported options: pip, conda, poetry
type: conda,
You are right! I added this and the issue was indeed solved. Thanks!
Before doing anything, I got:
Environment setup completed successfully
Starting Task Execution:
Traceback (most recent call last):
File "inference.py", line 10, in <module>
import soundfile as sf
File "/home/ubuntu/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/soundfile.py", line 142, in <module>
raise OSError('sndfile library not found')
OSError: sndfile library not found
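This error usually means the system-level libsndfile shared library is missing, not the Python package. A minimal sketch of a check that can be run inside the agent's environment, using only the standard library (find_sndfile is a hypothetical helper name, and the package name in the message assumes an Ubuntu/Debian machine):

```python
import ctypes.util


def find_sndfile():
    """Return the name of the system libsndfile library, or None if it is not found."""
    return ctypes.util.find_library("sndfile")


lib = find_sndfile()
if lib is None:
    # Assumption: on Ubuntu/Debian the library is provided by the libsndfile1 package.
    print("libsndfile not found; it may need to be installed at the OS level")
else:
    print("libsndfile found:", lib)
```

If the check prints "not found" on the agent machine but finds the library locally, the difference is in the OS packages and not in the Python requirements.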
What exactly do you mean by 'manually remove from installed packages in the UI'? Where on the UI?
The original Task was created by simply executing the code, not through the agent.
In the virtual env, these are the installed packages:
Hi! After a deeper check I realized that my local PC also had a problem communicating with the Nvidia driver. I have now re-installed the driver and dependencies, validated them with the nvidia-smi command, and the local run looks OK.
I re-ran with clearml-agent and am now getting this error:
Successfully installed AMFM_decompy-1.0.11 MarkupSafe-2.1.3 Pillow-10.0.0 PyYAML-6.0.1 antlr4-python3-runtime-4.8 appdirs-1.4.4 attrs-23.1.0 audioread-3.0.0 bitarray-2.7.6 cffi-1.15.1 clearml-1.12.2 cmake-3.27.2 colorama-0.4.6 con...
Locally, the virtual env is created with conda, but inside it there are also packages installed with pip. Is that what you mean?
I'm running on a Dell XPS 15 7590 with Ubuntu 22.04.2 (not a Mac).
Processor: x86_64.
I did update, but I'm still getting the same error.
As you mentioned, the requirement for 'cudatoolkit=12.2' is internal to clearml-agent, so I have no way to fix it myself.
Looking at the 'installed packages' section after the Task reset, I only see this (no CUDA toolkit):
Python 3.8.17 (default, Jul 5 2023, 21:04:15) [GCC 11.2.0]
AMFM_decompy == 1.0.11
Cython == 3.0.2
Pillow == 10.0.0
PyYAML == 6.0.1
bitarray == 2.8.1
clearml == 1.12.2
einops == 0.6.1
hydra_core == 1.0.7
joblib == 1.3.2
librosa == 0.10.1
matplotlib == 3.7.2
numpy == 1.24.4
omegaconf == 2.0.6
packaging == 23.1
psutil == 5.9.5
regex == 2023.8.8
requests == 2.31.0
sacrebleu == 2.3.1
scikit_learn == ...
Hi, I re-ran now after minor updates. I get a similar error in the same part of the code:
Traceback (most recent call last):
File "main.py", line 84, in <module>
task.connect(config.model,name='model params')
File "/home/rakefet/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/clearml/task.py", line 1455, in connect
return method(mutable, name=name)
File "/home/rakefet/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/clearml/task.py", line 3573, in _connect_dictionary
dicti...
ClearML results page: None
Traceback (most recent call last):
File "main.py", line 85, in <module>
task.connect(config.hparas,name='hyper params')
File "/home/rakefet/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/clearml/task.py", line 1455, in connect
return method(mutable, name=name)
File "/home/rakefet/.clearml/venvs-builds/3.8/lib/python3.8/site...
After removing the task.connect lines, it hit another error: 'einops' is not recognized. It does exist in my environment file but was not installed by the agent (according to what I see under 'Summary - installed python packages'). Should I add this manually?
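A minimal sketch for confirming which imports are actually missing in a given environment, using only the standard library. missing_modules and the module list are hypothetical and should be adapted to the project:

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical list of top-level imports the training script relies on.
required = ["einops", "soundfile", "yaml"]
print("missing:", missing_modules(required))

# Assumption: a package reported as missing could be declared explicitly, for example
# with Task.add_requirements("einops") called before Task.init(), so that the agent
# installs it even when it was not detected automatically.
```

Running this both locally and at the start of the agent run shows whether the two environments really differ.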