Tried to install cudatoolkit==11.1 manually in this environment and got:
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed
UnsatisfiableError: The following specifications were found to be incompatible with each other:
Package xz conflicts for:
python=3.8 -> xz[version='>=5.2.4,<5.3.0a0|>=5.2.4,<6.0a0|>=5.2.5,<5.3.0a0|>=5.2.5,<6.0a0']
Package libstdcxx-ng conflicts for:
python=3.8 -> libstdcxx-ng[version='>=7.3.0|>=7.5.0|>=9.3.0']
cudatoolkit=11.1 -> libstdcxx-ng[version='>=9.3.0']
Package libgcc-ng conflicts for:
cudatoolkit=11.1 -> libgcc-ng[version='>=9.3.0']
python=3.8 -> libgcc-ng[version='>=7.3.0|>=7.5.0|>=9.3.0']
Package __glibc conflicts for:
cudatoolkit=11.1 -> __glibc[version='>=2.17,<3.0.a0']
Package libffi conflicts for:
python=3.8 -> libffi[version='>=3.2.1,<3.3.0a0|>=3.2.1,<3.3a0|>=3.3,<3.4.0a0']
Package ncurses conflicts for:
python=3.8 -> ncurses[version='>=6.1,<6.3.0a0|>=6.1,<7.0a0|>=6.2,<6.3.0a0|>=6.2,<7.0a0']
Package zlib conflicts for:
python=3.8 -> zlib[version='>=1.2.11,<1.3.0a0']
Package python_abi conflicts for:
python=3.8 -> python_abi[version='*|3.8.*',build=*_cp38]
Package sqlite conflicts for:
python=3.8 -> sqlite[version='>=3.30.0,<4.0a0|>=3.30.1,<4.0a0|>=3.31.1,<4.0a0|>=3.32.3,<4.0a0|>=3.33.0,<4.0a0|>=3.34.0,<4.0a0']
Package bzip2 conflicts for:
python=3.8 -> bzip2[version='>=1.0.8,<2.0a0']
Package readline conflicts for:
python=3.8 -> readline[version='>=7.0,<8.0a0|>=8.0,<9.0a0']
Package openssl conflicts for:
python=3.8 -> openssl[version='>=1.1.1a,<1.1.2a|>=1.1.1d,<1.1.2a|>=1.1.1e,<1.1.2a|>=1.1.1f,<1.1.2a|>=1.1.1g,<1.1.2a|>=1.1.1h,<1.1.2a|>=1.1.1i,<1.1.2a|>=1.1.1j,<1.1.2a']
Package tk conflicts for:
python=3.8 -> tk[version='>=8.6.10,<8.7.0a0|>=8.6.8,<8.7.0a0|>=8.6.9,<8.7.0a0']
Package pip conflicts for:
python=3.8 -> pip
Package ld_impl_linux-64 conflicts for:
python=3.8 -> ld_impl_linux-64[version='>=2.34']
The following specifications were found to be incompatible with your CUDA driver:
- cudatoolkit=11.1 -> __cuda[version='>=11.1']
Your installed CUDA driver is: 11.2
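The driver part of that message is plain version arithmetic: `__cuda` is conda's virtual package for the newest CUDA version the driver supports, and 11.2 satisfies `>=11.1`, so at least by this comparison the driver should not be what blocks cudatoolkit=11.1. A minimal sketch of the check, assuming purely numeric versions (conda's real matcher also handles suffixes like `a0`, wildcards and build strings); `satisfies` is a hypothetical helper, not a conda API:

```python
import operator
from typing import Tuple

# Assumption: only purely numeric versions such as "11.2" or "10.1" are
# supported; conda's real matcher also understands suffixes like "a0",
# wildcards ("3.8.*") and build-string constraints.
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '11.2' into (11, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def _padded(a: Tuple[int, ...], b: Tuple[int, ...]):
    """Pad with zeros so that '11' and '11.0' compare as equal."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def _check(version: Tuple[int, ...], constraint: str) -> bool:
    # Dict order puts two-character operators first, so ">=" wins over ">".
    for symbol, op in _OPS.items():
        if constraint.startswith(symbol):
            bound = parse_version(constraint[len(symbol):])
            return op(*_padded(version, bound))
    raise ValueError(f"unsupported constraint: {constraint!r}")


def satisfies(version_text: str, spec: str) -> bool:
    """'|' separates alternatives (OR); ',' joins constraints (AND)."""
    version = parse_version(version_text)
    return any(
        all(_check(version, c.strip()) for c in alternative.split(","))
        for alternative in spec.split("|")
    )


# Driver reports CUDA 11.2; cudatoolkit=11.1 asks for __cuda>=11.1.
print(satisfies("11.2", ">=11.1"))  # True
# cudatoolkit=11.0 against the range from the later pytorch~=1.8.0 error.
print(satisfies("11.0", ">=10.1,<10.2|>=10.2,<10.3"))  # False
```

The second call mirrors the later `detect_with_conda_freeze: false` run: cudatoolkit 11.0 falls outside both alternatives that `pytorch~=1.8.0` offered there.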
From the logs when running with --foreground, I do not see any conda create command.
The problem is that clearml installs cudatoolkit=11.0, but cudatoolkit=11.1 is needed.
You suggested this fix earlier, but I am not sure why it didn't work then.
Hmm, could you test with clearml-agent 0.17.2? I want to make sure this actually solves the problem.
Do you know how I can get this version?
I installed my local conda environment from an environment.yml without issues, so maybe clearml makes some changes that lead to conflicts, which in the end lead to the CPU version being installed.
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
_libgcc_mutex=0.1=conda_forge
_openmp_mutex=4.5=1_llvm
absl-py=0.12.0=pypi_0
aiostream=0.4.2=pypi_0
attrs=20.3.0=pypi_0
blas=1.0=mkl
bzip2=1.0.8=h7b6447c_0
ca-certificates=2020.10.14=0
cached-property=1.5.2=pypi_0
cachetools=4.2.1=pypi_0
certifi=2020.6.20=py37_0
chardet=4.0.0=pypi_0
clearml=0.17.4=pypi_0
cloudpickle=1.6.0=py_0
cudatoolkit=11.1.1=h6406543_8
cycler=0.10.0=py37_0
cytoolz=0.11.0=py37h7b6447c_0
dask-core=2021.2.0=pyhd8ed1ab_0
decorator=4.4.2=py_0
dm-control=0.0.355168290=pypi_0
dm-env=1.4=pypi_0
dm-tree=0.1.5=pypi_0
ffmpeg=4.3=hf484d3e_0
freetype=2.10.4=h5ab3b9f_0
furl=2.1.0=pypi_0
future=0.18.2=pypi_0
glfw=2.1.0=pypi_0
gmp=6.2.1=h58526e2_0
gnutls=3.6.13=h85f3911_1
google-auth=1.27.1=pypi_0
google-auth-oauthlib=0.4.3=pypi_0
grpcio=1.36.1=pypi_0
gym=0.18.0=pypi_0
h5py=3.2.1=pypi_0
humanfriendly=9.1=pypi_0
idna=2.10=pypi_0
imageio=2.9.0=py_0
imageio-ffmpeg=0.4.3=pypi_0
importlib-metadata=3.7.2=pypi_0
jpeg=9b=habf39ab_1
jsonschema=3.2.0=pypi_0
kiwisolver=1.3.1=py37h2527ec5_1
labmaze=1.0.3=pypi_0
lame=3.100=h7b6447c_0
lcms2=2.11=h396b838_0
ld_impl_linux-64=2.33.1=h53a641e_7
libedit=3.1.20191231=h14c3975_1
libffi=3.3=he6710b0_2
libgcc-ng=9.3.0=h2828fa1_18
libgfortran-ng=7.3.0=hdf63c60_0
libgomp=9.3.0=h2828fa1_18
libiconv=1.16=h516909a_0
libpng=1.6.37=hbc83047_0
libstdcxx-ng=9.3.0=h6de172a_18
libtiff=4.1.0=h2733197_1
libuv=1.41.0=h7f98852_0
llvm-openmp=11.0.1=h4bd325d_0
lxml=4.6.2=pypi_0
lz4-c=1.9.3=h9c3ff4c_0
markdown=3.3.4=pypi_0
matplotlib-base=3.3.4=py37h0c9df89_0
mkl=2020.4=h726a3e6_304
mkl-service=2.3.0=py37h8f50634_2
mkl_fft=1.3.0=py37h902c9e0_1
mkl_random=1.2.0=py37h9fdb41a_1
moviepy=1.0.3=pypi_0
ncurses=6.2=he6710b0_1
nettle=3.6=he412f7d_0
networkx=2.5=py_0
ninja=1.10.2=h4bd325d_0
numpy=1.19.2=py37h54aff64_0
numpy-base=1.19.2=py37hfa32c7d_0
oauthlib=3.1.0=pypi_0
olefile=0.46=py37_0
openh264=2.1.1=h780b84a_0
openssl=1.1.1j=h7f98852_0
orderedmultidict=1.0.1=pypi_0
pathlib2=2.3.5=pypi_0
pillow=7.2.0=pypi_0
pip=21.0.1=pyhd8ed1ab_0
proglog=0.1.9=pypi_0
protobuf=3.15.5=pypi_0
psutil=5.8.0=pypi_0
pyasn1=0.4.8=pypi_0
pyasn1-modules=0.2.8=pypi_0
pybullet=3.0.9=pypi_0
pygame=2.0.1=pypi_0
pyglet=1.5.0=pypi_0
pyjwt=2.0.1=pypi_0
pyopengl=3.1.5=pypi_0
pyparsing=2.4.7=py_0
pyrsistent=0.17.3=pypi_0
python=3.7.10=hdb3f193_0
python-dateutil=2.8.1=py_0
python_abi=3.7=1_cp37m
pytorch=1.8.0=py3.7_cuda11.1_cudnn8.0.5_0
pywavelets=1.1.1=py37h7b6447c_2
pyyaml=5.3.1=py37h7b6447c_1
readline=8.1=h27cfd23_0
requests=2.25.1=pypi_0
requests-file=1.5.1=pypi_0
requests-oauthlib=1.3.0=pypi_0
rsa=4.7.2=pypi_0
scikit-image=0.17.2=py37hdf5156a_0
scipy=1.6.1=py37h91f5cce_0
setuptools=52.0.0=py37h06a4308_0
six=1.15.0=py_0
sqlite=3.33.0=h62c20be_0
tensorboard=2.4.1=pypi_0
tensorboard-plugin-wit=1.8.0=pypi_0
tensorboardx=2.1=pypi_0
tifffile=2020.10.1=py37hdd07704_2
tk=8.6.10=hbc83047_0
toolz=0.11.1=py_0
torchaudio=0.8.0=py37
torchvision=0.9.0=py37_cu111
tornado=6.1=py37h5e8e339_1
tqdm=4.59.0=pypi_0
typing_extensions=3.7.4.3=py_0
urllib3=1.26.3=pypi_0
werkzeug=1.0.1=pypi_0
wheel=0.36.2=pyhd3deb0d_0
xz=5.2.5=h7b6447c_0
yaml=0.2.5=h7b6447c_0
zipp=3.4.1=pypi_0
zlib=1.2.11=h7b6447c_3
zstd=1.4.9=ha95c52a_0
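Whether an environment got the CUDA or the CPU build of PyTorch can be read straight from an export like this one: packages from the pytorch channel carry the variant in their build string (here `py3.7_cuda11.1_cudnn8.0.5_0`). A minimal sketch for checking a freeze taken from the agent's venv, assuming the `name=version=build` layout shown above and that CPU builds contain `cpu` in their build string; `parse_export` and `pytorch_flavour` are hypothetical helpers:

```python
import re
from typing import Dict, Optional, Tuple


def parse_export(text: str) -> Dict[str, Tuple[str, str]]:
    """Map package name -> (version, build) for name=version=build lines."""
    packages: Dict[str, Tuple[str, str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '# platform: ...' header
        parts = line.split("=")
        if len(parts) == 3:
            name, version, build = parts
            packages[name] = (version, build)
    return packages


def pytorch_flavour(packages: Dict[str, Tuple[str, str]]) -> Optional[str]:
    """Return 'cuda<version>', 'cpu', 'unknown', or None if no conda pytorch."""
    if "pytorch" not in packages:
        return None  # e.g. a pip wheel, listed as torch=...=pypi_0
    _, build = packages["pytorch"]
    match = re.search(r"cuda(\d+(?:\.\d+)*)", build)
    if match:
        return "cuda" + match.group(1)
    return "cpu" if "cpu" in build else "unknown"


sample = """# platform: linux-64
cudatoolkit=11.1.1=h6406543_8
pytorch=1.8.0=py3.7_cuda11.1_cudnn8.0.5_0
"""
packages = parse_export(sample)
print(pytorch_flavour(packages))  # cuda11.1
print(packages["cudatoolkit"][0])  # 11.1.1
```

A pip-installed wheel shows up under the name `torch` with build `pypi_0`, so this check returns None for it rather than guessing.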
btw: I also tested clearml-agent on a different machine with Python 3.8, and I get the same problems.
I mean the version which it bases the PyTorch installation on.
My driver says "CUDA Version: 11.2" (I am not even sure this is correct, since I do not remember installing CUDA on this machine, but idk), and there is no PyTorch build for 11.2, so maybe it falls back to CPU?
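The "CUDA Version" that nvidia-smi prints is the newest CUDA release the driver supports, not an installed toolkit, and a driver also runs builds made for older CUDA releases. So a missing 11.2 build should not by itself force a CPU fallback: an 11.1 build ought to run under an 11.2 driver. A minimal sketch of that selection rule, assuming `major.minor` version strings; `pick_cuda_build` and the list of available builds are hypothetical, not clearml-agent's actual logic:

```python
from typing import Optional, Sequence, Tuple


def _as_tuple(version: str) -> Tuple[int, ...]:
    """'11.1' -> (11, 1) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def pick_cuda_build(driver_cuda: str, available: Sequence[str]) -> Optional[str]:
    """Return the newest available CUDA version not newer than the driver's.

    None means no CUDA build fits, which is the only case where a CPU
    fallback would be expected under this rule.
    """
    driver = _as_tuple(driver_cuda)
    candidates = [v for v in available if _as_tuple(v) <= driver]
    return max(candidates, key=_as_tuple, default=None)


# Hypothetical list of CUDA variants offered for one PyTorch release.
print(pick_cuda_build("11.2", ["10.1", "10.2", "11.1"]))  # 11.1
print(pick_cuda_build("10.0", ["10.1", "10.2", "11.1"]))  # None
```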
Uninstall the current clearml-agent and reinstall it from this wheel. I hacked it to use ==; let's see if that works.
You suggested this fix earlier, but I am not sure why it didn't work then.
I guess that has nothing to do with the diff version, right?
But I do not have anything linked correctly, since I rely on conda to install CUDA/cuDNN for me.
Nvm, I looked at the conda history and I can see it there.
How does clearml-agent create the conda environment?
conda env update -p .clearml/venvs-builds/3.8 --file ./environment.yml
with environment.yml
name: clearml
channels:
- pytorch
- anaconda
- conda-forge
- defaults
dependencies:
- pytorch==1.8.0
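This file pins only `pytorch==1.8.0`, so the cudatoolkit version is left to the solver, which is how clearml ends up with cudatoolkit=11.0 as described above. A minimal sketch of producing the same file with the toolkit pinned explicitly; `render_environment_yml` is a hypothetical helper that uses plain string formatting so it needs no YAML library, and the `cudatoolkit=11.1` pin is the version this thread is after, not something clearml-agent emits:

```python
from typing import Sequence


def render_environment_yml(name: str, channels: Sequence[str],
                           dependencies: Sequence[str]) -> str:
    """Build a conda environment file as text, in the layout shown above."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"- {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"- {dep}" for dep in dependencies]
    return "\n".join(lines) + "\n"


# Same channels as the agent's file; cudatoolkit is now pinned next to pytorch.
text = render_environment_yml(
    "clearml",
    ["pytorch", "anaconda", "conda-forge", "defaults"],
    ["pytorch==1.8.0", "cudatoolkit=11.1"],
)
print(text, end="")
```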
send me the conda freeze:
# Name Version Build Channel
...
@ReassuredTiger98 what are you getting with:
nvidia-smi
And here:
ls -la /usr/local/
I tried to run the task with detect_with_conda_freeze: false instead of true and got:
Executing Conda: /home/tim/miniconda3/condabin/conda install -p /home/tim/.clearml/venvs-builds/3.8 -c defaults -c conda-forge -c pytorch 'pip<20.2' --quiet --json
Pass
Conda: Trying to install requirements:
['pytorch~=1.8.0']
Executing Conda: /home/tim/miniconda3/condabin/conda env update -p /home/tim/.clearml/venvs-builds/3.8 --file /tmp/conda_envh7rq4qmc.yml --quiet --json
Conda error: UnsatisfiableError: The following specifications were found to be incompatible with a past
explicit spec that is not an explicit spec in this operation (cudatoolkit):
- pytorch~=1.8.0 -> cudatoolkit[version='>=10.1,<10.2|>=10.2,<10.3']
The following specifications were found to be incompatible with each other:
Package cudatoolkit conflicts for:
cudatoolkit=11.0
Conda: Installing requirements: step 2 - using pip:
['clearml==0.17.4', 'tensorboard==2.4.1', 'pytorch~=1.8.0']
Collecting tensorboard==2.4.1
Using cached tensorboard-2.4.1-py3-none-any.whl (10.6 MB)
ERROR: Could not find a version that satisfies the requirement pytorch~=1.8.0 (from -r /tmp/cached-reqsubuv0zrf.txt (line 3)) (from versions: 0.1.2, 1.0.2)
ERROR: No matching distribution found for pytorch~=1.8.0 (from -r /tmp/cached-reqsubuv0zrf.txt (line 3))
Command 'source /home/tim/miniconda3/etc/profile.d/conda.sh && conda activate /home/tim/.clearml/venvs-builds/3.8 && pip install -r /tmp/cached-reqsubuv0zrf.txt' returned non-zero exit status 1.
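The pip step fails for a separate reason: on PyPI the package is published as `torch`, and the `pytorch` project there is only a placeholder (hence the versions 0.1.2 and 1.0.2 in the error). A minimal sketch of renaming conda package names before falling back to pip; `CONDA_TO_PIP` and `to_pip_requirement` are hypothetical, and the mapping holds only the one rename seen in this log:

```python
import re

# Hypothetical mapping of conda package names to their PyPI names.
CONDA_TO_PIP = {"pytorch": "torch"}

# Package name, then everything else (the version specifier) unchanged.
_NAME = re.compile(r"^\s*([A-Za-z0-9_.-]+)(.*)$")


def to_pip_requirement(requirement: str) -> str:
    """Rename the package part of a requirement, keeping the specifier."""
    match = _NAME.match(requirement)
    if not match:
        return requirement
    name, rest = match.groups()
    return CONDA_TO_PIP.get(name.lower(), name) + rest


reqs = ["clearml==0.17.4", "tensorboard==2.4.1", "pytorch~=1.8.0"]
print([to_pip_requirement(r) for r in reqs])
# ['clearml==0.17.4', 'tensorboard==2.4.1', 'torch~=1.8.0']
```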