Hi @<1523701868901961728:profile|ReassuredTiger98>, when you get to it, please download the wheel and install it with:
pip3 install -U clearml_agent-0.17.3rc0-py3-none-any.whl
Then run the daemon with the additional --debug argument, basically:
clearml-agent --debug daemon --foreground ...
Once the agent is running, please send the Task's log from your console 🙂
BTW: I also tested clearml-agent on a different machine with Python 3.8, and I get the same problems.
Quick question: where does clearml place the venv again? I want to take a look at it after the task has failed.
Thanks! Tomorrow is great, I'll put the wheel here 🙂
Hmm, maybe this is the issue:
Conda error: UnsatisfiableError: The following specifications were found to be incompatible with a past explicit spec that is not an explicit spec in this operation (cudatoolkit):
- pytorch~=1.8.0 -> cudatoolkit[version='>=10.1,<10.2|>=10.2,<10.3']
This makes no sense: conda is saying pytorch=1.8 needs cudatoolkit 10.1 or 10.2 (<10.3), but it actually needs cudatoolkit 11.1.
I just started a task from this environment and it fails on the agent.
It still shows the CPU version when I run conda list.
And the one with the CPU version: is it with "~=" or "="?
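A quick cross-check of what `conda list` reports is to ask PyTorch itself which build it is. This is a minimal sketch, assuming `python3` is the interpreter of the environment being inspected; the helper name `torch_build_info` is hypothetical. For CPU-only builds `torch.version.cuda` is `None`, and `torch.cuda.is_available()` additionally shows whether a GPU is usable through the driver.

```shell
# Hypothetical helper: prints the PyTorch version, the CUDA version it was
# built against ("None" means a CPU-only build), and whether a GPU is usable.
# Falls back to a marker string if python3 or torch is missing.
torch_build_info() {
    python3 - <<'EOF' 2>/dev/null || echo "torch-not-installed"
import torch
print(torch.__version__, torch.version.cuda, torch.cuda.is_available())
EOF
}

torch_build_info
```

In `conda list`, packages from the pytorch channel also typically encode the variant in their build string (a `cpu` tag versus a `cuda` tag), which is another way to spot the CPU build.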
channels:
- defaults
- conda-forge
- pytorch
dependencies:
- cudatoolkit==11.1.1
- pytorch==1.8.0
This gives the CPU version.
Or there should be an early error when trying to run conda-based tasks on pip agents.
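To reproduce this outside the agent, the same file can be solved directly with conda. This is a minimal sketch, assuming `conda` is on `PATH`; the environment name `pt18-repro` is a hypothetical placeholder. It writes the environment.yml quoted above, creates a throwaway environment from it, and lists the pytorch package so its build string shows which variant the solver picked.

```shell
# Write the environment file from the thread. Channel order is kept as-is;
# conda gives channels listed earlier higher priority.
cat > environment.yml <<'EOF'
channels:
- defaults
- conda-forge
- pytorch
dependencies:
- cudatoolkit==11.1.1
- pytorch==1.8.0
EOF

# Only attempt the solve when conda is actually available.
if command -v conda >/dev/null 2>&1; then
    # Create a throwaway environment from the file (name is a placeholder).
    conda env create -f environment.yml -n pt18-repro
    # The build string column shows whether the CPU or CUDA build was chosen.
    conda list -n pt18-repro pytorch
else
    echo "conda not found; skipping environment creation"
fi
```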
From the logs when running with --foreground, I do not see any conda create command.
How does clearml-agent create the conda environment?
Can you ping me when it is updated in None so I can update my installation?
I do not have a global cuda install on this machine. Everything except for the driver is installed via conda.
You suggested this fix earlier, but I am not sure why it didn't work then.
It's always preferred to use conda_freeze: false.
That said, if you do use conda_freeze: true, it should also freeze the cudatoolkit, so it should have worked.
BTW, when you say it worked, is that with version 0.17.2 or the hacked RC I sent?
I just tried the environment setup steps that clearml-agent performs, locally, but with my environment.yml instead of the one that clearml generates.
(This is why we recommend using pip: it is stable, and clearml-agent takes care of the pytorch/CUDA versions.)
This is the file which installs the GPU version
@<1523701868901961728:profile|ReassuredTiger98> thank you so much for testing it!