@<1787653555927126016:profile|SoggyDuck67> , can you try setting the binary to 3.11 instead of 3.10?
You're correct! The "execution" tab shows BINARY: python3.10 , that must have been copied from a previous task that I cloned. When I clone it again and edit it to BINARY: python then the execution gets past this.
Update: Looks like I can also set ignore_requested_python_version: true on the agent, which was not set before.
Trying to wrap my head around what's happening: it seems as though ClearML is modifying the running container itself, which is a surprising thing to do at runtime, but I guess it may be needed to instrument my training script?
Hi @<1787653555927126016:profile|SoggyDuck67> , can you please provide the full log of the run? Also, can you please add a screenshot of the 'execution' tab of the experiment? I assume the original experiment was run on Python 3.10?
I had set the binary to python (which on the Docker image is a symlink to python3.11) and it worked fine. All is working now. Thank you for your help.
@<1787653555927126016:profile|SoggyDuck67> notice the binary field in the Task "execution" tab. If for some reason it says "python3.10", it will try to use Python 3.10 when running it.
That said, if it does not find the requested Python version, it should output a warning and default to the Python that is installed.
If you can provide the full log it will be helpful to see what happened there
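To make the expected behaviour concrete, here is a minimal sketch of "warn and fall back" logic. It is not the clearml-agent implementation; the function name `resolve_python_binary` and its parameters are hypothetical and only illustrate the idea of defaulting to the installed interpreter when the requested one is not on PATH.

```python
import shutil
import sys
import warnings


def resolve_python_binary(requested: str, default: str = sys.executable) -> str:
    """Return the path of the requested interpreter, or fall back to `default`.

    Hypothetical helper for illustration only, not clearml-agent code.
    """
    found = shutil.which(requested)  # None when the binary is not on PATH
    if found is not None:
        return found
    warnings.warn(
        f"{requested!r} not found on PATH; falling back to {default!r}"
    )
    return default
```

Note that a check like this only covers the case where the requested binary is missing. If the binary exists but cannot create a virtual environment, a lookup on PATH alone would not catch it.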
Logs:
Executing task id [f5f619e4d2074438b6b9ff2b7a15d246]:
repository = git@github.com:myorg/nigel.git
branch = clearml
version_num = 7b1d97e5b73d8abab8407b58a764e199b8943fd0
tag =
docker_cmd = myorg.io/nigeltrain/train_env:dev-bdae4ce766c01568037ac57757f7e8ef6d930950
entry_point = train_classifier.py
working_dir = nigeltrain_pkg/nigeltrain/command_code
/usr/bin/python3.10: No module named virtualenv
WARNING: virtualenv call failed: Command '['python3.10', '-m', 'virtualenv', '/root/.clearml/venvs-builds/3.10', '--system-site-packages']' returned non-zero exit status 1.
INFO: Creating virtual environment with venv
The virtual environment was not created successfully because ensurepip is not
available. On Debian/Ubuntu systems, you need to install the python3-venv
package using the following command.
apt install python3.10-venv
You may need to use sudo with that command. After installing the python3-venv
package, recreate your virtual environment.
Failing command: /root/.clearml/venvs-builds/3.10/bin/python3.10
clearml_agent: ERROR: Command '['python3.10', '-m', 'venv', '/root/.clearml/venvs-builds/3.10', '--system-site-packages']' returned non-zero exit status 1.
$ clearml-agent --version
CLEARML-AGENT version 1.9.2
if it does not find the requested Python version, it should output a warning and default to the Python that is installed.
If you can provide the full log it will be helpful to see what happened there
Agreed, I would have expected a warning, not failure. Here is the log, if it's helpful. But I am unblocked now.
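For anyone hitting the same failure: in the log above, /usr/bin/python3.10 exists on the image but has neither virtualenv nor ensurepip, so the venv creation fails rather than falling back. A rough pre-flight check, assuming you can run Python inside the image, might look like the sketch below. The helper name `can_build_venv` is made up for illustration and is not part of ClearML.

```python
import subprocess


def can_build_venv(python_binary: str) -> bool:
    """Return True if `python_binary` runs and can import venv and ensurepip.

    Hypothetical helper for illustration only, not part of clearml-agent.
    """
    try:
        result = subprocess.run(
            [python_binary, "-c", "import venv, ensurepip"],
            capture_output=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Binary missing, not executable, or it hung
        return False
    return result.returncode == 0
```

Calling `can_build_venv("python3.10")` and `can_build_venv("python")` inside the container should show which value is safe to put in the binary field before enqueuing the task.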