I execute the clearml-agent this way:
/home/machine/miniconda3/envs/py36/bin/python3 /home/machine/miniconda3/envs/py36/bin/clearml-agent daemon --services-mode --cpu-only --queue services --create-queue --log-level DEBUG --detached
I think clearml-agent tries to execute /usr/bin/python3.6 to start the task, instead of using the Python that was used to start clearml-agent
Since it fails on the first machine (clearml-server), I tried to run it on another, on-prem machine (also used as an agent)
Also, what do you mean by another machine? Are you running the ClearML services agent daemon on another machine?
SuccessfulKoala55 I tried to set up the clearml-agent on a different machine, and now I get a different error message in the logs:
Warning: could not locate requested Python version 3.6, reverting to version 3.6
clearml_agent: ERROR: Python executable with version '3.6' defined in configuration file, key 'agent.default_python', not found in path, tried: ('python3.6', 'python3', 'python')
in clearml.conf:
agent.package_manager.system_site_packages = true
agent.package_manager.pip_version = "==20.2.3"
Python executable with version '3.6' defined in configuration file
whatever will allow the agent daemon to create a venv 🙂
JitteryCoyote63 can you try to look at the logs in /tmp/.clearml_agent_out.j7wo7ltp.txt?
interestingly, it works on one machine, but not on another one
and in the logs:
agent.worker_name = worker1
agent.force_git_ssh_protocol = false
agent.python_binary =
agent.package_manager.type = pip
agent.package_manager.pip_version = \=\=20.2.3
agent.package_manager.system_site_packages = true
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = defaults
agent.package_manager.torch_nightly = false
agent.venvs_dir = /home/machine/.clearml/venvs-builds.1.2
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /home/machine/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /home/machine/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = /home/machine/.clearml/pip-cache
agent.docker_apt_cache = /home/machine/.clearml/apt-cache.1.2
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
agent.enable_task_env = false
agent.default_python = 3.6
agent.cuda_version = 0
agent.cudnn_version = 0
What's the error on the other machine?
so the clearml-agent daemon takes 3.6 as the default, and when running the service, for some reason 3.6 is not in the path
if not set, this value is taken from the system python
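To see what "not found in path, tried: ('python3.6', 'python3', 'python')" looks like from inside the daemon's environment, here is a minimal sketch of a PATH lookup over the same candidate names. It is not clearml-agent's actual code; `find_python` and its `path` parameter are hypothetical names used only for illustration.

```python
import shutil
from typing import Iterable, Optional


def find_python(candidates: Iterable[str], path: Optional[str] = None) -> Optional[str]:
    """Return the full path of the first candidate found on PATH, or None.

    `path` overrides the PATH environment variable, which makes it easy to
    simulate the (possibly different) PATH the detached daemon runs with.
    """
    for name in candidates:
        resolved = shutil.which(name, path=path)
        if resolved:
            return resolved
    return None


if __name__ == "__main__":
    # Same candidate names as listed in the error message
    print(find_python(("python3.6", "python3", "python")))
```

If this prints None when run with the PATH the detached daemon sees, but finds an interpreter in an interactive shell, the conda env's bin directory is probably missing from the daemon's PATH.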
The file /tmp/.clearml_agent_out.j7wo7ltp.txt does not exist
I get the same error when trying to run the task using clearml-agent services-mode with docker, so weird
"User aborted: stopping task" usually means the Task status changed or "stopping" was placed in the status_message field while the task was running
Oof, now I cannot start the second controller in the services queue on the same second machine, it fails with:
Processing /tmp/build/80754af9/cffi_1605538068321/work
ERROR: Could not install packages due to an EnvironmentError: [Errno 2] No such file or directory: '/tmp/build/80754af9/cffi_1605538068321/work'
clearml_agent: ERROR: Could not install task requirements!
Command '['/home/machine/.clearml/venvs-builds.1.3/3.6/bin/python', '-m', 'pip', '--disable-pip-version-check', 'install', '-r', '/tmp/cached-reqsi4hq9s6z.txt']' returned non-zero exit status 1.
Alright SuccessfulKoala55 I was able to make it work by downgrading clearml-agent to 0.17.2
Can I simply set agent.python_binary = path/to/conda/python3.6?
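Before pointing agent.python_binary at a conda interpreter, it can help to confirm that the binary actually reports the version you expect. A small sketch, assuming the path you pass is a real, executable interpreter; `python_version` is a hypothetical helper, not part of clearml-agent.

```python
import subprocess
import sys


def python_version(binary: str) -> str:
    """Ask the interpreter at `binary` for its 'major.minor' version string."""
    result = subprocess.run(
        [binary, "-c", "import sys; print('%d.%d' % sys.version_info[:2])"],
        check=True,  # raise CalledProcessError if the interpreter fails
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text mode; this spelling also works on Python 3.6
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Replace sys.executable with the candidate path, e.g. the conda python3.6
    print(python_version(sys.executable))
```

If the output matches agent.default_python (3.6 in this thread), the binary is at least a plausible value for agent.python_binary.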
Ok, now I get: ERROR: No matching distribution found for conda==4.9.2 (from -r /tmp/cached-reqscaw2zzji.txt (line 13))
Ok, deleting installed packages list worked for the first task
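Both the cffi and the conda==4.9.2 failures look like what you get when a package list captured inside a conda env is replayed with pip: conda-built packages can show up as local file references to a build directory that no longer exists, and conda itself is not installable from PyPI. That reading is an assumption on my part, not something confirmed in the thread. Under that assumption, a sketch of filtering such a list before reuse (`pip_installable` and `CONDA_ONLY` are hypothetical names):

```python
import re
from typing import Iterable, List

# Hypothetical set of packages assumed to exist only in conda, not on PyPI
CONDA_ONLY = frozenset({"conda"})


def pip_installable(lines: Iterable[str], conda_only=CONDA_ONLY) -> List[str]:
    """Drop requirement lines that pip on another machine is unlikely to resolve."""
    kept = []
    for line in lines:
        req = line.strip()
        if not req or req.startswith("#"):
            continue  # skip blanks and comments
        if "@ file://" in req:
            continue  # local build path, e.g. /tmp/build/.../work
        # Package name is everything before the first version/extras separator
        name = re.split(r"[=<>!~\[; ]", req, maxsplit=1)[0].lower()
        if name in conda_only:
            continue
        kept.append(req)
    return kept
```

For example, given the lines "cffi @ file:///tmp/build/80754af9/cffi_1605538068321/work", "conda==4.9.2" and "numpy==1.19.2" (the numpy line is made up for illustration), only the numpy line is kept.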
 
				