I'm having an issue running clearml-agent in docker mode. I spin up an agent as follows:
clearml-agent daemon --queue docker_test --docker nvidia/cuda:11.0-devel-ubuntu20.04 --cpu-only

and I create and enqueue a task from a jupyter notebook like so:

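from clearml import Task

# Create the task with an extra docker volume mount and one argparse argument
task = Task.create(project_name='foo', task_name='barr',
                   docker_args="-v /some/disk:/root/disk",
                   argparse_args=[("frames", "1-10")])
Task.enqueue(task.id, queue_name='docker_test')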

The output shows a series of commands that install the environment inside the docker container. It reaches the point where it runs "pip install clearml-agent", and once that command finishes, the task just hangs (it never seems to start running the actual Python script).

Also, it runs with "--disable-monitoring" for some reason. Why is that?
NVIDIA_VISIBLE_DEVICES=none $LOCAL_PYTHON -u -m clearml_agent execute --disable-monitoring --id <TID>

Posted 11 months ago

UPDATE: The issue was in clearml.conf.
In the API settings, the server address used a hostname alias that was not defined inside the docker container. Once that was replaced with the explicit IP address, everything worked as expected.
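
For anyone hitting the same hang, here is a minimal sketch of a check that can be run inside the container to see whether a server address resolves there. It uses only the Python standard library; the function name host_resolves is made up for illustration and is not part of clearml or clearml-agent.

```python
import socket
from urllib.parse import urlparse


def host_resolves(url: str) -> bool:
    """Return True if the hostname in `url` can be resolved on this machine.

    Hypothetical helper for illustration; not part of clearml.
    """
    host = urlparse(url).hostname
    if host is None:
        # No hostname could be parsed (e.g. empty string or missing scheme)
        return False
    try:
        # Name resolution only; this does not check that the server is up
        socket.getaddrinfo(host, None)
        return True
    except (socket.gaierror, UnicodeError):
        return False
```

Calling it with the api_server, web_server and files_server values from the api section of clearml.conf (for example host_resolves("http://clearml-server:8008"), where clearml-server stands in for whatever alias you use) should return False inside the container if the alias is only defined on the host machine. An explicit IP address always passes this check, which matches the fix above.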

Posted 11 months ago