After some digging, this is the answer. Change it in the clearml.conf generated by clearml-agent init:
```
default_docker: {
    # default docker image to use when running in docker mode
    image: "nvidia/cuda:10.1-runtime-ubuntu18.04"
    # optional arguments to pass to docker image
    # arguments: ["--ipc=host", ]
    arguments: ["--env GIT_SSL_NO_VERIFY=true",]
}
```
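In case it helps, the clearml-agent config template also has an agent-level extra_docker_arguments key; as far as I can tell from the template comments, those arguments are applied to the containers the agent spawns, so it may be worth trying too (a sketch based on the template, not something I have verified here):

```
agent {
    # extra arguments added to the docker run command of containers
    # spawned by this agent (key taken from the clearml-agent config template)
    extra_docker_arguments: ["--env", "GIT_SSL_NO_VERIFY=true"]
}
```

Here is the relevant agent log: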
```
Executing task id [228caa5d25d94ac5aa10fa7e1d02f03c]:
repository = https://192.168.50.88:18443/tkahsion/pytorchmnist
branch = master
version_num = cfb833bcc70f3e10d3b6a96cfad3225ed682382b
tag =
docker_cmd = nvidia/cuda:10.1-runtime-ubuntu18.04
entry_point = pytorch_mnist.py
working_dir = .

Warning: could not locate requested Python version 3.9, reverting to version 3.6
Using base prefix '/usr'
New python executable in /root/.clearml/venvs-builds/3.6/bin/python3.6
Also creating executable in /root/.clearml/venvs-builds/3.6/bin/python
Installing setuptools, pip, wheel...
done.
cloning: https://192.168.50.88:18443/tkahsion/pytorchmnist
fatal: unable to access 'https://192.168.50.88:18443/tkahsion/pytorchmnist/': server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none
Repository cloning failed: Command '['clone', 'https://192.168.50.88:18443/tkahsion/pytorchmnist', '/root/.clearml/vcs-cache/pytorchmnist.f220373e7227ec760b28c7f4cd99b534/pytorchmnist', '--quiet', '--recursive']' returned non-zero exit status 128.
clearml_agent: ERROR: Failed cloning repository.
- Make sure you pushed the requested commit:
  (repository='https://192.168.50.88:18443/tkahsion/pytorchmnist', branch='master', commit_id='cfb833bcc70f3e10d3b6a96cfad3225ed682382b', tag='', docker_cmd='nvidia/cuda:10.1-runtime-ubuntu18.04', entry_point='pytorch_mnist.py', working_dir='.')
- Check if remote-worker has valid credentials [see worker configuration file]
User aborted: stopping task (3)
Leaving process id 23081
DONE: Running task '228caa5d25d94ac5aa10fa7e1d02f03c' (user aborted)
```
Sorry, I take that back. I just realised that this argument only takes effect when running the agent itself; when you enqueue a task to this agent, the argument is not passed on to the container that the agent spawns.
The same problem applies to the docker image: it reverts back to nvidia/cuda:10.1-runtime-ubuntu18.04 despite my setting a different one.
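If the goal is to stop the task from falling back to the agent's default image, one option (a sketch, not something I have verified in this setup) is to record the container image and arguments on the task itself via the ClearML SDK; the project, task, and queue names below are just placeholders:

```python
from clearml import Task

# Create (or attach to) the task as usual.
task = Task.init(project_name="examples", task_name="pytorch mnist")

# Record the docker image (plus extra docker arguments) on the task itself,
# so the agent uses this instead of its default_docker image.
# A single "image + arguments" string is accepted; newer clearml versions
# also expose a separate docker_arguments parameter.
task.set_base_docker("nvidia/cuda:10.1-runtime-ubuntu18.04 --env GIT_SSL_NO_VERIFY=true")

# Stop local execution here and enqueue the task for the agent to pick up.
task.execute_remotely(queue_name="default", exit_process=True)
```

As far as I understand, the same image-plus-arguments string can also be set in the task's base docker image / container field in the UI before enqueuing, which is what set_base_docker writes to.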