You'll need to set the agent key and secret using environment variables, as explained here (in step #11): https://allegro.ai/clearml/docs/docs/deploying_clearml/clearml_server_linux_mac.html#deploying
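If it helps, a minimal sketch of setting those variables before starting the agent (the key/secret values below are placeholders; generate real ones in the web UI profile page):

```shell
# Placeholder credentials - generate your own in the ClearML web UI (Profile page)
export CLEARML_API_ACCESS_KEY="agent-access-key"
export CLEARML_API_SECRET_KEY="agent-secret-key"
```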
Oh, so I think I know what might have happened
In my case I use the conda freeze option and do not even have CUDA installed on the agents.
BTW: the agent will resolve pytorch based on the installed CUDA version.
It seems like the services-docker is always started with Ubuntu 18.04, even when I use `task.set_base_docker("continuumio/miniconda:latest -v /opt/clearml/data/fileserver/:{}".format(file_server_mount))`
No (this is deprecated and was removed because it was confusing)
https://github.com/allegroai/clearml-agent/blob/cec6420c8f40d92ab1cd6cbe5ca8f24cf351abd8/docs/clearml.conf#L101
I just updated my server to 1.0 and now the services agent is stuck in restarting:
In that case I suggest you turn on the venv cache; it will accelerate the conda environment building because it caches the entire conda env.
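For reference, a sketch of what enabling it looks like in clearml.conf (the section ships commented out in the default agent config; the values here are what I recall as the defaults, so double-check against your file):

```
agent {
    venvs_cache: {
        # uncommenting the path is what enables the cache
        path: ~/.clearml/venvs-cache
        max_entries: 10
        free_space_threshold_gb: 2.0
    }
}
```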
Also, what kind of authentication are you using? Fixed users?
In the new version, the default agent credentials embedded in the ClearML Server are disabled if the server is not in open mode (i.e. it requires a user/password to log in). This is because having those default credentials available in this mode basically means anyone without a password can send commands to the server (since these credentials are hard-coded)
You can simply generate another set of credentials in the profile page, and set them in these environment variables.
Alternatively, you can add another fixed user, and use its username/password for these values
Is it also possible to specify different user/api_token for different hosts? For example I have a github and a private gitlab that I both want to be able to access.
ReassuredTiger98 my apologies I just realize you can use ~/.git-credentials for that. The agent will automatically map the host .git-credentials into the docker :)
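For anyone finding this later, the `~/.git-credentials` format is one URL per line, so both hosts can be listed (usernames/tokens below are placeholders):

```
https://github-user:github-token@github.com
https://gitlab-user:gitlab-token@gitlab.example.com
```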
Now the pip packages seem to ship with CUDA, so this does not seem to be a problem anymore.
Oh, you're right - I'll make sure we add it there 😄
I was wrong: I think it uses the agent.cuda_version, not the local env's CUDA version.
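If you want to pin it explicitly, clearml.conf has a cuda_version override under the agent section (commented out by default; the value here is just an example, not a recommendation):

```
agent {
    # force the CUDA version the agent assumes, instead of auto-detecting it
    cuda_version: 11.1
}
```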
Ah, very cool! Then I will try this, too.
Ah, perfect. Did not know this. Will try! Thanks again! 🙂
(this only works for pytorch because they have different wheels for different CUDA versions)
I got the idea from an error thrown when the agent was configured to use pip and tried to install BLAS (for PyTorch, I guess).
However, to use conda as package manager I need a docker image that provides conda.
It is not explained there, but do you mean
```
CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY:-}
CLEARML_API_SECRET_KEY: ${CLEARML_API_SECRET_KEY:-}
```
?
```
docker-compose ps
          Name                        Command                State                        Ports
clearml-agent-services   /usr/agent/entrypoint.sh         Restarting
clearml-apiserver        /opt/clearml/wrapper.sh ap ...   Up           0.0.0.0:8008->8008/tcp, 8080/tcp, 8081/tcp
clearml-elastic          /usr/local/bin/docker-entr ...   Up           9200/tcp, 9300/tcp
clearml-fileserver       /opt/clearml/wrapper.sh fi ...   Up           8008/tcp, 8080/tcp, 0.0.0.0:8081->8081/tcp
clearml-mongo            docker-entrypoint.sh --set ...   Up           27017/tcp
clearml-redis            docker-entrypoint.sh redis ...   Up           6379/tcp
clearml-webserver        /opt/clearml/wrapper.sh we ...   Up           0.0.0.0:8080->80/tcp, 8008/tcp, 8080/tcp, 8081/tcp
```
I see. I was just wondering what the general approach is. I think PyTorch used to ship the pip package without CUDA packaged into it. So with conda it was nice to only install CUDA in the environment and not the host. But with pip, you had to use the host version as far as I know.