BTW: the agent will resolve pytorch based on the installed CUDA version.
I got the idea from an error: the agent was configured to use pip, tried to install BLAS (for PyTorch, I guess), and failed.
No (this is deprecated and was removed because it was confusing)
https://github.com/allegroai/clearml-agent/blob/cec6420c8f40d92ab1cd6cbe5ca8f24cf351abd8/docs/clearml.conf#L101
Is it also possible to specify a different user/api_token for different hosts? For example, I have a GitHub account and a private GitLab, and I want to be able to access both.
ReassuredTiger98 my apologies, I just realized you can use ~/.git-credentials for that. The agent will automatically map the host's .git-credentials into the docker :)
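For reference, a minimal sketch of what such a file could contain for two hosts. It assumes git's plain-text credential store format (one `https://user:token@host` URL per line); the hosts, user names and tokens below are hypothetical placeholders, and `git_credentials_lines` is just an illustrative helper, not part of ClearML.

```python
from urllib.parse import quote


def git_credentials_lines(entries):
    """Build one ~/.git-credentials line per (host, user, token) entry."""
    lines = []
    for host, user, token in entries:
        # Percent-encode user and token so characters like "@" or ":" stay unambiguous.
        lines.append(
            "https://{}:{}@{}".format(quote(user, safe=""), quote(token, safe=""), host)
        )
    return lines


if __name__ == "__main__":
    # Hypothetical hosts and credentials; replace with your own.
    for line in git_credentials_lines(
        [
            ("github.com", "my-github-user", "example-github-token"),
            ("gitlab.example.com", "my-gitlab-user", "example-gitlab-token"),
        ]
    ):
        print(line)
```

Each printed line would go on its own line in ~/.git-credentials on the host, so every git server gets its own user/token pair.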
In the new version, we made it so that the default agent credentials embedded in the ClearML Server are disabled if the server is not in open mode (i.e. it requires user/password to log in). This is because having those default credentials available in this mode basically means anyone without a password can still send commands to the server (since these credentials are hard-coded).
I just updated my server to 1.0 and now the services agent is stuck in restarting:
I see. I was just wondering what the general approach is. I think PyTorch used to ship the pip package without CUDA packaged into it. So with conda it was nice to only install CUDA in the environment and not the host. But with pip, you had to use the host version as far as I know.
In my case I use the conda freeze option and do not even have CUDA installed on the agents.
(only works for pytorch because they have different wheels for different CUDA versions)
I was wrong: I think it uses the agent.cuda_version, not the local env CUDA version.
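To illustrate the idea of picking a wheel by CUDA version, here is a rough sketch. It is not the agent's actual resolution code: `cuda_wheel_tag` is a hypothetical helper, and it assumes the version is given as a dotted string such as "11.1" and that wheels carry tags of the form `cu111` (with `cpu` as the fallback when no CUDA version is set).

```python
def cuda_wheel_tag(cuda_version):
    """Map a CUDA version string such as "11.1" to a wheel tag such as "cu111"."""
    if not cuda_version:
        # No CUDA version configured: assume the CPU-only build.
        return "cpu"
    major, _, minor = str(cuda_version).partition(".")
    # A bare major version like "11" is treated as "11.0".
    return "cu{}{}".format(major, minor or "0")


if __name__ == "__main__":
    for version in ("11.1", "10.2", ""):
        print(repr(version), "->", cuda_wheel_tag(version))
```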
It seems like the services-docker is always started with Ubuntu 18.04, even when I use `task.set_base_docker("continuumio/miniconda:latest -v /opt/clearml/data/fileserver/:{}".format(file_server_mount))`
However, to use conda as package manager I need a docker image that provides conda.
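A small sketch of how that docker string is put together, in case it helps to check the value before handing it to the task. `build_base_docker` is a hypothetical helper, and the container mount point below is an assumed placeholder; the image name and host path are the ones from the message above.

```python
def build_base_docker(image, host_path, container_path):
    """Return an "<image> -v <host>:<container>" string for task.set_base_docker()."""
    return "{} -v {}:{}".format(image, host_path, container_path)


if __name__ == "__main__":
    # Assumed mount point inside the container; adjust to your setup.
    file_server_mount = "/mnt/fileserver"
    docker_cmd = build_base_docker(
        "continuumio/miniconda:latest",
        "/opt/clearml/data/fileserver/",
        file_server_mount,
    )
    print(docker_cmd)
    # With a ClearML task object at hand, the string would then be passed on:
    # task.set_base_docker(docker_cmd)
```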
Ah, very cool! Then I will try this, too.
` docker-compose ps
Name                     Command                          State        Ports
clearml-agent-services   /usr/agent/entrypoint.sh         Restarting
clearml-apiserver        /opt/clearml/wrapper.sh ap ...   Up           0.0.0.0:8008->8008/tcp, 8080/tcp, 8081/tcp
clearml-elastic          /usr/local/bin/docker-entr ...   Up           9200/tcp, 9300/tcp
clearml-fileserver       /opt/clearml/wrapper.sh fi ...   Up           8008/tcp, 8080/tcp, 0.0.0.0:8081->8081/tcp
clearml-mongo            docker-entrypoint.sh --set ...   Up           27017/tcp
clearml-redis            docker-entrypoint.sh redis ...   Up           6379/tcp
clearml-webserver        /opt/clearml/wrapper.sh we ...   Up           0.0.0.0:8080->80/tcp, 8008/tcp, 8080/tcp, 8081/tcp `
Also, what kind of authentication are you using? Fixed users?
Oh, you're right - I'll make sure we add it there 😄
You can simply generate another set of credentials in the profile page, and set them up in these environment variables.
Alternatively, you can add another fixed user, and use its username/password for these values
It is not explained there, but do you mean `CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY:-}` and `CLEARML_API_SECRET_KEY: ${CLEARML_API_SECRET_KEY:-}`?
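Note that `${VAR:-}` in a compose file expands to an empty string when the variable is unset, so the services agent would start without credentials. A minimal sketch of a pre-flight check, assuming these two variables are the only ones needed; `missing_credentials` is a hypothetical helper, not something ClearML ships.

```python
import os

# The two variables referenced in the compose snippet above.
REQUIRED = ("CLEARML_API_ACCESS_KEY", "CLEARML_API_SECRET_KEY")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Set these before running docker-compose up:", ", ".join(missing))
    else:
        print("Agent credentials are present in the environment.")
```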
Now the pip packages seem to ship with CUDA, so this does not seem to be a problem anymore.
Ah, perfect. Did not know this. Will try! Thanks again! 🙂
You'll need to set the agent key and secret using environment variables, as explained here (in step #11): https://allegro.ai/clearml/docs/docs/deploying_clearml/clearml_server_linux_mac.html#deploying
Oh, so I think I know what might have happened
In that case I suggest you turn on the venv cache, it will accelerate the conda environment building because it will cache the entire conda env.