I see. I was just wondering what the general approach is. I think PyTorch used to ship the pip package without CUDA packaged into it. So with conda it was nice to only install CUDA in the environment and not the host. But with pip, you had to use the host version as far as I know.
Now the pip packages seem to ship with CUDA, so this does not seem to be a problem anymore.
I got the idea from an error that was thrown when the agent was configured to use pip and tried to install BLAS (for PyTorch, I guess).
BTW: the agent will resolve PyTorch based on the installed CUDA version.
(this only works for PyTorch because they have different wheels for different CUDA versions)
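For reference, this is roughly what those per-CUDA-version wheels look like when installed manually (version numbers here are just illustrative, not taken from this thread):
```
# PyTorch publishes the same release built against different CUDA toolkits,
# distinguished by the +cuXXX local version suffix (illustrative versions)
pip install torch==1.8.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install torch==1.8.1+cu102 -f https://download.pytorch.org/whl/torch_stable.html
```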
In my case I use the conda freeze option and do not even have CUDA installed on the agents.
In that case I suggest you turn on the venv cache; it will accelerate the conda environment building because it caches the entire conda env.
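It is enabled in the agent's clearml.conf by uncommenting the cache path; something like this (limits below are just the defaults from the sample config, adjust as needed):
```
# ~/clearml.conf (agent section) - sketch based on the sample config
agent {
    venvs_cache: {
        # maximum number of cached environments
        max_entries: 10
        # minimum free space (GB) required to keep adding cache entries
        free_space_threshold_gb: 2.0
        # uncommenting the path enables the cache
        path: ~/.clearml/venvs-cache
    }
}
```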
I was wrong: I think it uses the agent.cuda_version, not the local env CUDA version.
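If I remember correctly, that override sits in the agent section of clearml.conf, something like this (values are only an example):
```
# ~/clearml.conf - pin the CUDA/cuDNN version the agent resolves packages against (example values)
agent {
    cuda_version: 10.1
    cudnn_version: 7.6
}
```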
No (this is deprecated and was removed because it was confusing)
https://github.com/allegroai/clearml-agent/blob/cec6420c8f40d92ab1cd6cbe5ca8f24cf351abd8/docs/clearml.conf#L101
Ah, perfect. Did not know this. Will try! Thanks again! 🙂
Is it also possible to specify a different user/api_token for different hosts? For example, I have a GitHub and a private GitLab that I both want to be able to access.
ReassuredTiger98 my apologies, I just realized you can use ~/.git-credentials for that. The agent will automatically map the host's .git-credentials into the docker container :)
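The file uses the standard git credential-store format, one URL per host; for example (usernames, tokens, and the GitLab hostname below are placeholders):
```
https://github-user:github-token@github.com
https://gitlab-user:gitlab-token@gitlab.example.com
```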
Ah, very cool! Then I will try this, too.
It seems like the services-docker is always started with Ubuntu 18.04, even when I use `task.set_base_docker("continuumio/miniconda:latest -v /opt/clearml/data/fileserver/:{}".format(file_server_mount))`.
However, to use conda as package manager I need a docker image that provides conda.
I just updated my server to 1.0 and now the services agent is stuck in restarting:
```
docker-compose ps
         Name                      Command               State                        Ports
clearml-agent-services   /usr/agent/entrypoint.sh         Restarting
clearml-apiserver        /opt/clearml/wrapper.sh ap ...   Up           0.0.0.0:8008->8008/tcp, 8080/tcp, 8081/tcp
clearml-elastic          /usr/local/bin/docker-entr ...   Up           9200/tcp, 9300/tcp
clearml-fileserver       /opt/clearml/wrapper.sh fi ...   Up           8008/tcp, 8080/tcp, 0.0.0.0:8081->8081/tcp
clearml-mongo            docker-entrypoint.sh --set ...   Up           27017/tcp
clearml-redis            docker-entrypoint.sh redis ...   Up           6379/tcp
clearml-webserver        /opt/clearml/wrapper.sh we ...   Up           0.0.0.0:8080->80/tcp, 8008/tcp, 8080/tcp, 8081/tcp
```
Also, what kind of authentication are you using? Fixed users?
Oh, so I think I know what might have happened
In the new version, we made it so that the default agent credentials embedded in the ClearML Server are disabled if the server is not in open mode (i.e. requires a user/password to log in). This is because having those default credentials available in that mode basically means anyone without a password could send commands to the server (since these credentials are hard-coded).
You'll need to set the agent key and secret using environment variables, as explained here (in step #11): https://allegro.ai/clearml/docs/docs/deploying_clearml/clearml_server_linux_mac.html#deploying
It is not explained there, but do you mean `CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY:-}` and `CLEARML_API_SECRET_KEY: ${CLEARML_API_SECRET_KEY:-}`?
Oh, you're right - I'll make sure we add it there 😄
You can simply generate another set of credentials in the profile page, and set them up in these environment variables.
Alternatively, you can add another fixed user, and use its username/password for these values
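For example, a minimal way to set them before bringing the services agent back up (assuming the default /opt/clearml install path; the placeholder values are the key/secret you generated):
```
# Export the agent credentials so docker-compose can substitute them,
# then recreate the containers
export CLEARML_API_ACCESS_KEY="<access key from the profile page>"
export CLEARML_API_SECRET_KEY="<secret key from the profile page>"
docker-compose -f /opt/clearml/docker-compose.yml up -d
```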