I have to correct myself: I do not even have CUDA installed. Only the driver; everything CUDA-related is provided by the docker container. This works with a container that has CUDA 11.4, but now I have one with 11.6 (the latest nvidia pytorch docker).
However, even after changing the clearml.conf and overriding with CUDA_VERSION, the clearml-agent still prints agent.cuda_version = 114 inside the docker container! (Other changes to the clearml.conf on the agent are reflected in the docker, so only...
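For context, this is roughly the override I mean — a minimal sketch of the relevant section in clearml.conf on the agent machine (the key exists in the default agent config; the value format follows the agent's printout, so 11.6 shows up as 116):
```
# clearml.conf on the agent machine -- pin the CUDA version instead of
# letting the agent auto-detect it from the host
agent {
    cuda_version: "11.6"
}
```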
Okay, but are your logs still stored on MinIO when only using sdk.development.default_output_uri?
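For reference, a minimal sketch of what I mean — the code-side equivalent of sdk.development.default_output_uri; the MinIO host and bucket are placeholders:
```python
from clearml import Task

# output_uri mirrors sdk.development.default_output_uri in clearml.conf;
# it controls where models/artifacts are uploaded (MinIO speaks the S3 API).
# Host and bucket below are placeholders.
task = Task.init(
    project_name="examples",
    task_name="minio-output",
    output_uri="s3://my-minio-host:9000/clearml-artifacts",
)
```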
Thanks, I will look into it. For me the weird thing is that saving works and only deletion fails somehow.
Or there should be an early error when trying to run conda-based tasks on pip-based agents.
SuccessfulKoala55 So what happens is that whenever the cleanup_service runs (or right after), clearml throws these kinds of errors.
I will read up on the services documentation then. Thank you very much for the help 🙂
Yeah, I know, I reported this 🙂.
Very nice!
Maybe for the long-term future you could look into how to make better use of vertical space. Currently, there are 7 (5 in fullscreen mode) different sections between the content and the top of the page. Maybe a compact mode would be nice, or less space for the content headlines.
Thanks a lot, now I think I understand.
Debug samples can only be controlled via api.files_server (or programmatically).
Could you guide me on how to approach this programmatically? Can I implement my own storage adapter for debug samples with ClearML interfaces, or am I on my own?
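For anyone who finds this later: the programmatic hook seems to be Logger.set_default_upload_destination — a sketch, with placeholder host/bucket:
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="debug-samples")
logger = task.get_logger()

# Redirect debug samples (report_image / report_media uploads) away from
# the default api.files_server; host and bucket are placeholders.
logger.set_default_upload_destination("s3://my-minio-host:9000/debug-samples")
```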
My code is in classes, indeed. But I have more than one model. Actually, all the things that people store in yaml or json configs, for example, I store in python files. And I do not want to statically import all the models/configs.
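Something like this is the pattern I mean — a minimal sketch of loading a python config module by name at runtime instead of importing everything statically (the module path is hypothetical):
```python
import importlib

def load_config(module_path: str):
    """Import a python config module by dotted path at runtime,
    so not every model/config needs a static import."""
    return importlib.import_module(module_path)

# e.g. cfg = load_config("configs.my_model")  # hypothetical module
```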
I just checked and my user is part of the docker group.
But the problems seem to be recurring.
Interesting: this command fails (with an error similar to the one I posted above) in conda version 4.7.12 but runs just fine in version 4.9.2:
```
conda create --name test-pytorch python=3.8 cudatoolkit=11.1 -c conda-forge
```
clearml will register preinstalled conda packages as requirements.
You mean I should have opencv/ffmpeg available on the clearml-server machine?
Yes, but this seems pretty reasonable to assume imo.
I ran
```
docker run -it -v /home/hostuser/.ssh/:/root/.ssh ubuntu:18.04
```
but cloning does not work, and this is what ls -lah /root/.ssh gives inside the docker container:
```
-rw------- 1 1001 1001 1.5K Apr  8 12:28 authorized_keys
-rw-rw-r-- 1 1001 1001  208 Apr 29 09:15 config
-rw------- 1 1001 1001  432 Apr  8 12:53 id_ed25519
-rw-r--r-- 1 1001 1001  119 Apr  8 12:53 id_ed25519.pub
-rw------- 1 1001 1001  432 Apr 29 09:16 id_gitlab
-rw-r--r-- 1 1001 1001  119 Apr 29 09:25 id_gitlab.pub
-...
```
But yeah, I see the point of the enterprise version having this feature and the basic one not 🙂
Works with 1.4. Sorry for not checking versions myself!
Can you ping me when it is updated so I can update my installation?
But does this mean the logger will use the default fileserver or not?
Mhhm, now conda env creation takes forever, probably because it is resolving conflicts. At least that is what happened when I tried to install my environment manually.
Is there a way to see the contents of /tmp/conda_envaz1ne897.yml? It seems to be deleted after the task is finished.