Hi, I would like to understand how I can set the pip cache location for my agent.
ClumsyElephant70 by default the pip cache (and all other cache folders) is mounted back onto the host itself, under ~/.clearml/
I'm assuming the idea is a shared cache; if that is the case, set: docker_pip_cache = ~/my_shared_nfs/pip-cache
https://github.com/allegroai/clearml-agent/blob/e3e6a1dda81bee2dd20a64d09746568e415f1823/docs/clearml.conf#L139
I want to cache as much as possible, and /clearml-cache/venvs-cach (on the host) does contain cached venvs, but /clearml-cache/venvs-builds is empty. My question was how to also cache venvs_builds.
Because I think you need to map the pip cache folder into the docker container
AgitatedDove14 one more thing regarding the initial question: apt-cache, pip-cache, pip-download-cache, vcs-cache and venvs-cache contain data on the shared clearml-cache, but venvs-build does not? What sort of data would be stored in the venvs-build folder? I do have venvs_dir = /clearml-cache/venvs-builds specified in the clearml.conf
it appears at multiple places. Seems like the mapping of pip and apt cache does work but the access rights are now an issue
Ok it is more a docker issue, I guess it is not feasible reading the thread.
They all want to be ubuntu:gpu0. Any idea how I can randomize it? Setting the CLEARML_WORKER_ID env var somehow does not work
You should not have this entry in the conf file; the "worker_id" should be unique (and is based on the "worker_name" as a prefix). You can control it via the env variable: CLEARML_WORKER_ID
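A minimal sketch of one way to give each agent its own ID when several machines share one clearml.conf. The helper names `build_worker_id` and `agent_environment` are hypothetical (not part of clearml-agent); only the CLEARML_WORKER_ID variable and the `<name>:gpu<N>` shape come from this thread.

```python
import os
import socket


def build_worker_id(hostname: str, gpu_index: int) -> str:
    """Return a worker ID of the form '<hostname>:gpu<index>' (hypothetical helper)."""
    return f"{hostname}:gpu{gpu_index}"


def agent_environment(gpu_index: int, base_env=None) -> dict:
    """Copy an environment and set CLEARML_WORKER_ID to a per-host value.

    Assumption: hostnames differ between the machines sharing the conf file,
    so the resulting IDs do not collide.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CLEARML_WORKER_ID"] = build_worker_id(socket.gethostname(), gpu_index)
    return env
```

The returned dict could be passed as `env=` to `subprocess.run` when launching the agent process, so the shared conf file no longer needs a worker_id entry.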
In theory it should have worked.
Can you send me the full Task log? (with cache and everything?)
I suspect since these are not the default folders, something is misconfigured / missing
(you can DM the log, so it won't end up on a public channel)
the cache on the host is mounted as nfs and the nfs server was configured to not allow the clients to do root operations
` # pip cache folder mapped into docker, used for python package caching
docker_pip_cache = /clearml-cache/pip-cache
# apt cache folder mapped into docker, used for ubuntu package caching
docker_apt_cache = /clearml-cache/apt-cache
docker_internal_mounts {
    apt_cache: "/clearml-cache/apt-cache"
    pip_cache: "/clearml-cache/pip-cache"
    vcs_cache: "/clearml-cache/vcs-cache"
    venv_build: "/clearml-cache/venvs-builds"
    pip_download: "/clearml-cache/pip-download-cache"
    ssh_folder: "/clearml-cache/ssh-cache"
} `
So it should cache the venvs right?
Correct,
path: /clearml-cache/venvs-cache
Just making sure, this is the path to the host cache folder
ClumsyElephant70 I think I lost track of the current issue 😞 what exactly is not being cached (or not working)?
So I don't need docker_internal_mounts at all?
The agents also share the clearml.conf file, which causes some issues with the worker_id/worker_name. They all want to be ubuntu:gpu0. Any idea how I can randomize it? Setting the CLEARML_WORKER_ID env var somehow does not work
Hi ClumsyElephant70 ,
What about:
# pip cache folder mapped into docker, used for python package caching
docker_pip_cache = ~/.clearml/pip-cache
# apt cache folder mapped into docker, used for ubuntu package caching
docker_apt_cache = ~/.clearml/apt-cache
Try running with all of them commented out so it will take the defaults
Exactly, all agents should share the cache that is mounted via nfs. I think it is working now 🙂
Hey Natan, good point! But I have actually set both
Hi AgitatedDove14 one more question about efficient caching, is it possible to cache/share docker images between agents?
I think you need to map internal docker pip cache to /root/.cache/pip
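To illustrate what that mapping amounts to, here is a rough sketch that builds the `docker run` volume arguments by hand. The function `cache_volume_args` is hypothetical and is not how clearml-agent is implemented; the container-side paths are the ones that show up in the task logs in this thread (/root/.cache/pip and /var/cache/apt/archives).

```python
def cache_volume_args(host_pip_cache: str, host_apt_cache: str) -> list:
    """Build `docker run` -v arguments that map host cache folders into a container."""
    mounts = [
        # (host folder, folder pip uses inside the container when running as root)
        (host_pip_cache, "/root/.cache/pip"),
        # (host folder, folder apt stores downloaded packages in)
        (host_apt_cache, "/var/cache/apt/archives"),
    ]
    args = []
    for host_path, container_path in mounts:
        args += ["-v", f"{host_path}:{container_path}"]
    return args
```

Calling it with "/clearml-cache/pip-cache" and "/clearml-cache/apt-cache" gives the two `-v host:container` pairs; the host side is what docker_pip_cache / docker_apt_cache point at.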
I do have this setting in my clearml.conf file:
venvs_cache: {
    free_space_threshold_gb: 50.0
    path: /clearml-cache/venvs-cache
}
So it should cache the venvs right? I also see content in the /clearml-cache/venvs-cache folder. Because I have venvs_cache configured there is nothing in venvs-build, since it uses the cache?
What sort of data would be stored in the venvs-build folder?
ClumsyElephant70 temporary (lifetime of the task execution) virtual environment, including the code etc. It is deleted and recreated for every new task launched (or restored from cache, if venvs_cache is enabled)
so now there is the user conflict between the host and the agent inside the container
Can you add a bit more from the log for more context as well?
hm... Now with commenting it out I have the following problem: docker_pip_cache = /clearml-cache/pip-cache
On host: drwxrwxrwx 5 root root 5 Mar 10 17:17 pip-cache
in task logs: chown: changing ownership of '/root/.cache/pip': Operation not permitted
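A quick way to check whether a folder behaves like this is to try the same operation on a scratch file. This is only a sketch: `chown_permitted` is a hypothetical helper, it is POSIX-only, and it is meant to be run as root (as the container does), since a non-root user is normally refused a chown to root on any filesystem.

```python
import os
import tempfile


def chown_permitted(directory: str) -> bool:
    """Create a scratch file in `directory` and try to chown it to root (0:0).

    Returns False when the filesystem refuses the chown, which is the
    'Operation not permitted' case seen in the task log.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        os.chown(path, 0, 0)
        return True
    except PermissionError:
        return False
    finally:
        # Always clean up the scratch file, whatever the outcome.
        os.remove(path)
```

If this returns False as root on /clearml-cache/pip-cache, the permissive drwxrwxrwx mode on the folder is not the limiting factor; the restriction comes from how the share is exported.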
is it possible to cache/share docker images between agents?
Like a shared folder for docker pulled images?
https://forums.docker.com/t/how-to-share-the-images-at-all-the-local-hosts/24894/7
you might be able to share "/var/lib/docker/image" but I'm not sure how stable it is (definitely risky)
W: chown to _apt:root of directory /var/cache/apt/archives/partial failed - SetupAPTPartialDirectory (1: Operation not permitted)
W: chmod 0700 of directory /var/cache/apt/archives/partial failed - SetupAPTPartialDirectory (1: Operation not permitted)
Collecting pip==20.1.1
Executing: ['docker', 'run',......]
chown: changing ownership of '/root/.cache/pip': Operation not permitted
Get:1 focal-security InRelease [114 kB]
Get:2 focal InRelease [265 kB]
Get:3 focal-updates InRelease [114 kB
It is at the top of the logs