What will happen if I disable the cache? Is there a way to find out which experiment is hung and why, in order to avoid this?
Regarding your questions:
disable VCS cache - https://github.com/allegroai/clearml-agent/blob/master/docs/clearml.conf#L120
I think the lock is created when an experiment runs; maybe the experiment hung, so the lock was never released
wdyt?
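For reference, the setting behind the link above appears to be the vcs_cache block in the agent section of clearml.conf. Below is a sketch of what disabling it might look like. The key names are an assumption based on the linked default file, so check them against the file shipped with your clearml-agent version:

```
agent {
    # cached git clone folder (assumed key names, verify against your clearml.conf)
    vcs_cache: {
        enabled: false,
        path: ~/.clearml/vcs-cache
    },
}
```

The agent would most likely need to be restarted to pick up the change.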
EDIT CostlyOstrich36
third image - cache after running another task with new cache file created even though cache is disabled
EDIT
I have disabled the VCS cache, and it seems that multiple cache files are still created when running a new task. The lock is also still created once a new experiment is run: first image - cache after removing lock, second image - a few seconds later after running a new task. Also attached is the log output of the task (with ### replacing non-relevant details).
AbruptWorm50 , that's strange. I'll take a look as well. What version of clearml are you using?
Yeah, this is a lock which is always in our cache. I can't figure out why it's there, but when I delete the lock and the other files, they always reappear when I run a new ClearML task.
Is the lock something that occurs on your machine regardless of ClearML?
Disabling the VCS cache means the cloned git folder will no longer be cached. You can filter by 'Running' experiments in ClearML, look for one that hasn't reported for a while, and start investigating those.
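If the list of running experiments is long, the "hasn't reported for a while" check can be scripted. Here is a minimal sketch, assuming each task is available as a plain dictionary with hypothetical keys `name`, `status` and `last_update`. How you fetch those records from your ClearML server is left out, and the status string "in_progress" for Running is an assumption as well:

```python
from datetime import datetime, timedelta, timezone


def find_stale_running(tasks, now, max_silence=timedelta(hours=1)):
    """Return running tasks whose last report is older than max_silence.

    `tasks` is a list of dicts with hypothetical keys:
    'name', 'status' and 'last_update' (a timezone-aware datetime).
    """
    stale = []
    for task in tasks:
        # Only tasks that claim to be running are candidates for "hung".
        if task["status"] != "in_progress":
            continue
        if now - task["last_update"] > max_silence:
            stale.append(task)
    # Oldest report first, so the most suspicious task is at the top.
    return sorted(stale, key=lambda t: t["last_update"])


# Made-up records, for illustration only.
now = datetime(2023, 5, 1, 12, 0, tzinfo=timezone.utc)
tasks = [
    {"name": "train-a", "status": "in_progress", "last_update": now - timedelta(minutes=5)},
    {"name": "train-b", "status": "in_progress", "last_update": now - timedelta(days=6)},
    {"name": "train-c", "status": "completed", "last_update": now - timedelta(days=9)},
]
for task in find_stale_running(tasks, now):
    print(task["name"], "last reported at", task["last_update"].isoformat())
```

The one hour threshold is arbitrary. Pick something longer than the normal gap between reports for your experiments.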
My questions are:
- how can I avoid creating tens of new cache files?
- do you happen to know why this lock is created and how it is connected to the above error (in the link - regarding "failing to clone.. ")
Another thing I should note: I recently had an error whose fix was to run git config --global --add safe.directory /root/.clearml/vcs-cache/r__ (git repo name).d7f
Ever since, whenever I run a new task, a new file appears in the cache with a name of the form <git repo name.lock file name_a bunch of numbers>
This is not something that we defined or created, if I understand your question. It is created once a ClearML task is run, and it stays there until the lock is deleted (which is something we do to handle another error I posted about here)
Hi AbruptWorm50 ,
The cached files are used by ClearML - Here is an example:
https://github.com/allegroai/clearml-agent/blob/master/docs/clearml.conf#L120
Regarding the first question - what is the lock from the 24th of April? It seems that this process is what is blocking cache usage.
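To see which locks are sitting in the cache and how old they are, something like the following might help. It is only a sketch: it assumes the cache lives in ~/.clearml/vcs-cache and that lock files can be recognized by ".lock" in their name, and it only reads the folder without deleting anything:

```python
import time
from pathlib import Path


def list_lock_files(cache_dir):
    """Return (path, age_in_seconds) for entries whose name contains '.lock'.

    The '.lock' naming rule is an assumption based on the file names
    described in this thread, not something taken from ClearML's code.
    """
    cache = Path(cache_dir).expanduser()
    if not cache.is_dir():
        return []
    now = time.time()
    found = []
    for entry in cache.iterdir():
        if ".lock" in entry.name:
            # Age is measured from the last modification time.
            found.append((entry, now - entry.stat().st_mtime))
    # Oldest lock first.
    return sorted(found, key=lambda item: item[1], reverse=True)


for path, age in list_lock_files("~/.clearml/vcs-cache"):
    print(f"{path.name}: last modified {age / 3600:.1f} hours ago")
```

A lock that is days old while no experiment is running is a good hint that whatever created it never finished cleanly.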