CostlyOstrich36 Hi,
I have a question related to ClearML’s indexing mechanism for cached datasets. We noticed that when storing the dataset cache folder on an NFS (Network File System), running `clearml-data get` triggers a cache indexing process, which takes a significant amount of time. However, if we remove the NFS cache folder, the command runs almost instantly.
Could you explain how caching works in ClearML? Specifically:
- Why does ClearML perform `global` folder indexing before the script starts?
- Why does it index the dataset cache folder when executing `clearml-data get`?
- Is there an option to disable cache indexing or control its behavior to optimize performance, especially when using NFS?

Any insights or workarounds to speed up the process would be greatly appreciated.
Thanks!
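For reference, a minimal sketch of the equivalent operation through the Python SDK (the project and dataset names below are placeholders, not from this thread); `clearml-data get` materializes the dataset into the local cache directory, which is the folder that is slow to scan when it lives on NFS:

```python
# Rough SDK equivalent of `clearml-data get` -- a sketch, not ClearML internals.
from clearml import Dataset

# Resolve the dataset (placeholder project/name) and pull a local copy into the
# configured cache directory (sdk.storage.cache.default_base_dir in clearml.conf,
# ~/.clearml/cache by default).
ds = Dataset.get(
    dataset_project="my_project",  # placeholder
    dataset_name="my_dataset",     # placeholder
)
local_path = ds.get_local_copy()
print(local_path)
```

Keeping that cache directory on fast local disk rather than on the NFS mount avoids scanning the shared folder on every fetch.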
- Werkzeug==2.2.3
- xdoctest==1.0.2
- xgboost @ file:///rapids/xgboost-1.7.1-cp38-cp38-linux_x86_64.whl
- yarl @ file:///rapids/yarl-1.8.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- zict @ file:///rapids/zict-2.2.0-py2.py3-none-any.whl
- zipp==3.15.0
Environment setup completed successfully
Starting Task Execution:
2025-01-27 13:22:37 ClearML results page: files_server: None
2025-01-27 13:25:38 ClearML Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start
2025-01-27 14:08:44 Import libs
2025-01-27 14:08:49 Start Task
CostlyOstrich36 Fixed: it was a cache issue on NFS. However, we discovered an important detail: there were two folders in the cache, `datasets` and `global`. When we started the ClearML script, it began indexing the entire `global` folder, which was the reason the script got stuck. After mounting only the `datasets` folder, there was no delay anymore.
Do you know how to disable indexing? If we mount the `global` folder on all instances, it grows very fast, and each new task adds more results to index.
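Not an authoritative answer, but a sketch of two knobs that may help keep that general cache under control (treat the exact value below as an assumption): the cache base directory is set by `sdk.storage.cache.default_base_dir` in clearml.conf, so it can point at fast local disk instead of the NFS mount, and the SDK exposes a file-count limit for the download cache:

```python
# Sketch: cap how many downloaded files ClearML keeps in its download cache,
# so the cache folder does not grow unbounded across tasks. The limit value
# here is an arbitrary example.
from clearml import StorageManager

StorageManager.set_cache_file_limit(100)  # keep at most ~100 cached files
```

Whether this also prevents the per-run scan of the `global` folder is worth verifying; relocating that folder off NFS is the more direct workaround.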
Current configuration (clearml_agent v1.9.3, location: /tmp/clearml.conf):