@<1523701070390366208:profile|CostlyOstrich36> Hi,
I have a question related to ClearML’s indexing mechanism for cached datasets. We noticed that when storing the dataset cache folder on an NFS (Network File System), running the command clearml-data get triggers a cache indexing process, which takes a significant amount of time. However, if we remove the NFS cache folder, the command runs almost instantly.
Could you explain how caching works in ClearML? Specifically:
- Why does ClearML perform global folder indexing before the script starts?
- Why does it index the dataset cache folder when executing clearml-data get?
- Is there an option to disable cache indexing or control its behavior to optimize performance, especially when using NFS?
Any insights or workarounds to speed up the process would be greatly appreciated.
Thanks!
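For reference, this is roughly how we reproduce and measure the delay from Python instead of the CLI (the project and dataset names below are placeholders, and the cache lands in whatever sdk.storage.cache.default_base_dir points to in clearml.conf):

```python
import time
from clearml import Dataset

# Placeholders -- replace with a real project/dataset from the workspace.
DATASET_PROJECT = "my_project"
DATASET_NAME = "my_dataset"

start = time.time()
# Roughly equivalent to `clearml-data get`: resolves the dataset and
# materializes a read-only copy in the local cache folder. With the cache
# folder on NFS, this call is where the long "indexing" pause shows up.
dataset = Dataset.get(dataset_project=DATASET_PROJECT, dataset_name=DATASET_NAME)
local_path = dataset.get_local_copy()
print(f"dataset cached at {local_path} in {time.time() - start:.1f}s")
```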
- Werkzeug==2.2.3
- xdoctest==1.0.2
- xgboost @ file:///rapids/xgboost-1.7.1-cp38-cp38-linux_x86_64.whl
- yarl @ file:///rapids/yarl-1.8.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- zict @ file:///rapids/zict-2.2.0-py2.py3-none-any.whl
- zipp==3.15.0
Environment setup completed successfully
Starting Task Execution:
2025-01-27 13:22:37 ClearML results page: files_server: None
2025-01-27 13:25:38 ClearML Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start
2025-01-27 14:08:44 Import libs
2025-01-27 14:08:49 Start Task
@<1523701070390366208:profile|CostlyOstrich36> Fixed: it was a cache issue on NFS. However, we discovered an important detail: there were two folders in the cache, datasets and global. When we started the ClearML script, it began indexing the entire global folder, which is why the script got stuck. After mounting only the datasets folder, there was no delay anymore.
Do you know how to disable indexing? If we mount the global folder on all instances, it grows very fast, and every new task adds more results for the next run to index.
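In the meantime, this is the kind of workaround we are considering: keep the cache root (including the fast-growing global folder) on local disk and expose only the datasets folder from NFS, plus cap the cache size. The paths are illustrative and assume the datasets and global folders sit directly under the cache root; adjust to the real layout on your machines.

```python
import os
from pathlib import Path

from clearml import StorageManager

# Illustrative paths -- assumes the default cache root and the two
# sub-folders we saw (datasets / global); adjust to the actual layout.
CACHE_ROOT = Path.home() / ".clearml" / "cache"
NFS_DATASETS = Path("/mnt/nfs/clearml-cache/datasets")

# Keep the cache root (and the "global" folder) on local disk, and expose
# only the shared datasets folder from NFS via a symlink.
NFS_DATASETS.mkdir(parents=True, exist_ok=True)
CACHE_ROOT.mkdir(parents=True, exist_ok=True)
datasets_link = CACHE_ROOT / "datasets"
if not datasets_link.exists() and not datasets_link.is_symlink():
    os.symlink(NFS_DATASETS, datasets_link)

# Optionally cap how many files the local cache keeps, so it does not
# grow without bound between tasks.
StorageManager.set_cache_file_limit(100)
```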