Hi SubstantialElk6
Quick update: once clearml 1.1 is out, we will push the clearml-data improvement that adds support for chunks per version (i.e. packaging the changeset into multiple zip files instead of a single one, as the current version does).
Regarding (1), the server storage limit:
Ideally, we should be able to specify the batch size that we want to download, or even better, tie this in with the training by parallelising the data download, data preprocessing and batch training.
With the next version you will be able to download a partial dataset (i.e. only selected chunks), which should help with the issue.
That said, the best solution is to configure a shared cache for all instances (both the open-source and enterprise versions support it, with some efficiency improvements on the enterprise version).
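On the parallelising idea in (1), here is a rough sketch of how download, preprocessing and training could be overlapped with plain threads and bounded queues. This is not clearml-data API: `run_pipeline`, `download`, `preprocess`, `train_step` and `prefetch` are all hypothetical names, and the three callables are placeholders you would supply yourself (for example, something that fetches one chunk).

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of a stage's output


def _stage(fn, inbox, outbox):
    """Apply fn to every item taken from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # tell the next stage there is nothing more to come
            return
        outbox.put(fn(item))


def run_pipeline(chunk_ids, download, preprocess, train_step, prefetch=2):
    """Overlap download, preprocessing and training (hypothetical helper).

    Each stage runs in its own thread; bounded queues cap how many chunks
    are held in flight, so at most `prefetch` items wait between stages.
    """
    to_download = queue.Queue()
    downloaded = queue.Queue(maxsize=prefetch)
    ready = queue.Queue(maxsize=prefetch)

    for chunk_id in chunk_ids:
        to_download.put(chunk_id)
    to_download.put(_DONE)

    workers = [
        threading.Thread(target=_stage, args=(download, to_download, downloaded), daemon=True),
        threading.Thread(target=_stage, args=(preprocess, downloaded, ready), daemon=True),
    ]
    for worker in workers:
        worker.start()

    # Training runs in the calling thread while the next chunks are being prepared.
    results = []
    while True:
        batch = ready.get()
        if batch is _DONE:
            break
        results.append(train_step(batch))

    for worker in workers:
        worker.join()
    return results
```

With one thread per stage and FIFO queues, chunks reach `train_step` in the order they were requested. The sketch has no error handling: if `download` or `preprocess` raises, the main loop will wait forever, so a real version would need to forward exceptions.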
- Inefficiency. The time to pull the images is the time when the GPU is not utilised.
This one can be solved with a shared cache plus a pipeline step that refreshes the cache on the shared cache machine.
wdyt ?