Hi @<1726047624538099712:profile|WorriedSwan6> , to answer your questions:
Would you recommend to:
- Host the dataset on clearml-server ?
- Host on S3 \ R2 ?
- Host in our K8S with minio \ on specific NFS path?
I personally like using AWS S3 if it's available, or MinIO if running locally. It really depends on your infrastructure, so I would suggest testing which setup works best for you.
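If it helps, here is a rough sketch of how a dataset could be pointed at either S3 or MinIO through `output_uri` when it is created. The bucket name, prefix, MinIO endpoint, and the `storage_uri` / `upload_dataset` helpers are all made up for illustration, and this assumes the storage credentials are already set up in your `clearml.conf`.

```python
from typing import Optional


def storage_uri(bucket: str, prefix: str = "", minio_endpoint: Optional[str] = None) -> str:
    """Build an s3:// URI (hypothetical helper, names are placeholders).

    For AWS S3 the form is s3://<bucket>/<prefix>.
    For MinIO (or other S3-compatible storage) ClearML expects the
    host and port in the URI: s3://<host>:<port>/<bucket>/<prefix>.
    """
    parts = [minio_endpoint, bucket, prefix.strip("/")]
    return "s3://" + "/".join(p for p in parts if p)


def upload_dataset(project: str, name: str, local_folder: str, output_uri: str) -> str:
    """Create a dataset version and upload its files to output_uri."""
    # Imported here so the URI helper above can be used without clearml installed.
    from clearml import Dataset

    ds = Dataset.create(dataset_project=project, dataset_name=name, output_uri=output_uri)
    ds.add_files(path=local_folder)  # register the local files
    ds.upload()                      # push them to the storage target
    ds.finalize()                    # close this dataset version
    return ds.id


# Example usage (placeholder values, assumed for illustration):
# upload_dataset("my_project", "my_dataset", "./data",
#                storage_uri("my-bucket", "datasets", minio_endpoint="minio.local:9000"))
```

Switching between the two backends is then just a matter of changing the URI, which makes it easy to test both setups and compare.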
Also, is it true that, unless streaming is explicitly enabled, ClearML Agent downloads the entire dataset before training begins?
What do you mean by streaming? Also, the agent only orchestrates the run; downloading the data is done by your own code (using the ClearML SDK, of course).
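To make that concrete, this is roughly what the download looks like from inside the training code. It's a minimal sketch: the project and dataset names are placeholders and `fetch_dataset` is just a hypothetical wrapper around the SDK calls.

```python
def fetch_dataset(project: str, name: str) -> str:
    """Return a local folder holding a cached copy of the dataset."""
    # Imported lazily so this module loads even where clearml is not installed.
    from clearml import Dataset

    dataset = Dataset.get(dataset_project=project, dataset_name=name)
    # Downloads the dataset files (or reuses the local cache) and
    # returns the path of the folder that contains them.
    return dataset.get_local_copy()


# Example usage inside a training script (placeholder names):
# data_dir = fetch_dataset("my_project", "my_dataset")
# train(data_dir)
```

So the download happens wherever your script calls `get_local_copy()`, not as a separate step performed by the agent.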
How important is it that the ClearML PVC is on SSD instead of HDD?
Doesn't sound very critical to me.