Hey all, we have ClearML running on our on-prem K8s cluster with 4 worker nodes for clearml-agents and one node for clearml-server.
We would like to start using ClearML Datasets and running pipeline training on them.
The datasets might be very large, about 20 GB.
I
Hi @WorriedSwan6, to answer your questions:
Would you recommend to:
- Host the dataset on clearml-server ?
- Host on S3 \ R2 ?
- Host in our K8S with minio \ on specific NFS path?
I personally like using AWS S3 if available or minio if running locally. It really depends on your infrastructure. I would suggest testing what setup works best for you.
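For the S3 / MinIO option, here is a minimal sketch of how a dataset could be pushed to object storage with the ClearML SDK. The bucket name, prefix, MinIO host, and the helper names `build_output_uri` and `upload_dataset` are placeholders I made up for illustration; it also assumes the storage credentials are already set up in clearml.conf.

```python
def build_output_uri(backend: str, bucket: str, prefix: str = "datasets", host: str = "") -> str:
    """Build a storage URI for a ClearML dataset (hypothetical helper).

    For AWS S3 the form is s3://bucket/prefix. For MinIO, ClearML uses the
    same s3:// scheme with the host and port first: s3://host:port/bucket/prefix.
    """
    if backend == "s3":
        return f"s3://{bucket}/{prefix}"
    if backend == "minio":
        if not host:
            raise ValueError("MinIO needs a host, e.g. 'minio.example.local:9000'")
        return f"s3://{host}/{bucket}/{prefix}"
    raise ValueError(f"unknown backend: {backend}")


def upload_dataset(local_dir: str, project: str, name: str, output_uri: str) -> str:
    """Create a dataset version, upload its files to output_uri, and finalize it."""
    # Imported lazily so the URI helper above can be used without clearml installed.
    from clearml import Dataset

    dataset = Dataset.create(
        dataset_name=name,
        dataset_project=project,
        output_uri=output_uri,  # where the file content is stored
    )
    dataset.add_files(path=local_dir)
    dataset.upload()
    dataset.finalize()
    return dataset.id
```

Usage would look something like `upload_dataset("./data", "my_project", "my_dataset", build_output_uri("minio", "clearml", host="minio.example.local:9000"))`. Either way the clearml-server only keeps the dataset metadata, and the files themselves live in the bucket.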
Also, is it true that, unless streaming is explicitly enabled, the ClearML Agent downloads the entire dataset before training begins?
What do you mean by streaming? Also, the agent only orchestrates the run; downloading the data is done by your own code (using the ClearML SDK, of course).
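To make that concrete, here is a rough sketch of what the download step inside the training code might look like. The project and dataset names are placeholders, and `fetch_dataset_path` with its `get_dataset` argument is a hypothetical wrapper (the argument is only there so the function can be tried without a server).

```python
from typing import Callable, Optional


def fetch_dataset_path(project: str, name: str, get_dataset: Optional[Callable] = None) -> str:
    """Return a local folder holding the dataset files (hypothetical wrapper)."""
    if get_dataset is None:
        # Default to the real SDK call; imported lazily so a stand-in can be passed instead.
        from clearml import Dataset
        get_dataset = Dataset.get

    dataset = get_dataset(dataset_project=project, dataset_name=name)
    # get_local_copy() downloads the dataset into the local ClearML cache
    # (if it is not cached already) and returns the path of that folder.
    return dataset.get_local_copy()


if __name__ == "__main__":
    # Placeholder names; this part needs a reachable ClearML server.
    data_dir = fetch_dataset_path("my_project", "my_dataset")
    print("dataset files are in", data_dir)
```

So the download happens when your script calls `get_local_copy()`, not when the agent picks up the task. With a plain call like this, the whole dataset is fetched to the node's cache before your code moves on to training.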
How important is it that the PVC of ClearML is on SSD instead of HDD?
Doesn't sound very critical to me