Hello, I am a data engineer but new to ClearML.
If you train in batches, then you only need access to the current batch of documents out of those 100k. You could keep the files in S3 and implement the fetch in the get_item method :)
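To make the suggestion concrete, here is a rough sketch of what lazy fetching could look like. It assumes that "the get_item method" means Python's `__getitem__` (as used by PyTorch-style datasets), that the files live in an S3 bucket, and that `boto3` is installed. The names `LazyS3Dataset` and `fetch_from_s3` are made up for illustration and are not part of ClearML.

```python
from typing import Callable, List


def fetch_from_s3(bucket: str, key: str) -> bytes:
    """Download a single object from S3 (assumes boto3 and valid AWS credentials)."""
    import boto3  # imported lazily so the class below works without boto3

    # A new client per call keeps the sketch short; reuse one client in real code.
    response = boto3.client("s3").get_object(Bucket=bucket, Key=key)
    return response["Body"].read()


class LazyS3Dataset:
    """Hypothetical dataset that downloads a file only when it is indexed."""

    def __init__(
        self,
        bucket: str,
        keys: List[str],
        fetch: Callable[[str, str], bytes] = fetch_from_s3,
    ) -> None:
        self.bucket = bucket
        self.keys = keys      # only the object keys are held in memory
        self.fetch = fetch    # injectable, so it can be swapped out or mocked

    def __len__(self) -> int:
        return len(self.keys)

    def __getitem__(self, index: int) -> bytes:
        # Nothing is downloaded until a specific item is requested.
        return self.fetch(self.bucket, self.keys[index])
```

With something like this, a data loader would pull only the files that belong to the current batch instead of creating a local copy of all 100k files up front. The trade-off is one network round trip per item, so some caching or prefetching may be worth adding.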
Answered
Hey, Is There Some Way / Workaround To Speed Up Working With Datasets With Large Number Of Files? Getting A Local Copy Of One Of Our Dataset With 70K Files Already Takes Longer Than Expected, But Working With A Dataset Of Around 100K Files That Has Multip
Hey, is there some way / workaround to speed up working with datasets with a large number of files? Getting a local copy of one of our datasets with 70k files already takes longer than expected, but working with a dataset of around 100k files that has multiple parents is just unusable. Should we just avoid merging datasets with this many files? The datasets themselves are small, they're just split into a large number of files.
947 Views
1 Answer
one year ago