Hi SpicyCrab51, thanks for the warm words 😄 Happy to hear you're enjoying our product!
As for your needs, I suggest you explore our Hyperdatasets (https://clear.ml/docs/latest/docs/hyperdatasets/overview), as they were designed to solve exactly the kind of issue you're facing!
You can also watch a talk we gave that covers Hyperdatasets: https://www.youtube.com/watch?v=CcL8NNZfHlY !
Note that Hyperdatasets are an enterprise feature and are not part of the open-source offering.
Contact me if you need more info 🙂
Hey there! 🙂
First of all, thank you for creating this Slack community and giving us the opportunity to work with your wonderful software. I need some help and am wondering if you have any ideas on how I could solve a problem.
I am trying to find a good way to handle massive datasets with ClearML. Specifically, I want to work with 300 GB of text on S3 storage, such that:
- it is easy for me and my coworkers to stream the contents without needing 300 GB of disk space and/or RAM
- the code used for this dataset can easily be reused for future datasets
- less importantly, it would be nice if the loading could be parallelized
I have looked at both StorageManager and Dataset, but neither of them seems to offer features that avoid relying on the local disk. Concretely, my question is: is there a way/feature in ClearML to at least partially achieve this? If not, do you know of any ClearML-compatible alternatives?
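To make the requirement concrete, here is a minimal sketch of the streaming access pattern I have in mind. It uses plain boto3 rather than any ClearML API, and the bucket and prefix names are hypothetical placeholders:
```python
# Minimal sketch (plain boto3, not a ClearML API): stream text lines from
# every object under an S3 prefix without downloading files to disk.
# Bucket and prefix names below are hypothetical placeholders.
import boto3

def iter_s3_lines(bucket: str, prefix: str):
    """Yield decoded text lines from all objects under the given prefix."""
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            # get_object returns a StreamingBody; iter_lines() reads it in
            # chunks, so memory use stays bounded regardless of object size.
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"]
            for line in body.iter_lines():
                yield line.decode("utf-8")

# Hypothetical usage:
# for line in iter_s3_lines("my-text-corpus", "datasets/text-300gb/"):
#     process(line)
```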
Thank you in advance!