Hey There!

Answered

Hey there! 🙂

First of all, thank you for creating this Slack Community and giving us the opportunity to work with your wonderful software. I am in need of some help and am wondering if you have any ideas how I could solve a problem.

I am trying to find a good way to handle massive datasets with ClearML. Specifically, I want to work with 300 GB of text on S3 storage, such that

- it is easy for me and my coworkers to stream the contents without needing 300 GB of disk space and/or RAM,
- the code used for this dataset can easily be reused for future datasets, and
- less importantly, the loading can be parallelized.

I have looked at both StorageManager and Dataset, but neither of them seems to have features that do not rely on the hard disk. Concretely, my question is: Is there a way/feature of ClearML to at least partially do this? If not, do you know of any ClearML-compatible alternatives?

Thank you in advance!

  				
Posted 3 years ago by SpicyCrab51

Answers

Hi SpicyCrab51, thanks for the warm words 😄 Happy you enjoy our product!
As for your needs, I suggest you explore our Hyperdatasets (https://clear.ml/docs/latest/docs/hyperdatasets/overview); they were indeed made to solve issues similar to what you're facing!
You can also watch a talk we gave that covers Hyperdatasets: https://www.youtube.com/watch?v=CcL8NNZfHlY
Note that it is an enterprise feature and is not part of the open-source version.
Contact me if you need more info 🙂

  				
Posted 3 years ago by AnxiousSeal95
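
For anyone on the open-source version, the streaming requirement from the question can be roughly sketched outside of ClearML. This is only an illustration, not a ClearML feature and not what Hyperdatasets do internally: it assumes the text lives in plain UTF-8 objects on S3, that boto3 is installed with credentials configured, and the function names `iter_text_lines` and `iter_s3_lines` are made up for this example.

```python
import codecs
from typing import Iterable, Iterator


def iter_text_lines(chunks: Iterable[bytes], encoding: str = "utf-8") -> Iterator[str]:
    """Yield text lines from a stream of byte chunks.

    Only the current chunk and one partial line are held in memory.
    Lines are split on "\n" only (a trailing "\r" is not stripped).
    """
    # Incremental decoding copes with multi-byte characters split across chunks.
    decoder = codecs.getincrementaldecoder(encoding)()
    pending = ""
    for chunk in chunks:
        pending += decoder.decode(chunk)
        # Everything before the last "\n" is complete; the rest waits for more data.
        *complete, pending = pending.split("\n")
        yield from complete
    pending += decoder.decode(b"", final=True)
    if pending:
        yield pending


def iter_s3_lines(bucket: str, key: str, chunk_size: int = 1 << 20) -> Iterator[str]:
    """Stream one S3 object line by line without writing it to disk.

    Assumption: boto3 is available; bucket and key are placeholders.
    """
    import boto3  # imported lazily so iter_text_lines works without boto3

    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"]
    yield from iter_text_lines(body.iter_chunks(chunk_size=chunk_size))
```

Because the chunk handling is separate from the S3 call, the same generator can be reused for other datasets or other storage backends, and different object keys could be handed to different workers if parallel loading is needed.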
