Hi SpicyCrab51, thanks for the warm words 😄 Happy to hear you're enjoying our product!
As for your needs, I suggest you explore our Hyperdatasets (https://clear.ml/docs/latest/docs/hyperdatasets/overview), as they were designed to solve exactly the kind of issue you're facing!
You can also watch a talk we gave that covers Hyperdatasets: https://www.youtube.com/watch?v=CcL8NNZfHlY !
Note that Hyperdatasets are an enterprise feature and are not part of the open-source offering.
Contact me if you need more info 🙂
Hey there! 🙂
First of all, thank you for creating this Slack community and giving us the opportunity to work with your wonderful software. I need some help and am wondering if you have any ideas on how I could solve a problem.
I am trying to find a good way to handle massive datasets with ClearML. Specifically, I want to work with 300 GB of text on S3 storage, such that:
- it is easy for me and my coworkers to stream the contents without needing 300 GB of disk space and/or RAM
- the code used for this dataset can easily be reused for future datasets
- less importantly, it would be nice if the loading could be parallelized
I have looked at both StorageManager and Dataset, but neither of them seems to offer features that avoid relying on the local disk. Concretely, my question is: is there a way/feature in ClearML to at least partially achieve this? If not, do you know of any ClearML-compatible alternatives?
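To make the requirement concrete, here is a minimal sketch of the streaming access pattern I have in mind. It uses plain boto3 rather than any ClearML API, and the bucket and prefix names are hypothetical placeholders:
```python
# Minimal sketch (plain boto3, not a ClearML API): stream text lines from
# every object under an S3 prefix without downloading files to disk.
# Bucket and prefix names below are hypothetical placeholders.
import boto3

def iter_s3_lines(bucket: str, prefix: str):
    """Yield decoded text lines from all objects under the given prefix."""
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            # get_object returns a StreamingBody; iter_lines() reads it in
            # chunks, so memory use stays bounded regardless of object size.
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"]
            for line in body.iter_lines():
                yield line.decode("utf-8")

# Hypothetical usage:
# for line in iter_s3_lines("my-text-corpus", "datasets/text-300gb/"):
#     process(line)
```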
Thank you in advance!