Hi @AmiableSeaturtle81! You could get the Dataset Struct configuration object from the dataset's task and read job_size from there, which is the dataset size in bytes. By the way, a dataset's task ID is the same as the dataset's ID, so you can call all the ClearML Task-related functions on the task you get by doing Task.get_task("dataset_id").
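A minimal sketch of that idea, assuming the Dataset Struct configuration is stored as JSON on the dataset's task and that field names such as job_size match your ClearML version; "your_dataset_id" is a placeholder:

```python
import json
from clearml import Task

# A dataset's backing task shares the dataset ID, so it can be fetched directly.
task = Task.get_task(task_id="your_dataset_id")  # placeholder ID

# Read the "Dataset Struct" configuration object as raw text and parse it.
# This only talks to the ClearML server; it does not touch state.json on S3.
raw = task.get_configuration_object("Dataset Struct")
struct = json.loads(raw)

# Each entry describes one dataset version in the lineage; job_size is the
# size in bytes (exact field names are an assumption -- inspect the struct
# in the web UI to confirm for your ClearML version).
for key, info in struct.items():
    print(key, info.get("name"), info.get("job_size"))
```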
Is there any way to get the dataset size without downloading state.json?
I'm doing ds = clearml.Dataset.get(dataset_id=d_id), but it instantly tries to download state.json, which is on S3. I'm only interested in the size and file count, which I then get from calling ds.get_metadata("state").
"state" comes from Task, so another workaround would be to get Task id straight from knowing the dataset ID
I don't want to download state.json because:
- It's 500+ MB
- I need S3 creds that I don't want to store on the server
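For reference, the current flow described above looks roughly like this; a sketch assuming d_id holds the dataset ID, with the get_metadata("state") call taken from the question as-is:

```python
import clearml

d_id = "your_dataset_id"  # placeholder

# Dataset.get() immediately fetches state.json from S3,
# which is the step this question is trying to avoid.
ds = clearml.Dataset.get(dataset_id=d_id)

# Size and file count are then read from the "state" metadata.
state = ds.get_metadata("state")
```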