Hi, I'M Trying To Create A Dataset With 186 Parent Datasets. The Process Fails Due To Oom, The Machine Has 64 Gb Of Ram. Does A Workaround Exists, For Example, Generating Intermediate Datasets ? Or Does The Total Memory Consumed Depends On The Number Of

Answered

Hi,
I'm trying to create a dataset with 186 parent datasets.
The process fails due to OOM, the machine has 64 GB of RAM.

Does a workaround exist, for example generating intermediate datasets?
Or does the total memory consumed depend on the number of files in all the parent datasets, meaning I need to buy more memory?

  				
Posted one year ago by HurtAnt92


Answers 2

Thank you, I will give it a try. I debugged the code to investigate the cause of the failure. It appears that the code fails at line 1312 inside Dataset.create during the execution of the instance._serialize() function. I will further explore the code to identify the precise point of failure.

  				
Posted one year ago by HurtAnt92

Hi HurtAnt92! Yes, you can create intermediate datasets. Split your parent datasets into batches, create one new child dataset per batch with that batch as its parents, then create a final dataset whose parents are all of the resulting children.
I'm surprised you get an OOM though: we don't load the files into memory, only the name/path of each file plus its size, hash, etc. Could there be some other factor causing this issue?
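
The batching idea above could be sketched roughly as follows. This is a minimal sketch, not the library's own recipe: `chunked`, `merge_in_batches`, `create_clearml_child`, the batch size of 20, and the dataset/project names are all hypothetical. The ClearML part assumes `Dataset.create(..., parent_datasets=...)`, `upload()`, and `finalize()` behave as documented, and that a child must be finalized before it can serve as a parent. Whether this actually lowers peak memory has not been verified here.

```python
from typing import Callable, List, Sequence


def chunked(items: Sequence[str], size: int) -> List[List[str]]:
    """Split items into consecutive lists of at most `size` elements."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def merge_in_batches(
    parent_ids: Sequence[str],
    batch_size: int,
    create_child: Callable[[List[str]], str],
) -> str:
    """Build a tree of intermediate datasets so that no single dataset
    has more than `batch_size` direct parents. Returns the final dataset ID."""
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2")
    if not parent_ids:
        raise ValueError("at least one parent dataset is required")

    level = list(parent_ids)
    # Keep collapsing the current level into intermediate children
    # until it is small enough to be the parent list of one dataset.
    while len(level) > batch_size:
        level = [create_child(batch) for batch in chunked(level, batch_size)]
    return create_child(level)


def create_clearml_child(parents: List[str]) -> str:
    """Hypothetical helper: create and finalize one dataset with the given parents."""
    from clearml import Dataset  # imported lazily so the helpers above stay dependency-free

    child = Dataset.create(
        dataset_name="merged-batch",   # placeholder name
        dataset_project="my-project",  # placeholder project
        parent_datasets=parents,
    )
    child.upload()    # nothing new was added, but upload precedes finalize in the usual flow
    child.finalize()  # assumed to be required before the dataset can be used as a parent
    return child.id
```

With 186 parent IDs and a batch size of 20, `merge_in_batches(ids, 20, create_clearml_child)` would create 10 intermediate datasets and then one final dataset with those 10 as parents. Note that this sketch finalizes the final dataset as well; if files still need to be added to it, that last step would have to be handled separately.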

  				
Posted one year ago by SmugDolphin23
