This seems okay to me; are you seeing the dataset in the web UI?
Also:
my_local_dataset_folder = Dataset.get(dataset_project=project, dataset_name=name).get_mutable_local_copy()
what exactly are you seeing in the "my_local_dataset_folder" directory?
(it should contain the copy of the S3 file)
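For example, something along these lines should show what actually landed in the local copy (the project/name values and target folder path here are just examples):
from pathlib import Path

from clearml import Dataset

project = "my_project"   # example project name
name = "my_dataset"      # example dataset name

dataset = Dataset.get(dataset_project=project, dataset_name=name)
# get_mutable_local_copy takes a target folder; this path is just an example
my_local_dataset_folder = dataset.get_mutable_local_copy("/tmp/my_dataset_copy")

# list everything that was materialized locally - it should include the S3 file(s)
for path in Path(my_local_dataset_folder).rglob("*"):
    print(path)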
Hi ScrawnyCrocodile51
The upload function returns only after the files were uploaded (it returns None).
Is this what you mean?
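If you want an explicit sanity check after upload/finalize, something like this should work (the dataset name, project, and S3 prefix are just examples):
from clearml import Dataset

dataset = Dataset.create(dataset_name="my_dataset", dataset_project="my_project")  # example names
dataset.add_external_files(source_url="s3://my-bucket/data/")  # example S3 prefix
dataset.upload()    # returns only after any pending uploads are done
dataset.finalize()  # closes this dataset version

# after finalize() the dataset should be in a final (completed) state
print(dataset.is_final())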
I used this setup to load a pretty big dataset from S3:
dataset.add_external_files(source_url)
dataset.upload(verbose=verbose)
dataset.finalize()
But then I'm seeing an error complaining that the dataset doesn't exist. So my best guess is that the upload is still happening in the background while the code has moved on and tried to do something with that dataset.
So I am wondering if I have to explicitly add some logic to wait for the upload to finish.
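Something like this is what I mean by waiting logic (just a rough sketch, the retry loop and names are mine):
import time

from clearml import Dataset

def wait_for_dataset(dataset_project, dataset_name, timeout_sec=300, poll_sec=5):
    # keep retrying Dataset.get until the dataset is visible (or we give up)
    deadline = time.time() + timeout_sec
    while time.time() < deadline:
        try:
            # assuming Dataset.get raises ValueError when the dataset is not found
            return Dataset.get(dataset_project=dataset_project, dataset_name=dataset_name)
        except ValueError:
            time.sleep(poll_sec)
    raise TimeoutError("dataset {}/{} not found".format(dataset_project, dataset_name))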
Roughly I am trying to do this:
from clearml import Dataset

def upload_clearml_dataset_from_external_source(
    source_url,
    dataset_name: str,
    dataset_project: str,
):
    # reference:
    dataset = Dataset.create(dataset_name=dataset_name, dataset_project=dataset_project)
    dataset.add_external_files(source_url=source_url)
    dataset.upload()
    dataset.finalize()

upload_clearml_dataset_from_external_source("", name, project)  # the S3 source url goes in the first argument
Dataset.get(dataset_project=project, dataset_name=name).get_mutable_local_copy()
# load part of dataset and do something to it
Do you see any potential issue with this?
so "add_external_files" means the files are not actually uploaded, they are "registered" as external links. This means the upload is basically doing nothing , because there is nothing to upload
Where exactly are you getting the error?