Answered
Is there a way to wait until a dataset finishes uploading before proceeding? I want to upload the dataset if it does not already exist and then process it

Hi, is there a way to wait until a dataset finishes uploading before proceeding? I want to upload the dataset if it does not already exist and then process it.

  
  
Posted one month ago

Answers 5


This seems to be okay to me; are you seeing the dataset in the web UI?
Also:

my_local_dataset_folder = Dataset.get(dataset_project=project, dataset_name=name).get_mutable_local_copy()

What exactly are you seeing in the "my_local_dataset_folder" directory?
(It should contain a copy of the S3 file.)
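For reference, a minimal sketch of how one might inspect what actually landed in the local copy (the project/dataset names and target folder below are placeholders, not values from this thread):

import os

from clearml import Dataset

# hypothetical project/dataset names, for illustration only
ds = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")

# copy the dataset content into a folder we control and can modify
my_local_dataset_folder = ds.get_mutable_local_copy("./my_local_dataset_folder")

# list what was actually materialized locally (the S3 file should be here)
for entry in os.listdir(my_local_dataset_folder):
    print(entry)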

  
  
Posted one month ago

Hi ScrawnyCrocodile51,
The upload function returns only after the files were uploaded.
Is this what you mean?
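For context, a minimal sketch of that blocking flow (folder and dataset names are placeholders): the line after upload() only runs once the files are in storage, so finalize() and any later Dataset.get() see a complete dataset.

from clearml import Dataset

ds = Dataset.create(dataset_name="my_dataset", dataset_project="my_project")
ds.add_files(path="./local_data")  # local files to be uploaded
ds.upload()                        # returns only after the files were uploaded
ds.finalize()                      # the dataset version is now closed and retrievable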

  
  
Posted one month ago

I used this setup to load a pretty big dataset from S3:

dataset.add_external_files(source_url)
dataset.upload(verbose=verbose)
dataset.finalize()

But then I saw an error complaining that the dataset doesn't exist. So my best guess is that the upload is still happening in the background while the code has moved forward and is trying to do something with that dataset.

So I am wondering if I have to explicitly add some logic to wait for the upload to finish.
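Something like the sketch below is the kind of guard I have in mind (function name and timeouts are made up; it assumes Dataset.get with only_completed=True raises when nothing matches yet):

import time

from clearml import Dataset


def wait_for_dataset(project: str, name: str, timeout: float = 300, poll: float = 5):
    # poll until a finalized (completed) dataset with this name can be fetched
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            # only_completed=True ignores dataset versions still in progress
            return Dataset.get(
                dataset_project=project, dataset_name=name, only_completed=True
            )
        except ValueError:
            # ClearML raises when no matching (completed) dataset exists yet
            time.sleep(poll)
    raise TimeoutError(f"Dataset {project}/{name} was not finalized within {timeout}s")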

  
  
Posted one month ago

Roughly I am trying to do this:

from clearml import Dataset


def upload_clearml_dataset_from_external_source(
    source_url,
    dataset_name: str,
    dataset_project: str,
):
    # reference: 

    dataset = Dataset.create(dataset_name=dataset_name, dataset_project=dataset_project)
    dataset.add_external_files(source_url=source_url)
    dataset.upload()
    dataset.finalize()


upload_clearml_dataset_from_external_source("", name, project)

Dataset.get(dataset_project=project, dataset_name=name).get_mutable_local_copy()

# load part of dataset and do something to it

Do you see any potential issue with this?
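For the "upload only if it does not already exist" part, I am thinking of something like this (assuming Dataset.get raises when no completed dataset matches; the names are just the ones from the function above):

from clearml import Dataset


def get_or_create_dataset(source_url, dataset_name: str, dataset_project: str):
    try:
        # auto_create=False: fail instead of silently creating an empty dataset
        return Dataset.get(
            dataset_project=dataset_project,
            dataset_name=dataset_name,
            only_completed=True,
            auto_create=False,
        )
    except ValueError:
        # nothing found yet: register the external files and finalize a new version
        dataset = Dataset.create(dataset_name=dataset_name, dataset_project=dataset_project)
        dataset.add_external_files(source_url=source_url)
        dataset.upload()
        dataset.finalize()
        return dataset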

  
  
Posted one month ago

so "add_external_files" means the files are not actually uploaded, they are "registered" as external links. This means the upload is basically doing nothing , because there is nothing to upload

Where exactly are you getting the error?

  
  
Posted one month ago