This seems okay to me; are you seeing the dataset in the web UI?
Also:
my_local_dataset_folder = Dataset.get(dataset_project=project, dataset_name=name).get_mutable_local_copy()
what exactly are you seeing in the "my_local_dataset_folder" directory?
(it should contain the copy of the S3 file)
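For example, something along these lines should show what actually landed in the local copy (the project/name values and target folder path here are just examples):
from pathlib import Path

from clearml import Dataset

project = "my_project"   # example project name
name = "my_dataset"      # example dataset name

dataset = Dataset.get(dataset_project=project, dataset_name=name)
# get_mutable_local_copy takes a target folder; this path is just an example
my_local_dataset_folder = dataset.get_mutable_local_copy("/tmp/my_dataset_copy")

# list everything that was materialized locally - it should include the S3 file(s)
for path in Path(my_local_dataset_folder).rglob("*"):
    print(path)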
Hi ScrawnyCrocodile51
The upload function returns only after the files were uploaded (it returns None).
Is this what you mean?
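If you want an explicit sanity check after upload/finalize, something like this should work (the dataset name, project, and S3 prefix are just examples):
from clearml import Dataset

dataset = Dataset.create(dataset_name="my_dataset", dataset_project="my_project")  # example names
dataset.add_external_files(source_url="s3://my-bucket/data/")  # example S3 prefix
dataset.upload()    # returns only after any pending uploads are done
dataset.finalize()  # closes this dataset version

# after finalize() the dataset should be in a final (completed) state
print(dataset.is_final())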
I used this setup to load a pretty big dataset from S3:
dataset.add_external_files(source_url)
dataset.upload(verbose=verbose)
dataset.finalize()
But then I'm seeing an error complaining that the dataset doesn't exist. So my best guess is that the upload is still happening in the background while the code has moved on and tried to do something with that dataset.
So I am wondering if I have to explicitly add some logic to wait for the upload to finish.
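Something like this is what I mean by waiting logic (just a rough sketch, the retry loop and names are mine):
import time

from clearml import Dataset

def wait_for_dataset(dataset_project, dataset_name, timeout_sec=300, poll_sec=5):
    # keep retrying Dataset.get until the dataset is visible (or we give up)
    deadline = time.time() + timeout_sec
    while time.time() < deadline:
        try:
            # assuming Dataset.get raises ValueError when the dataset is not found
            return Dataset.get(dataset_project=dataset_project, dataset_name=dataset_name)
        except ValueError:
            time.sleep(poll_sec)
    raise TimeoutError("dataset {}/{} not found".format(dataset_project, dataset_name))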
Roughly I am trying to do this:
from clearml import Dataset

def upload_clearml_dataset_from_external_source(
    source_url,
    dataset_name: str,
    dataset_project: str,
):
    # reference:
    dataset = Dataset.create(dataset_name=dataset_name, dataset_project=dataset_project)
    dataset.add_external_files(source_url=source_url)
    dataset.upload()
    dataset.finalize()

upload_clearml_dataset_from_external_source("", name, project)  # the S3 source url goes in the first argument
Dataset.get(dataset_project=project, dataset_name=name).get_mutable_local_copy()
# load part of dataset and do something to it
Do you see any potential issue with this?
so "add_external_files" means the files are not actually uploaded, they are "registered" as external links. This means the upload is basically doing nothing , because there is nothing to upload
Where exactly are you getting the error?