Yes. As far as I know, to upload a dataset to ClearML we need to provide a local path to the data, and ClearML then uploads it to the platform.
My data is not local, it is in an S3 bucket.
Is there a way to point to an S3 URL instead? My current workflow is to download the data from the S3 bucket to local disk, then upload it to ClearML.
Do I still need to do this?
dataset.upload()
dataset.finalize()
if you want to finalize the dataset, yes
If we have already uploaded data to ClearML, how do we add more data to it?
This is how I do it right now:
dataset = Dataset.create(
    dataset_project=metadata[2],
    dataset_name=metadata[3],
    description=description,
    output_uri=f" ",
    parent_datasets=[id_dataset_latest],
)
If you finalized it, you can create a child version - https://clear.ml/docs/latest/docs/clearml_data/data_management_examples/data_man_simple/#creating-a-child-dataset
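A minimal sketch of the child-version flow, assuming a finalized parent dataset already exists. The function name `create_child_version` and the `dataset_cls` parameter are hypothetical and only there so the flow can be tried without a ClearML server; by default it uses `clearml.Dataset`.

```python
def create_child_version(parent_id, project, name, new_files_path, dataset_cls=None):
    """Create a new dataset version on top of a finalized parent and add files to it."""
    if dataset_cls is None:
        # Imported lazily; requires the clearml package and a configured server.
        from clearml import Dataset as dataset_cls

    # The child inherits the parent's contents; only the changes are stored.
    child = dataset_cls.create(
        dataset_project=project,
        dataset_name=name,
        parent_datasets=[parent_id],
    )
    child.add_files(path=new_files_path)  # local folder with the new data
    child.upload()                        # push the newly added files
    child.finalize()                      # close this version to further changes
    return child
```

Here `parent_id` plays the role of `id_dataset_latest` in the snippet above.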
Hi QuaintJellyfish58,
Not sure I’m getting it, can you describe your scenario? Are you referring to https://clear.ml/docs/latest/docs/clearml_data/clearml_data ?
import os
from clearml import Dataset, StorageManager

# download the data from S3 to a local folder
manager = StorageManager()
target_folder = manager.download_folder(
    local_folder='/tmp',
    remote_url=f' '
)

# create a new dataset on ClearML
dataset = Dataset.create(
    dataset_project=metadata[2],
    dataset_name=metadata[3],
    dataset_tags=tags,
    output_uri=" "
)
fp_target_folder = os.path.join(target_folder, minio_s3_url)
print('>>> target_folder:', fp_target_folder)
print('>>> target_folder:', os.listdir(fp_target_folder))

# attach plots and a summary table to the dataset
create_histogram_classifier(fp_target_folder, metadata[4], dataset)
create_3d_scatter(fp_target_folder, dataset)
dataset.get_logger().report_table(series='Table', title='Table Information', table_plot=df_table)

# add the local files, upload them, and finalize the dataset
dataset.add_files(path=fp_target_folder)
dataset.upload(show_progress=True, chunk_size=100, verbose=True)
is_success = dataset.finalize()
You can register the links only (no need to download and upload): use clearml-data add --links from the CLI, or add_external_files from code:
dataset.add_external_files(source_url=" ")
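Put together, the link-only flow might look like the sketch below. The helper names `build_s3_url` and `register_s3_links` are hypothetical, and any bucket or prefix you pass is your own; only `Dataset.create`, `add_external_files`, `upload` and `finalize` come from the ClearML SDK.

```python
def build_s3_url(bucket: str, prefix: str = "") -> str:
    """Join a bucket name and a key prefix into an s3:// URL."""
    prefix = prefix.strip("/")
    return f"s3://{bucket}/{prefix}" if prefix else f"s3://{bucket}"


def register_s3_links(project: str, name: str, source_url: str, parent_ids=None):
    """Create a dataset version whose entries are links to remote objects."""
    # Imported lazily; requires the clearml package and a configured server.
    from clearml import Dataset

    dataset = Dataset.create(
        dataset_project=project,
        dataset_name=name,
        parent_datasets=parent_ids,
    )
    # Registers links to the objects under source_url; nothing is downloaded.
    dataset.add_external_files(source_url=source_url)
    dataset.upload()    # still called before finalizing, as noted earlier in the thread
    dataset.finalize()
    return dataset.id
```

For example, `register_s3_links("my_project", "my_dataset", build_s3_url("my-bucket", "data/train"))` would register everything under that prefix as external links, assuming the ClearML client has credentials for the bucket.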