Answered
Hi Guys, I Have Data (For Folder Tree Image Classification) Stored On An S3 Bucket (MinIO). Can I Register A Dataset To ClearML Without Downloading The Data From S3 To My Local Machine (Using StorageManager) And Then Uploading It To ClearML?

Hi guys,
I have data (for folder tree image classification) stored on an S3 bucket (MinIO).
Can I register a dataset to ClearML without downloading the data from the S3 bucket to my local machine (using StorageManager) and then uploading it to ClearML?

  
  
Posted one year ago

Answers 7


> Do I still need to do this?
>
>     dataset.upload()
>     dataset.finalize()

If you want to finalize the dataset, yes.

> If we have already uploaded data to ClearML, how do we add more data?
> This is my way right now:
>
>     dataset = Dataset.create(
>         dataset_project=metadata[2],
>         dataset_name=metadata[3],
>         description=description,
>         output_uri=f" ",
>         parent_datasets=[id_dataset_latest],
>     )

If you finalized it, you can create a child version - https://clear.ml/docs/latest/docs/clearml_data/data_management_examples/data_man_simple/#creating-a-child-dataset
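For reference, a minimal sketch of that child-version flow, assuming the parent dataset has already been finalized; the project name, dataset name, and file path below are hypothetical placeholders:

    from clearml import Dataset

    # fetch the latest finalized version to act as the parent (names are placeholders)
    parent = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")

    # create a child version that inherits the parent's file list
    child = Dataset.create(
        dataset_project="my_project",
        dataset_name="my_dataset",
        parent_datasets=[parent.id],
    )

    # add only the new or changed files, then close the new version
    child.add_files(path="/path/to/new_files")
    child.upload()
    child.finalize()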

  
  
Posted one year ago

Yes, as far as I know, if we want to upload a dataset to ClearML we need to provide a local_path to the data, and ClearML will then upload it to the platform.

My data is not on local storage but in an S3 bucket (MinIO).
Is there a way to point to the S3 URL? My current workflow is to download the data from the S3 bucket to local storage and then upload it to ClearML.

  
  
Posted one year ago

You can register the links only (no need to download and upload):
clearml-data add --links from the CLI, or add_external_files from code:

    dataset.add_external_files(source_url=" ")
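As a minimal sketch of that links-only flow (the bucket URL, project name, and dataset name below are placeholder assumptions):

    from clearml import Dataset

    # create a new dataset version (project/name are hypothetical placeholders)
    dataset = Dataset.create(
        dataset_project="image_classification",
        dataset_name="minio_linked_dataset",
    )

    # register the objects already in the MinIO/S3 bucket as external links;
    # nothing is downloaded locally or re-uploaded to ClearML storage
    dataset.add_external_files(source_url="s3://my-minio-host:9000/my-bucket/train/")

    # upload the dataset state (metadata and links only) and close the version
    dataset.upload()
    dataset.finalize()

Note that when the dataset is later consumed with get_local_copy(), the linked files are pulled directly from the bucket, so the MinIO endpoint credentials need to be available (e.g. in clearml.conf).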

  
  
Posted one year ago

    # requires: from clearml import Dataset, StorageManager; import os

    # download the data from S3/MinIO to a local folder
    manager = StorageManager()
    target_folder = manager.download_folder(
        local_folder='/tmp',
        remote_url=f' '
    )

    # upload to ClearML
    dataset = Dataset.create(
        dataset_project=metadata[2],
        dataset_name=metadata[3],
        dataset_tags=tags,
        output_uri=" "
    )
    fp_target_folder = os.path.join(target_folder, minio_s3_url)
    print('>>> target_folder:', fp_target_folder)
    print('>>> target_folder:', os.listdir(fp_target_folder))

    # build reports and register the downloaded files with the dataset
    create_histogram_classifier(fp_target_folder, metadata[4], dataset)
    create_3d_scatter(fp_target_folder, dataset)
    dataset.get_logger().report_table(series='Table', title='Table Information', table_plot=df_table)
    dataset.add_files(path=fp_target_folder)
    dataset.upload(show_progress=True, chunk_size=100, verbose=True)

    is_success = dataset.finalize()
  
  
Posted one year ago

Thanks, it works!

  
  
Posted one year ago

Do I still need to do this?

    dataset.upload()
    dataset.finalize()

I have another question:
if we have already uploaded data to ClearML, how do we add more data?
This is my way right now:

    dataset = Dataset.create(
        dataset_project=metadata[2],
        dataset_name=metadata[3],
        description=description,
        output_uri=f" ",
        parent_datasets=[id_dataset_latest],
    )

  
  
Posted one year ago

Hi QuaintJellyfish58 ,

Not sure I’m getting it, can you describe your scenario? Are you referring to https://clear.ml/docs/latest/docs/clearml_data/clearml_data ?

  
  
Posted one year ago