Answered
Can I register a dataset in ClearML directly from an S3 (MinIO) bucket, without first downloading the data locally with StorageManager and then uploading it?

Hi guys,
I have data (a folder tree for image classification) stored in an S3 bucket on MinIO.
Can I register it as a dataset in ClearML without first downloading it from the S3 bucket to my local machine (using StorageManager) and then uploading it to ClearML?

  
  
Posted one year ago

Answers 7


Yes. As far as I know, to upload a dataset to ClearML we need to provide a local path to the data, and ClearML then uploads it to the platform.

My data is not local; it is in an S3 bucket.
Is there a way to point to an S3 URL instead? My current workflow is to download the data from the S3 bucket to my local machine and then upload it to ClearML.

  
  
Posted one year ago

"Do I still need to do this?"

    dataset.upload()
    dataset.finalize()

If you want to finalize the dataset, yes.

"If we have already uploaded data to ClearML, how do we add more data? This is my way right now."

    dataset = Dataset.create(
        dataset_project=metadata[2],
        dataset_name=metadata[3],
        description=description,
        output_uri=f" ",
        parent_datasets=[id_dataset_latest]
    )

If you finalized it, you can create a child version: https://clear.ml/docs/latest/docs/clearml_data/data_management_examples/data_man_simple/#creating-a-child-dataset
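As a rough sketch of the child-version approach mentioned above: the function below creates a new version on top of a finalized parent, links the extra remote files, and finalizes it. The function name add_child_version and its dataset_api parameter are hypothetical (the parameter exists only so the flow can be exercised without a ClearML server); the Dataset calls themselves are the ones already used in this thread.

```python
def add_child_version(parent_id, project, name, source_url, dataset_api=None):
    """Create a child dataset version that adds externally stored files.

    dataset_api is a hypothetical injection point; by default the real
    clearml.Dataset class is used, which needs a configured ClearML client.
    """
    if dataset_api is None:
        from clearml import Dataset  # imported lazily on purpose
        dataset_api = Dataset

    # The child inherits every file of the (finalized) parent version.
    child = dataset_api.create(
        dataset_project=project,
        dataset_name=name,
        parent_datasets=[parent_id],
    )
    # Register links to the new remote objects instead of uploading copies.
    child.add_external_files(source_url=source_url)
    # Same upload/finalize sequence as in the thread.
    child.upload()
    child.finalize()
    return child.id
```

Under these assumptions, calling add_child_version(id_dataset_latest, metadata[2], metadata[3], source_url) would return the ID of the new version, where source_url is whatever link you want to register.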

  
  
Posted one year ago

Do I still need to do this?

    dataset.upload()
    dataset.finalize()

I have another question:
if we have already uploaded data to ClearML, how do we add more data?
This is my way right now.

    dataset = Dataset.create(
        dataset_project=metadata[2],
        dataset_name=metadata[3],
        description=description,
        output_uri=f" ",
        parent_datasets=[id_dataset_latest]
    )

  
  
Posted one year ago

Hi QuaintJellyfish58,

Not sure I’m getting it. Can you describe your scenario? Are you referring to https://clear.ml/docs/latest/docs/clearml_data/clearml_data ?

  
  
Posted one year ago

Thanks, it works!

  
  
Posted one year ago

    import os

    from clearml import Dataset, StorageManager

    # metadata, tags, minio_s3_url, df_table and the create_* helpers
    # are defined elsewhere in my script.

    # Download the data from S3 (URL left blank here)
    manager = StorageManager()
    target_folder = manager.download_folder(
        local_folder='/tmp',
        remote_url=f' '
    )

    # Upload to ClearML
    dataset = Dataset.create(
        dataset_project=metadata[2],
        dataset_name=metadata[3],
        dataset_tags=tags,
        output_uri=" "
    )
    fp_target_folder = os.path.join(target_folder, minio_s3_url)
    print('>>> target_folder:', fp_target_folder)
    print('>>> target_folder:', os.listdir(fp_target_folder))

    # Attach plots and a summary table to the dataset
    create_histogram_classifier(fp_target_folder, metadata[4], dataset)
    create_3d_scatter(fp_target_folder, dataset)
    dataset.get_logger().report_table(series='Table', title='Table Information', table_plot=df_table)

    # Add the downloaded files, upload them, and close the version
    dataset.add_files(path=fp_target_folder)
    dataset.upload(show_progress=True, chunk_size=100, verbose=True)

    is_success = dataset.finalize()
  
  
Posted one year ago

You can register the links only (no need to download and upload).
Use clearml-data add --links from the CLI, or add_external_files from code:

    dataset.add_external_files(source_url=" ")
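A minimal sketch of the link-only registration described above. It assumes the MinIO credentials are already configured in clearml.conf and that the bucket is addressed in the s3://host:port/bucket/path form. The helper names minio_url and register_links, and every host, port, bucket, and project value, are illustrative and not taken from the thread.

```python
def minio_url(host: str, port: int, bucket: str, prefix: str = "") -> str:
    """Build an s3://host:port/bucket/prefix URL for a MinIO endpoint."""
    # Drop empty parts and stray slashes so the pieces join cleanly.
    parts = [p.strip("/") for p in (bucket, prefix) if p.strip("/")]
    return f"s3://{host}:{port}/{'/'.join(parts)}"


def register_links(project: str, name: str, source_url: str) -> str:
    """Create a dataset version that only references remote objects."""
    # Imported lazily: this needs a configured ClearML client.
    from clearml import Dataset

    dataset = Dataset.create(dataset_project=project, dataset_name=name)
    # Registers links to the objects under source_url; nothing is downloaded.
    dataset.add_external_files(source_url=source_url)
    # Same upload/finalize sequence discussed elsewhere in this thread.
    dataset.upload()
    dataset.finalize()
    return dataset.id
```

For example, register_links("my_project", "my_dataset", minio_url("minio.example.com", 9000, "images", "train")) would register everything under that prefix, given a reachable server and valid credentials.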

  
  
Posted one year ago