Answered
Hi Guys, I Have Data (For Folder Tree Image Classification) Stored On An S3 Bucket (MinIO). Can I Register A Dataset To ClearML Without Downloading The Data From S3 To My Local Machine (Using StorageManager) And Then Uploading It To ClearML?

Hi guys,
I have data (for folder tree image classification) stored on an S3 bucket (MinIO).
Can I register a dataset to ClearML without downloading the data from the S3 bucket to my local machine (using StorageManager) and then uploading it to ClearML?

  
  
Posted one year ago

Answers 7


> Do I still need to do this?
>
>     dataset.upload()
>     dataset.finalize()

If you want to finalize the dataset, yes.

> If we have already uploaded data to ClearML, how do we add more data?
> This is my way right now:
>
>     dataset = Dataset.create(
>         dataset_project=metadata[2],
>         dataset_name=metadata[3],
>         description=description,
>         output_uri=f" ",
>         parent_datasets=[id_dataset_latest],
>     )

If you finalized it, you can create a child version - https://clear.ml/docs/latest/docs/clearml_data/data_management_examples/data_man_simple/#creating-a-child-dataset
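For reference, a minimal sketch of that child-version flow, assuming the parent dataset has already been finalized; the project name, dataset name, and file path below are hypothetical placeholders:

    from clearml import Dataset

    # fetch the latest finalized version to act as the parent (names are placeholders)
    parent = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")

    # create a child version that inherits the parent's file list
    child = Dataset.create(
        dataset_project="my_project",
        dataset_name="my_dataset",
        parent_datasets=[parent.id],
    )

    # add only the new or changed files, then close the new version
    child.add_files(path="/path/to/new_files")
    child.upload()
    child.finalize()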

  
  
Posted one year ago

Yes, as far as I know, if we want to upload a dataset to ClearML we need to provide a local_path to the data, and ClearML will then upload it to the platform.

My data is not on local storage but in an S3 bucket (MinIO).
Is there a way to point to the S3 URL? My current workflow is to download the data from the S3 bucket to local storage and then upload it to ClearML.

  
  
Posted one year ago

You can register the links only (no need to download and upload):
clearml-data add --links from the CLI, or add_external_files from code:

    dataset.add_external_files(source_url=" ")
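As a minimal sketch of that links-only flow (the bucket URL, project name, and dataset name below are placeholder assumptions):

    from clearml import Dataset

    # create a new dataset version (project/name are hypothetical placeholders)
    dataset = Dataset.create(
        dataset_project="image_classification",
        dataset_name="minio_linked_dataset",
    )

    # register the objects already in the MinIO/S3 bucket as external links;
    # nothing is downloaded locally or re-uploaded to ClearML storage
    dataset.add_external_files(source_url="s3://my-minio-host:9000/my-bucket/train/")

    # upload the dataset state (metadata and links only) and close the version
    dataset.upload()
    dataset.finalize()

Note that when the dataset is later consumed with get_local_copy(), the linked files are pulled directly from the bucket, so the MinIO endpoint credentials need to be available (e.g. in clearml.conf).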

  
  
Posted one year ago

    # requires: from clearml import Dataset, StorageManager; import os

    # download the data from S3/MinIO to a local folder
    manager = StorageManager()
    target_folder = manager.download_folder(
        local_folder='/tmp',
        remote_url=f' '
    )

    # upload to ClearML
    dataset = Dataset.create(
        dataset_project=metadata[2],
        dataset_name=metadata[3],
        dataset_tags=tags,
        output_uri=" "
    )
    fp_target_folder = os.path.join(target_folder, minio_s3_url)
    print('>>> target_folder:', fp_target_folder)
    print('>>> target_folder:', os.listdir(fp_target_folder))

    # build reports and register the downloaded files with the dataset
    create_histogram_classifier(fp_target_folder, metadata[4], dataset)
    create_3d_scatter(fp_target_folder, dataset)
    dataset.get_logger().report_table(series='Table', title='Table Information', table_plot=df_table)
    dataset.add_files(path=fp_target_folder)
    dataset.upload(show_progress=True, chunk_size=100, verbose=True)

    is_success = dataset.finalize()
  
  
Posted one year ago

Thanks, it works!

  
  
Posted one year ago

Do I still need to do this?

    dataset.upload()
    dataset.finalize()

I have another question:
if we have already uploaded data to ClearML, how do we add more data?
This is my way right now:

    dataset = Dataset.create(
        dataset_project=metadata[2],
        dataset_name=metadata[3],
        description=description,
        output_uri=f" ",
        parent_datasets=[id_dataset_latest],
    )

  
  
Posted one year ago

Hi QuaintJellyfish58 ,

Not sure I’m getting it, can you describe your scenario? Are you referring to https://clear.ml/docs/latest/docs/clearml_data/clearml_data ?

  
  
Posted one year ago