
Hello! Has anyone tried to convert a large dataset to ClearML format? I am trying to convert a 350GB dataset with ~42 million files and getting Process failed, exit code -9 . Are there any hints on how to work with large datasets?

Posted 9 months ago

Answers 5

Hi @<1523702307240284160:profile|TeenyBeetle18> , how are you converting it, what code are you using exactly?

Posted 9 months ago

Have you ever benchmarked ClearML Datasets on large datasets? How well does it handle them?

Posted 9 months ago

It seems like the process exits on a memory error
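For context, exit code -9 means the process was killed by signal 9 (SIGKILL), which on Linux is most often the kernel OOM killer terminating an out-of-memory process. A minimal sketch of Python's sign convention for signal-killed children (the child here kills itself purely to illustrate; it is not the ClearML process):

```python
import os
import signal
import subprocess
import sys

# Spawn a child that kills itself with SIGKILL, mimicking what the
# OOM killer does to a process that exhausts memory.
proc = subprocess.run(
    [sys.executable, "-c",
     "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
)

# subprocess reports death-by-signal as a negative return code,
# so SIGKILL shows up as -9.
print(proc.returncode)
```

You can also confirm an OOM kill on the machine itself by checking the kernel log (e.g. `dmesg` output) for "Out of memory" entries around the time the process died.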

Posted 9 months ago

Nothing special

    from clearml import Dataset

    dataset = Dataset.create(dataset_name='my_dataset', parent_datasets=None, use_current_task=False)
    dataset.add_files(dataset_dir, verbose=False)
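One workaround worth trying for a directory this large is to register files in smaller batches rather than in a single `add_files` call over the whole tree, uploading as you go so in-memory state stays bounded. This is only a sketch under assumptions: it assumes `dataset_dir` is split into subdirectories, that `upload()` may be called incrementally before `finalize()`, and the batch size of 100 is arbitrary. The `batched` helper below is generic and testable on its own:

```python
from itertools import islice

def batched(items, n):
    """Yield successive lists of at most n items from an iterable."""
    it = iter(items)
    while chunk := list(islice(it, n)):
        yield chunk

# Hypothetical incremental registration (assumes the clearml Dataset API):
#
# from pathlib import Path
# from clearml import Dataset
#
# dataset = Dataset.create(dataset_name='my_dataset', use_current_task=False)
# for sub_batch in batched(sorted(Path(dataset_dir).iterdir()), 100):
#     for sub in sub_batch:
#         dataset.add_files(sub, verbose=False)
#     dataset.upload()   # push this batch before registering the next one
# dataset.finalize()
```

Whether this actually lowers peak memory depends on how the SDK buffers the file manifest internally, so treat it as an experiment rather than a guaranteed fix.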
Posted 9 months ago

@<1523701087100473344:profile|SuccessfulKoala55> any hints ?

Posted 9 months ago