
Hi, can I ask how ClearML Datasets compare with PyTorch datasets/dataloaders? In particular, PyTorch dataloaders can pull data in batches, preprocess it using multiple CPUs, and feed it into the training loop, keeping CPU and GPU utilisation high at the same time. When using clearml-datasets, what is the best practice for achieving this? Either with PyTorch or with anything that's built into ClearML?

Posted one year ago

Answers 5

SubstantialElk6, I think this is what you're looking for:
Dataset.get_local_copy(..., part=X)
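
To make the `part` argument a bit more concrete, here is a minimal sketch, not an official recipe. `part` and `num_parts` are arguments of `Dataset.get_local_copy`; the helper names `fetch_part` and `chunks_for_part` are hypothetical, and the round-robin mapping shown in `chunks_for_part` is only my reading of the docstring, so please verify it against your ClearML version.

```python
from pathlib import Path
from typing import List


def chunks_for_part(total_chunks: int, part: int, num_parts: int) -> List[int]:
    """Illustrate which chunk indices a given part would cover.

    Assumption: chunks are assigned to parts round-robin
    (part 0 gets chunks 0, num_parts, 2 * num_parts, ...).
    """
    if not 0 <= part < num_parts:
        raise ValueError("part must be in the range [0, num_parts)")
    return list(range(part, total_chunks, num_parts))


def fetch_part(dataset_project: str, dataset_name: str, part: int, num_parts: int) -> Path:
    """Download only one part of a ClearML dataset and return its local folder."""
    # Imported lazily: needs the clearml package and a configured ClearML server.
    from clearml import Dataset

    dataset = Dataset.get(dataset_project=dataset_project, dataset_name=dataset_name)
    # Only the chunks belonging to this part are downloaded.
    return Path(dataset.get_local_copy(part=part, num_parts=num_parts))
```

For example, each of 4 worker processes could call `fetch_part("my_project", "my_dataset", part=rank, num_parts=4)` with its own `rank` (the project and dataset names here are placeholders), so that no single process has to download the whole dataset.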

Posted one year ago

I think this might also be helpful. Look through the functions available in the documentation; I think you might find what you're looking for 🙂

Posted one year ago

Although I think you can also pull specific chunks of dataset

How do you do that with clearml-data?

Posted one year ago

Thanks CostlyOstrich36, how do I know how the parts are indexed in the first place? Or rather, how are chunks and parts defined? Say, in the context of images, videos, text documents, etc.

Posted one year ago

Hi SubstantialElk6,

That's an interesting idea. If you want to preprocess a lot of data, I think the best approach would be to use multiple datasets (one per process) or different versions of a dataset. You can also pull specific chunks of a dataset and use just those, although I'm not sure about that last point.
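
One way to combine the two, as a rough sketch rather than a built-in ClearML feature: download the dataset (or one part of it) to a local folder first, then let a regular PyTorch `DataLoader` do the multi-CPU batching and preprocessing over that folder. The names `LocalFileDataset` and `make_loader` are hypothetical, and reading each file as raw bytes is only a placeholder for real decoding.

```python
from pathlib import Path


class LocalFileDataset:
    """Map-style dataset over the files in a local folder.

    The folder could be the path returned by ClearML's Dataset.get_local_copy().
    PyTorch's DataLoader only needs __len__ and __getitem__ on a map-style dataset.
    """

    def __init__(self, root, transform=None):
        self.files = sorted(p for p in Path(root).rglob("*") if p.is_file())
        self.transform = transform  # per-sample preprocessing, runs in the workers

    def __len__(self):
        return len(self.files)

    def __getitem__(self, index):
        data = self.files[index].read_bytes()  # placeholder for real decoding
        return self.transform(data) if self.transform else data


def make_loader(root, batch_size=32, num_workers=4, transform=None):
    """Wrap the folder in a DataLoader that preprocesses in worker processes."""
    # Imported lazily so the dataset class above can be used without torch.
    from torch.utils.data import DataLoader

    return DataLoader(
        LocalFileDataset(root, transform),
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,  # preprocessing spread over several CPU processes
        pin_memory=True,          # can speed up host-to-GPU copies
    )
```

In practice `transform` would decode each sample and return tensors so the default batching can stack them. It should also be a picklable, module-level function (not a lambda) when `num_workers > 0` on platforms that spawn worker processes.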

What do you think?

Posted one year ago