Answered
Hi, can I ask how ClearML Datasets compare with PyTorch datasets/dataloaders, and how to use them together? In particular, PyTorch dataloaders can pull data in batches, preprocess it on multiple CPUs, and feed it into the training loop, keeping CPU and GPU utilisation high at the same time. When using clearml-datasets, what is the best practice for achieving this? Either with PyTorch or with anything built into ClearML?

  
  
Posted 2 years ago

Answers 5


Hi SubstantialElk6,

That's an interesting idea. If you want to preprocess a lot of data, I think the best approach would be to use multiple datasets (one per process) or different versions of a dataset. Although I think you can also pull specific chunks of a dataset, in which case a single dataset would be enough. I'm not sure about that last point, though.

What do you think?

  
  
Posted 2 years ago

Although I think you can also pull specific chunks of dataset

How do you do that with clearml-data?

  
  
Posted 2 years ago

SubstantialElk6, I think this is what you're looking for:
https://clear.ml/docs/latest/docs/references/sdk/dataset#get_local_copy
Dataset.get_local_copy(..., part=X)
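To make the `part=X` suggestion a bit more concrete, here is a minimal sketch of giving each worker process its own slice of one dataset. It assumes `Dataset.get_local_copy` accepts `part` and `num_parts` as described on the linked page; the helper names `part_for_worker` and `fetch_worker_shard`, and the idea of mapping one part to one worker, are illustrative assumptions and not something the thread confirms as the recommended practice.

```python
def part_for_worker(worker_id: int, num_workers: int) -> dict:
    """Build the keyword arguments that select one worker's slice.

    Hypothetical helper: worker i simply asks for part i out of
    num_workers parts.
    """
    if not 0 <= worker_id < num_workers:
        raise ValueError("worker_id must be in [0, num_workers)")
    return {"part": worker_id, "num_parts": num_workers}


def fetch_worker_shard(dataset_id: str, worker_id: int, num_workers: int) -> str:
    """Download only this worker's part and return the local folder path.

    Assumes the clearml package is installed and configured; the import
    is kept local so the helper above can be used without it.
    """
    from clearml import Dataset

    dataset = Dataset.get(dataset_id=dataset_id)
    return dataset.get_local_copy(**part_for_worker(worker_id, num_workers))
```

Each process (for example, each rank of a distributed training job) could call `fetch_worker_shard` once at start-up and then build its usual PyTorch dataset on top of the returned folder, so the multi-CPU preprocessing stays in the DataLoader.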

  
  
Posted 2 years ago

Thanks CostlyOstrich36, how do I know how the parts are indexed in the first place? Or rather, how are chunks and parts defined? Say, in the context of images, videos, text documents, etc.

  
  
Posted 2 years ago

https://clear.ml/docs/latest/docs/references/sdk/dataset/#get_num_chunks
I think this might also be helpful. Skim through the functions available in the documentation; I think you might find what you're looking for 🙂
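On the indexing question, a rough sketch may help. As far as I can tell from the SDK reference, chunks are the archive files created when the dataset is uploaded, so they are defined by size and file order, not by content type (images, videos, or text). When `num_parts` is smaller than the total number of chunks, chunks appear to be dealt out to parts round-robin. The function `chunks_for_part` below reproduces that reading of the docs and is an assumption, not ClearML's actual code; `describe_parts` is a hypothetical helper that combines it with `get_num_chunks`.

```python
def chunks_for_part(part: int, num_parts: int, total_chunks: int) -> list:
    """Chunk indices assumed to belong to a given part.

    Mirrors the round-robin example in the SDK reference: with 8 chunks
    and 5 parts, part 0 gets chunks [0, 5] and part 3 gets chunk [3].
    """
    return list(range(part, total_chunks, num_parts))


def describe_parts(dataset_id: str, num_parts: int) -> dict:
    """Map every part index to the chunk indices it would download.

    Assumes the clearml package is installed and configured.
    """
    from clearml import Dataset

    dataset = Dataset.get(dataset_id=dataset_id)
    total_chunks = dataset.get_num_chunks()
    return {
        part: chunks_for_part(part, num_parts, total_chunks)
        for part in range(num_parts)
    }
```

If `num_parts` is left out, the reference says the number of parts equals the total number of chunks, so `part=X` then selects exactly one chunk.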

  
  
Posted 2 years ago