Hi, I have a question about clearml-data. Clearml-data probably does well on data versioning, but when it comes to actually loading the data, are there examples of how it can make use of advanced features such as those in


As AnxiousSeal95 says, the ClearML server will version a dataset for you, push it to a unified storage location, and make versions diffable against each other.

I’ve written a workshop on training image classifiers for bird species identification, and I recently adapted it to work with ClearML.

There is an example notebook showing how to upload a dataset, in this case a directory of images, to the ClearML server. See here: https://github.com/ecm200/caltech_birds/blob/master/notebooks/clearml_add_new_dataset.ipynb
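For readers who can't open the notebook, the upload step roughly follows this pattern. This is a minimal sketch using the `clearml` `Dataset` API (`Dataset.create`, `add_files`, `upload`, `finalize`), assuming `clearml` is installed and configured against a server; the helper name `upload_image_folder` and its parameters are hypothetical and not taken from the notebook.

```python
def upload_image_folder(image_dir, project, name, output_url=None):
    """Register a local image directory as a new ClearML dataset version.

    Hypothetical helper for illustration. clearml is imported inside the
    function so this module can be imported without clearml installed.
    """
    from clearml import Dataset

    # Create a new, still-mutable dataset version in the given project.
    dataset = Dataset.create(dataset_name=name, dataset_project=project)
    # Register every file under image_dir (recursively by default).
    dataset.add_files(path=image_dir)
    # Push the files to storage: the ClearML fileserver by default, or
    # output_url (e.g. an Azure Blob or S3 URI) if one is given.
    dataset.upload(output_url=output_url)
    # Close the version so it is immutable and retrievable via Dataset.get().
    dataset.finalize()
    return dataset.id
```

Calling `upload_image_folder("./images/train", "Caltech Birds", "my_train_split")` (placeholder path and name) would return the new dataset's ID.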

On the training script side, you need to make a local copy of the dataset before training. If you keep the same directory for cached datasets, ClearML will check whether the dataset version has changed, and if not, it will use the already cached copy. If the version has changed, or no cached copy exists, it will download the dataset automatically. This is done as follows:

```python
from clearml import Dataset

# Get the dataset from the clearml-server and cache it locally.
print('[INFO] Getting a local copy of the CUB200 birds datasets')

# Train
train_dataset = Dataset.get(dataset_project='Caltech Birds', dataset_name='cub200_2011_train_dataset__AZURE_BLOB_VERSION')
print('[INFO] Default storage location of training dataset: {}'.format(train_dataset.get_default_storage()))
train_dataset_base = train_dataset.get_local_copy()
print('[INFO] Local copy of training dataset: {}'.format(train_dataset_base))
```
This snippet caches the dataset locally; `get_local_copy()` returns the path of the cached, read-only copy.
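To make the cache-reuse behaviour concrete, here is a small, self-contained sketch of the idea: keep one folder per dataset version ID under a shared cache directory, and only download when that folder is missing. This is a conceptual illustration only, not how ClearML implements its cache internally; `get_cached_copy`, the `fetch` callback, and the folder layout are all hypothetical.

```python
import os
import tempfile


def get_cached_copy(cache_root, dataset_id, fetch):
    """Return (local_folder, downloaded) for dataset_id.

    cache_root: directory shared by all runs (hypothetical layout).
    dataset_id: identifier of the exact dataset version requested.
    fetch: callable(target_dir) that fills target_dir with the data.
    """
    target = os.path.join(cache_root, dataset_id)
    if os.path.isdir(target):
        # Same version already cached: reuse it without downloading.
        return target, False
    # New version or empty cache: download into a temporary folder first,
    # then rename, so a half-finished download never looks like a valid
    # cache entry.
    os.makedirs(cache_root, exist_ok=True)
    tmp = tempfile.mkdtemp(dir=cache_root)
    fetch(tmp)
    os.replace(tmp, target)
    return target, True
```

Requesting the same version ID twice downloads once; a new version ID triggers a fresh download into its own folder.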

The other thing you need to do is pass the cached dataset locations to the training code before it runs.
You can find an example in this training script, which sets up a PyTorch Ignite training job on the ClearML server. The job can then be executed on remote compute by clearml-agents via a server queue: the script caches the dataset locally on that machine, gets the cached dataset locations, and uses them to override the default local locations.

See here: https://github.com/ecm200/caltech_birds/blob/master/scripts/train_clearml_pytorch_ignite_caltech_birds.py
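The override step essentially replaces hard-coded data paths in the training configuration with the folders returned by `get_local_copy()`. A minimal sketch, assuming a plain config dictionary with hypothetical `train_dir` and `test_dir` keys (the linked script's actual config structure may differ); in a real run, the two roots would come from `Dataset.get(...).get_local_copy()` for the train and test datasets.

```python
import os


def override_data_dirs(config, train_root, test_root, subdir=None):
    """Return a copy of config whose data paths point at cached dataset copies.

    config: dict holding the default local locations (hypothetical keys).
    train_root / test_root: folders returned by get_local_copy().
    subdir: optional folder inside each dataset root (e.g. 'images').
    """
    updated = dict(config)  # shallow copy, so the defaults stay untouched
    for key, root in (("train_dir", train_root), ("test_dir", test_root)):
        updated[key] = os.path.join(root, subdir) if subdir else root
    return updated
```

Returning a copy rather than mutating the defaults means the same script behaves identically whether it runs locally or on a clearml-agent; only the resolved cache paths differ.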

  
  
Posted 3 years ago