Answered

Hi everyone, I am currently evaluating ClearML.

I have a question about how to handle datasets.
Does ClearML provide any functionality for managing datasets?
Or do we need to manage them ourselves?

In our use case, we update datasets little by little over days or weeks, and run experiments against the updated datasets accordingly.
As a result, the number of combinations of samples, datasets and experiments quickly grows large.
We would like to keep track of which samples belong to which dataset and which dataset is consumed by which experiment.

If ClearML has anything that makes this kind of tracking easy, that would be nice.
Currently, our data is stored in S3.

  
  
Posted 3 years ago

Answers 13


Oh, thanks 🙂

  
  
Posted 3 years ago

Hi JitteryCoyote63 ,
Oh, you have something for this, nice!
I will look into that document, thanks!

  
  
Posted 3 years ago

There is an example in the https://github.com/allegroai/clearml/blob/master/docs/datasets.md#workflow section of the link I shared above

  
  
Posted 3 years ago

Is there any example of how to use clearml-data ?

  
  
Posted 3 years ago

JitteryCoyote63 Is there an example of how the learning pipeline can be triggered (started) by changes in a dataset?

  
  
Posted 3 years ago

This is no coincidence: any data versioning tool you will find (dvc, etc.) is somewhat close to how git works, since they all aim to solve a similar problem. In the end, datasets are just files.
Where clearml-data stands out imo is the straightforward CLI combined with the Pythonic API, which lets you register/retrieve datasets very easily
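
To give a rough idea of the register/retrieve flow through the Python API, here is a minimal sketch. The helper names `register_dataset` and `fetch_dataset` are hypothetical wrappers made up for illustration, and the sketch assumes the `clearml` package is installed and configured against a server. Treat it as a starting point, not as the official workflow.

```python
def register_dataset(name, project, local_folder, parent_ids=None, output_url=None):
    """Create a new dataset version from a local folder and return its ID.

    Hypothetical helper for illustration; only the Dataset calls are ClearML's.
    """
    # Imported lazily so the module loads even where clearml is not installed.
    from clearml import Dataset

    dataset = Dataset.create(
        dataset_name=name,
        dataset_project=project,
        parent_datasets=parent_ids,  # optional: IDs of versions to build on
    )
    dataset.add_files(path=local_folder)    # register the files in this version
    dataset.upload(output_url=output_url)   # None uses the default storage target
    dataset.finalize()                      # close the version so it stays fixed
    return dataset.id


def fetch_dataset(dataset_id):
    """Return the path of a read-only local copy of a registered dataset."""
    from clearml import Dataset

    return Dataset.get(dataset_id=dataset_id).get_local_copy()
```

Passing the ID of an earlier version as `parent_ids=[...]` is how an incremental update over days or weeks could be recorded as a new version on top of the old one.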

  
  
Posted 3 years ago

Hm, clearml-data looks very much like git.

  
  
Posted 3 years ago

BattyLion34 the closest I can think of is the monitoring class, which can easily be extended.
Datasets are a type of Task, so we can monitor a project and trigger an action when we see a change in the number of Tasks/Datasets that are completed.
Monitoring class:
https://github.com/allegroai/clearml/blob/master/clearml/automation/monitor.py
Monitoring example:
https://github.com/allegroai/clearml/blob/master/examples/services/monitoring/slack_alerts.py

I think a dataset monitoring example will be quite cool.
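
Until such an example exists, the idea can be sketched without the monitoring class at all. The snippet below is plain Python, not the ClearML `Monitor` API: `watch_completed`, `fetch_completed_ids` and `on_new` are hypothetical names, and `fetch_completed_ids` stands in for whatever query returns the IDs of completed Tasks/Datasets in the project.

```python
import time


def watch_completed(fetch_completed_ids, on_new, poll_seconds=60.0, max_polls=None):
    """Poll for completed items and call on_new whenever new IDs show up.

    fetch_completed_ids: callable returning an iterable of IDs (assumed to
        wrap a query for completed Tasks/Datasets in one project).
    on_new: callable receiving a sorted list of IDs not seen before.
    max_polls: stop after this many polls; None means run forever.
    """
    # Baseline snapshot: items that already exist do not trigger anything.
    seen = set(fetch_completed_ids())
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(poll_seconds)
        new_ids = set(fetch_completed_ids()) - seen
        if new_ids:
            on_new(sorted(new_ids))  # e.g. enqueue a training pipeline here
            seen |= new_ids
        polls += 1
    return seen
```

The trigger fires on newly completed IDs and not on a raw count, so a dataset that is deleted while another is completed in the same interval still gets noticed.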

  
  
Posted 3 years ago

(I am not part of the awesome ClearML team, just a happy user 🙂 )

  
  
Posted 3 years ago

Yeah, now that I know that, the CLI looks much more familiar to me.

  
  
Posted 3 years ago

Does it handle data just as regular files?

  
  
Posted 3 years ago

I will let the team answer you on that one 🙂

  
  
Posted 3 years ago