Answered
I wanted to ask, I'm versioning my data using ClearML Data. And I'll have a training task with ClearML Task. My question is, does ClearML keep track of the data versions fetched from ClearML Data? Basically I want to see how much of tracking and informati

I wanted to ask: I'm versioning my data using ClearML Data, and I'll have a training task with ClearML Task.
My question is, does ClearML keep track of the data versions fetched from ClearML Data?
Basically, I want to see how much of the tracking and information storing is done by ClearML directly, and how much I will have to do manually using a database.

  
  
Posted 2 years ago

Answers 11


Let me try to be a bit more clear.

Say I have a training task in which I fetch multiple ClearML Datasets using multiple dataset IDs. In that script I get local copies, train the model, save the model, and delete the local copies.

Does ClearML keep track of which data versions were gotten and used from ClearML Data?

  
  
Posted 2 years ago

Basically, I'm trying to work out how much of the tracking and record keeping ClearML does for me, and what I need to keep track of manually in a database.

  
  
Posted 2 years ago

That is true. If I'm understanding correctly, by configuration parameters you mean using argparse, right?

  
  
Posted 2 years ago

VexedCat68, do you mean does it track which version was fetched, or does it track every time a version is fetched?

  
  
Posted 2 years ago

I'm not sure about auto-logging, since you might be using different datasets, or you might get a dataset but not use it depending on specific conditions. However, as a developer who chose ClearML because I see it as an ecosystem rather than just a continuous training pipeline, I would want as many aspects of the MLOps process, and the information around the experiment, to be loggable within ClearML without having to use external databases or libraries.

  
  
Posted 2 years ago

Yes, I was referring to logging the "clearml-data" Dataset ID on the Task itself, not in an external database.
Does that make sense?
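
One way to record only the datasets that were actually used is to attach their IDs to the Task after the selection has been made. The sketch below is an illustration under assumptions, not a built-in ClearML feature: `used_dataset_record` and `log_used_datasets` are hypothetical helpers, the section name `DatasetsUsed` is arbitrary, and `task` is assumed to be the object returned by `Task.init`, whose `connect` method stores a dictionary on the task.

```python
from typing import Any, Dict, Iterable


def used_dataset_record(fetched_ids: Iterable[str], used_ids: Iterable[str]) -> Dict[str, str]:
    """Return {"dataset_0": id, ...} for the fetched datasets that were actually used."""
    fetched = list(fetched_ids)
    used = set(used_ids)
    unknown = used - set(fetched)
    if unknown:
        # A "used" ID that was never fetched points to a bookkeeping error.
        raise ValueError(f"used but never fetched: {sorted(unknown)}")
    # Keep the fetch order so the record is stable between runs.
    kept = [ds_id for ds_id in fetched if ds_id in used]
    return {f"dataset_{n}": ds_id for n, ds_id in enumerate(kept)}


def log_used_datasets(task: Any, fetched_ids: Iterable[str], used_ids: Iterable[str]) -> Dict[str, str]:
    """Store the used dataset IDs on the task (hypothetical helper)."""
    record = used_dataset_record(fetched_ids, used_ids)
    # Assumption: `task` is a clearml Task; connect() logs the dict
    # as a named section in the task's configuration.
    task.connect(record, name="DatasetsUsed")
    return record
```

In a training script this would be called once, after the code has decided which of the fetched datasets go into training, for example `log_used_datasets(task, all_ids, ids_used_for_training)`.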

  
  
Posted 2 years ago

Understood

  
  
Posted 2 years ago

VexedCat68, correct. But not only argparse: the entire configuration section 🙂

  
  
Posted 2 years ago

It does to me. However, I'm describing a situation where a user gets N datasets using Dataset.get but trains on only m of them, where m < N. Would it make sense to log only the m datasets that were actually used for training? How would that be done?

  
  
Posted 2 years ago

VexedCat68 actually a few users already suggested we auto log the dataset ID used as an additional configuration section, wdyt?

  
  
Posted 2 years ago

VexedCat68, that's a good question! I'm not sure whether ClearML keeps track of that; I need to check.

However, I think a neat solution could be using the datasets as task configuration parameters. This way you can track which datasets were used and you can set up new runs with different datasets.
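
A minimal sketch of that idea, assuming the `clearml` package is installed and a server is configured. The project name, task name, section name `Datasets`, and both helper functions are placeholders chosen for illustration; only `Task.init`, `Task.connect`, `Dataset.get`, and `get_local_copy` are ClearML calls.

```python
from typing import Dict


def build_dataset_params(dataset_ids: Dict[str, str]) -> Dict[str, str]:
    """Turn {role: dataset_id} into parameters such as {"train_dataset_id": "..."}."""
    return {f"{role}_dataset_id": ds_id for role, ds_id in dataset_ids.items()}


def run_training_task(dataset_ids: Dict[str, str]) -> Dict[str, str]:
    """Register dataset IDs as task parameters, then fetch each dataset.

    clearml is imported lazily so the helper above works without it.
    """
    from clearml import Dataset, Task

    # Placeholder project and task names.
    task = Task.init(project_name="examples", task_name="train with tracked datasets")

    # connect() stores the dict on the task, so the IDs are visible with
    # the experiment and can be edited when the task is cloned and re-run.
    params = task.connect(build_dataset_params(dataset_ids), name="Datasets")

    local_paths = {}
    for name, ds_id in params.items():
        # get_local_copy() returns the path of a cached, read-only copy.
        local_paths[name] = Dataset.get(dataset_id=ds_id).get_local_copy()
    return local_paths
```

Because the IDs live in the task's configuration, a new run with different data only needs different values for those parameters rather than a code change.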

  
  
Posted 2 years ago