Answered
Two Simple Lineage Related Questions:

Task B is a clone of Task A. Does B store the information that it was cloned from A somewhere?
Training task X loads Dataset Y using
ds = Dataset.get(dataset_id)
ds.get_local_copy()
Does http://clear.ml understand this as a dependency and track it as some sort of lineage?
Or do I need to report it somehow for the info to show up?

  
  
Posted 2 years ago

Answers 14


RoughTiger69 , regarding the dataset loading, we are actually thinking of adding it as another "hyper parameter" section, and I think the idea came up a few times in the last month, so we should definitely do that. The question is how do we support multiple entries (i.e. two datasets loaded)? Should we force users to "name" the dataset when they "get it" ?

Regarding cloning, we had a lot of internal discussions on it. "Parent" is a field on a Task, so the information can be easily stored. The question is always: is a clone a child version of the parent? What happens if the parent has its own parent, are they siblings now? wdyt?

  
  
Posted 2 years ago

CostlyOstrich36 Lineage information for datasets - oversimplifying, but bear with me:
Task should have a section called “input datasets”
each time I do a Dataset.get() inside a current_task, add the dataset ID to this section

Same can work with InputModel()

This way you can have a full lineage graph (also queryable/visualizable)
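
To make the "queryable lineage graph" part concrete, here is a minimal sketch in plain Python. It assumes the input IDs have already been recorded per task; the `TASK_INPUTS` and `PRODUCED_BY` mappings, all the IDs in them, and the `upstream` helper are hypothetical and not part of ClearML.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical records: the IDs each task loaded, and the task that produced each ID.
TASK_INPUTS: Dict[str, List[str]] = {
    "task_train": ["dataset_clean", "model_base"],
    "task_clean": ["dataset_raw"],
}
PRODUCED_BY: Dict[str, str] = {
    "dataset_clean": "task_clean",
    "model_base": "task_pretrain",
}


def upstream(task_id: str,
             task_inputs: Dict[str, List[str]],
             produced_by: Dict[str, str]) -> Set[str]:
    """Return every dataset, model and task that task_id transitively depends on."""
    seen: Set[str] = set()
    queue = deque([task_id])
    while queue:
        current = queue.popleft()
        for entity in task_inputs.get(current, []):
            if entity in seen:
                continue
            seen.add(entity)
            # Follow the entity back to the task that created it, if known.
            producer = produced_by.get(entity)
            if producer is not None and producer not in seen:
                seen.add(producer)
                queue.append(producer)
    return seen
```

Calling `upstream("task_train", TASK_INPUTS, PRODUCED_BY)` walks back through both the cleaned dataset and the base model, which is the kind of query a lineage view would need.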

  
  
Posted 2 years ago

You’ll just need the user to name them as part of loading them in the code (in case they are loading multiple datasets/models).

Exactly! (and yes UI visualization is coming 🙂 )

  
  
Posted 2 years ago

👍

  
  
Posted 2 years ago

Hi RoughTiger69
I like the direction this is taking, let me add some more complexity.
My thinking is that if we have “input datasets”, I'd also like to be able to clone the Task and automagically change them (with the need to export the dataset_id as an argument), basically I'm thinking :
train = Dataset.get('aabbcc1', name='train')
valid = Dataset.get('aabbcc2', name='validation')
custom = Dataset.get('aabbcc3', name='custom')
Then you end up with a HyperParameter section "Input Datasets":
train: aabbcc1
validation: aabbcc2
custom: aabbcc3
And then you can clone the Task in the UI, edit the dataset ID, and relaunch it, so that (without changing the code) you change the dataset your code is using.
wdyt?
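
A rough sketch of the override logic described here, in plain Python so it runs without a server. The `resolve_input_datasets` helper and the `ddeeff4` ID are made up for illustration. In ClearML I would expect `task.connect(a_dict, name="Input Datasets")` to be the way to expose such a dict as an editable section, but treat that wiring as an assumption rather than the planned design.

```python
from typing import Dict, Mapping, Optional


def resolve_input_datasets(code_defaults: Mapping[str, str],
                           ui_overrides: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Merge dataset IDs hard-coded in the script with IDs edited on a cloned task."""
    resolved = dict(code_defaults)
    for name, dataset_id in (ui_overrides or {}).items():
        # Only names the code actually loads can be overridden.
        if name not in resolved:
            raise KeyError(f"unknown input dataset name: {name!r}")
        resolved[name] = dataset_id
    return resolved


defaults = {"train": "aabbcc1", "validation": "aabbcc2", "custom": "aabbcc3"}
# 'ddeeff4' stands in for an ID edited in the UI before relaunching the clone.
resolved = resolve_input_datasets(defaults, {"validation": "ddeeff4"})
```

The code would then load each dataset by `resolved[name]` instead of the literal ID, so a relaunched clone picks up the edited value.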

  
  
Posted 2 years ago

so I think it will just be confusing

  
  
Posted 2 years ago

Re. “which task did I clone from” - to my understanding, the “parent” field is used for “runtime parent” - i.e. what task started me.
This is not the same as “which task was I cloned from”

  
  
Posted 2 years ago

Sure, but was wondering if it has more of a “first class citizen” status for tracking… e.g. something you can visualize in the UI or query via API

  
  
Posted 2 years ago

RoughTiger69 So basically (if I follow your example), the question is whether ClearML "knows" that "Task B" is a clone of "Task A"?
And whether the loaded Dataset Y is somehow registered on Task X?
Is that correct?

  
  
Posted 2 years ago

I mean, if it’s not tracked, I think it would be a good feature!

  
  
Posted 2 years ago

yep

  
  
Posted 2 years ago

Task B is a clone of Task A. Does B store the information that it was cloned from A somewhere?

You can add any user properties you like to any task, so maybe “origin” : <task_id> will do the work?
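
A minimal sketch of that workaround. `clone_with_origin` is a hypothetical helper, and it assumes `clone_fn` behaves like `Task.clone` (takes the source task, returns the new one) and that the returned task exposes `set_user_properties(**properties)` and the source task exposes `.id`.

```python
from typing import Any, Callable


def clone_with_origin(source_task: Any,
                      clone_fn: Callable[[Any], Any],
                      property_name: str = "origin") -> Any:
    """Clone a task and stamp the clone with the ID of the task it came from."""
    cloned = clone_fn(source_task)
    # Stored as a user property so it shows up next to the task's other metadata.
    cloned.set_user_properties(**{property_name: source_task.id})
    return cloned
```

With the SDK this would be called as `clone_with_origin(task_a, Task.clone)`. Clones made from the UI would not get the property, so this only covers clones created from code.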

  
  
Posted 2 years ago

RoughTiger69 thanks for the input 🙂

  
  
Posted 2 years ago

I think that in principle, if you “intercept” the calls to Model.get() or Dataset.get() from within a task, you can collect the IDs and do various stuff with them. You can store and visualize them for lineage, or expose them as another hyperparameter, I suppose.

You’ll just need the user to name them as part of loading them in the code (in case they are loading multiple datasets/models).
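
Something like the following is what I have in mind for the interception, as a sketch only. `record_loads`, the `label` keyword, and the `registry` list are hypothetical; the wrapper assumes the wrapped loader takes the ID as its first positional argument, and it consumes `label` itself instead of forwarding it.

```python
import functools
from typing import Any, Callable, Dict, List, Optional


def record_loads(loader: Callable[..., Any],
                 kind: str,
                 registry: List[Dict[str, str]]) -> Callable[..., Any]:
    """Wrap a loader so every call appends the requested ID to registry."""

    @functools.wraps(loader)
    def wrapper(entity_id: str, *args: Any, label: Optional[str] = None, **kwargs: Any) -> Any:
        # Fall back to the ID itself when the user did not name the input.
        registry.append({"kind": kind, "id": entity_id, "label": label or entity_id})
        return loader(entity_id, *args, **kwargs)

    return wrapper
```

Usage would look like `get_dataset = record_loads(Dataset.get, "dataset", registry)` followed by `get_dataset("aabbcc1", label="train")`, after which `registry` holds what the task loaded and can be stored or reported however you like.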

  
  
Posted 2 years ago