Answered
If I Have A Task And A Dataset Is Being Created In A Task, How Can I Get A “Link” That This Dataset Is Created In This Task, Similar To How Model Has The Task Where It Came From

If I have a task and a dataset is being created in a task, how can I get a “link” that this Dataset is created in this task, similar to how Model has the task where it came from?

  
  
Posted 2 years ago

Answers 17


Basically you create the Task and make sure the "Dataset" is attached to it:
task = Task.init(...)
dataset = Dataset.create(task=task)
dataset.add_files(...)
This will make sure the code is attached to the Dataset.

  
  
Posted 2 years ago

So you want to have two Tasks and connect the two ?
Maybe the best approach is to make the current task the parent of the Dataset Task?
dataset._task.set_parent(Task.current_task())

  
  
Posted 2 years ago

Yeah that would be good

  
  
Posted 2 years ago

id works, thanks 👍

  
  
Posted 2 years ago

Also, the IDs as an entry in the Configuration will not be clickable in the web interface, right?

No, but on the other hand, it will be editable if you clone the Task.
Which brings me to a different scenario,
In the original one, the Main Task created the Dataset, i.e. Output Dataset (and stored it both ways).
I could think of a situation where the Task is using the Dataset as input (say preprocessing or training), then we might want to enable users to clone and change the Input dataset. wdyt?

  
  
Posted 2 years ago

But it seems to make the current task the data processing task. I don't want it to take over the task.

  
  
Posted 2 years ago

I'm sorry my bad, this is use_current_task
https://github.com/allegroai/clearml/blob/6d09ff15187197e1f574902352115aa08dc1c28a/clearml/datasets/dataset.py#L663
task = Task.init(...)
dataset = Dataset.create(..., use_current_task=True)
dataset.add_files(...)

  
  
Posted 2 years ago

This also helped me 🙂 Really, I'd like it both ways, such that the Task links to the Dataset it created, as well as the Dataset to the Task it was created by.
Right now I'm doing
dataset = Dataset.create(...)
task.connect({'dataset_id': dataset.id}, name='Datasets')
for the second direction. Is there a better way to do this? (I'm using it to pass Datasets between Tasks, one Task operating on a Dataset that was created by another Task.) Thank you!
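The Task-to-Dataset direction described above could be wrapped up roughly as follows. This is only a sketch: the helper names record_dataset_on_task and load_recorded_dataset are made up for illustration, and it assumes that task.connect returns the connected dictionary and that Dataset.get accepts a dataset_id argument. The clearml import is kept inside the second function so the first one can be exercised without a server.

```python
def record_dataset_on_task(task, dataset, section="Datasets"):
    """Store the dataset ID under a configuration section of the task.

    Hypothetical helper: `task` is expected to behave like clearml.Task and
    `dataset` like clearml.Dataset (only `.id` is used).
    """
    params = {"dataset_id": dataset.id}
    # The returned dictionary is the one to read from afterwards, so that an
    # ID edited on a cloned Task is picked up instead of the original one.
    return task.connect(params, name=section)


def load_recorded_dataset(params):
    """Fetch the Dataset whose ID was recorded by record_dataset_on_task."""
    # Imported lazily; this call needs a configured ClearML server.
    from clearml import Dataset

    return Dataset.get(dataset_id=params["dataset_id"])
```

A consuming Task would call load_recorded_dataset on the dictionary returned by record_dataset_on_task. The ID stays a plain configuration value, so it is editable on a clone but, as noted in the thread, not clickable in the web interface.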

  
  
Posted 2 years ago

👍

  
  
Posted 2 years ago

I initially thought use_current_task in Dataset.create would do it, but that seems to make the current task itself the data processing Task.

  
  
Posted 2 years ago

Doesn't set the parent though 🤔

  
  
Posted 2 years ago

Maybe we should do that automatically ? wdyt?

  
  
Posted 2 years ago

Will try set_parent; it wasn't in the docs and I wasn't sure.

  
  
Posted 2 years ago

Nice!
I can't really think of a reason not to do it automatically, at least for my use case. What name would you give the dataset(s) in the Configuration? Also, the IDs as an entry in the Configuration will not be clickable in the web interface, right?

  
  
Posted 2 years ago

But I don't see a task option in Dataset.create

https://github.com/allegroai/clearml/blob/master/clearml/datasets/dataset.py#L657-L663

  
  
Posted 2 years ago

Regarding the first direction, this was just pushed 🙂
https://github.com/allegroai/clearml/commit/597a7ed05e2376ec48604465cf5ebd752cebae9c

Regarding the opposite direction:
That is a good question, I really like the idea of just adding another section named Datasets
SucculentBeetle7 should we do that automatically?

  
  
Posted 2 years ago

Seems like passing the Task object is not working as expected (I'll make sure it is fixed).
Try:
dataset._task.set_parent(Task.current_task().id)
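Put together, the workaround might look like the sketch below. The helper name link_dataset_to_creator, the project and dataset names, and the data path are placeholders, and _task is a private attribute taken from this thread, so it may not be stable across clearml versions. main() is not called automatically because it needs a configured ClearML server.

```python
def link_dataset_to_creator(dataset, creator_task):
    """Make creator_task the parent of the Task that backs the dataset.

    Hypothetical helper; `dataset._task` is a private clearml attribute
    used here only because the thread suggests it.
    """
    backing_task = dataset._task
    # Pass the ID rather than the Task object, as suggested above.
    backing_task.set_parent(creator_task.id)
    return backing_task


def main():
    # Imported here so the helper above stays usable without clearml installed.
    from clearml import Dataset, Task

    # Placeholder names and path, for illustration only.
    Task.init(project_name="examples", task_name="create dataset")
    dataset = Dataset.create(dataset_name="my-dataset", dataset_project="examples")
    dataset.add_files(path="data/")
    link_dataset_to_creator(dataset, Task.current_task())
```

Unlike use_current_task=True, this keeps the Dataset in its own Task and only records the creating Task as its parent, so the main Task is not taken over.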

  
  
Posted 2 years ago