If I Have A Task And A Dataset Is Being Created In A Task, How Can I Get A “Link” That This Dataset Is Created In This Task, Similar To How Model Has The Task Where It Came From

Answered

If a dataset is created inside a task, how can I get a “link” showing that the Dataset was created in that task, similar to how a Model records the task it came from?

  				
Posted 3 years ago by TrickySheep9


Answers 17

I initially thought use_current_task in Dataset.create would do it, but that seems to turn the current task itself into the data processing task.

  				
Posted 3 years ago by TrickySheep9

Basically you create the Task and make sure the "Dataset" is attached to it:
task = Task.init(...)
dataset = Dataset.create(task=task)
dataset.add_files(...)
This will make sure the code is attached to the Dataset.

  				
Posted 3 years ago by AgitatedDove14

But I don’t see a task option in Dataset.create

https://github.com/allegroai/clearml/blob/master/clearml/datasets/dataset.py#L657-L663

  				
Posted 3 years ago by TrickySheep9

I'm sorry my bad, this is use_current_task
https://github.com/allegroai/clearml/blob/6d09ff15187197e1f574902352115aa08dc1c28a/clearml/datasets/dataset.py#L663
task = Task.init(...)
dataset = Dataset.create(..., use_current_task=True)
dataset.add_files(...)

  				
Posted 3 years ago by AgitatedDove14

But it seems to make the current task the data processing task. I don't want it to take over the task.

  				
Posted 3 years ago by TrickySheep9

So you want to have two Tasks and connect the two?
Maybe the best approach is to make the current task the parent of the Dataset Task?
dataset._task.set_parent(Task.current_task())

  				
Posted 3 years ago by AgitatedDove14

Maybe we should do that automatically ? wdyt?

  				
Posted 3 years ago by AgitatedDove14

Yeah that would be good

  				
Posted 3 years ago by TrickySheep9

Will try set_parent. It wasn't in the docs, so I wasn't sure.

  				
Posted 3 years ago by TrickySheep9

Doesn’t set the parent though 🤔

  				
Posted 3 years ago by TrickySheep9

Seems like passing the Task object is not working as expected (I'll make sure it is fixed).
Try:
dataset._task.set_parent(Task.current_task().id)

  				
Posted 3 years ago by AgitatedDove14
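
The workaround above could be wrapped roughly as follows. This is only a sketch: it assumes the `clearml` package is installed and a server is configured, `link_dataset_to_task` is a hypothetical helper name, and the project and dataset names are placeholders. Note that `dataset._task` is a private attribute, so it may change between versions. The task ID (a string) is passed rather than the Task object, since that is what worked in this thread.

```python
def link_dataset_to_task(dataset, parent_task):
    """Set parent_task as the parent of the Task that backs the dataset.

    Hypothetical helper. Relies on the private `dataset._task` attribute
    and passes the task ID (a string) rather than the Task object.
    """
    parent_id = parent_task.id
    dataset._task.set_parent(parent_id)
    return parent_id


def main():
    # Imported inside main() so the helper itself has no import-time dependency.
    from clearml import Dataset, Task

    # Placeholder project and dataset names.
    Task.init(project_name="examples", task_name="create dataset")
    dataset = Dataset.create(dataset_name="my_dataset", dataset_project="examples")
    link_dataset_to_task(dataset, Task.current_task())
```

Calling `main()` from a script would create the Dataset and mark the running Task as its parent, without turning the running Task into the Dataset's own task.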

id works, thanks 👍

  				
Posted 3 years ago by TrickySheep9

👍

  				
Posted 3 years ago by AgitatedDove14

This also helped me 🙂 Really, I'd like it both ways, such that the Task links to the Dataset it created, as well as the Dataset to the Task it was created by.
Right now I'm doing
dataset = Dataset.create(...)
task.connect({'dataset_id': dataset.id}, name='Datasets')
for the second direction. Is there a better way to do this? (I'm using it to pass Datasets between Tasks, one Task operating on a Dataset that was created by another Task.) Thank you!

  				
Posted 3 years ago by SucculentBeetle7
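
The Task-to-Dataset direction described above might be sketched like this. `record_output_dataset` is a hypothetical helper name, and the section name "Datasets" simply follows the post. It assumes `task` is a ClearML Task and `dataset` is a ClearML Dataset.

```python
def record_output_dataset(task, dataset, section="Datasets"):
    """Store the dataset ID in a named configuration section of the task.

    Hypothetical helper: `task` is expected to expose `connect(dict, name=...)`
    and `dataset` an `id` attribute, as ClearML's Task and Dataset do.
    """
    params = {"dataset_id": dataset.id}
    # Register the dictionary under the given section name on the task.
    task.connect(params, name=section)
    return params["dataset_id"]
```

A downstream Task that knows the producing Task can then look up the stored ID in that section and fetch the Dataset by ID.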

Regarding the first direction, this was just pushed 🙂
https://github.com/allegroai/clearml/commit/597a7ed05e2376ec48604465cf5ebd752cebae9c

Regarding the opposite direction:
That is a good question, I really like the idea of just adding another section named Datasets
SucculentBeetle7 should we do that automatically?

  				
Posted 3 years ago by AgitatedDove14

Nice!
I can't really think of a reason not to do it automatically, at least for my use case. What name would you give the dataset(s) in the Configuration? Also, the IDs as an entry in the Configuration will not be clickable in the web interface, right?

  				
Posted 3 years ago by SucculentBeetle7

"Also, the IDs as an entry in the Configuration will not be clickable in the web interface, right?"

No, but on the other hand, it will be editable if you clone the Task.
Which brings me to a different scenario.
In the original one, the main Task created the Dataset, i.e. an output Dataset (and stored the link both ways).
I could think of a situation where the Task uses the Dataset as input (say preprocessing or training); then we might want to let users clone the Task and change the input Dataset. wdyt?

  				
Posted 3 years ago by AgitatedDove14
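
The input-Dataset scenario raised above could look roughly like this. It is a sketch under assumptions, not a built-in feature: `resolve_input_dataset_id` is a hypothetical helper, the section name "Datasets" follows the earlier post, and the project name and default ID are placeholders. The idea is that a dictionary connected to the Task becomes an editable configuration entry, so a cloned Task can be pointed at a different Dataset.

```python
def resolve_input_dataset_id(task, default_dataset_id, section="Datasets"):
    """Expose the input dataset ID as an editable configuration entry.

    Hypothetical helper. Uses the object returned by `task.connect`, which
    for a cloned and edited Task is expected to carry the overridden value.
    """
    params = {"dataset_id": default_dataset_id}
    connected = task.connect(params, name=section)
    return connected["dataset_id"]


def main():
    # Imported inside main() so the helper itself has no import-time dependency.
    from clearml import Dataset, Task

    # Placeholder project name and default dataset ID.
    task = Task.init(project_name="examples", task_name="train on dataset")
    dataset_id = resolve_input_dataset_id(task, "<default-dataset-id>")
    local_path = Dataset.get(dataset_id=dataset_id).get_local_copy()
    print(local_path)
```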
