Initially I thought use_current_task in Dataset.create would do it,
but that seems to turn the current task itself into the data processing task
Basically you create the Task and make sure the "Dataset" is attached to it:
task = Task.init(...)
dataset = Dataset.create(task=task)
dataset.add_files(...)
This will make sure the code is attached to the Dataset
But I don't see a task option in Dataset.create
https://github.com/allegroai/clearml/blob/master/clearml/datasets/dataset.py#L657-L663
I'm sorry, my bad, the argument is use_current_task
https://github.com/allegroai/clearml/blob/6d09ff15187197e1f574902352115aa08dc1c28a/clearml/datasets/dataset.py#L663
task = Task.init(...)
dataset = Dataset.create(..., use_current_task=True)
dataset.add_files(...)
But it seems to make the current task the data processing task. I don't want it to take over the task.
So you want to have two Tasks and connect the two ?
Maybe the best approach is to make the current task the parent of the Dataset Task?
dataset._task.set_parent(Task.current_task())
Maybe we should do that automatically? wdyt?
Will try set_parent; it wasn't in the docs, so I wasn't sure
Seems like passing the Task object is not working as expected (I'll make sure it is fixed).
Try:
dataset._task.set_parent(Task.current_task().id)
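For reference, a minimal sketch of the parent-linking approach above, assuming a clearml version where Dataset still exposes the private `_task` attribute used in this thread. The helper name `link_dataset_to_creator`, the project/dataset names and the `data/` folder are made up for illustration; `main()` needs a configured ClearML server and is not called automatically.

```python
def link_dataset_to_creator(dataset, creator_task):
    """Make `creator_task` the parent of the Task that backs `dataset`.

    Assumption: `dataset._task` is a private attribute (as used in this
    thread) and may change between clearml versions.
    """
    # Pass the Task ID, not the Task object, as suggested above.
    dataset._task.set_parent(creator_task.id)
    return dataset


def main():
    # Imported here so the helper can be read and tested without clearml installed.
    from clearml import Dataset, Task

    # Hypothetical project, task, and dataset names.
    task = Task.init(project_name="examples", task_name="create dataset")
    dataset = Dataset.create(dataset_name="my_dataset", dataset_project="examples")
    dataset.add_files("data/")  # hypothetical local folder
    link_dataset_to_creator(dataset, task)
    dataset.upload()
    dataset.finalize()
```

This keeps the main Task and the Dataset Task separate (no use_current_task), and only records the relationship between them.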
This also helped me. Really, I'd like it both ways, such that the Task links to the Dataset it created, as well as the Dataset to the Task it was created by.
Right now I'm doing
dataset = Dataset.create(...)
task.connect({'dataset_id': dataset.id}, name='Datasets')
for the second direction. Is there a better way to do this? (I'm using it to pass Datasets between Tasks, one Task operating on a Dataset that was created by another Task.) Thank you!
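A rough sketch of that second direction, plus how another Task might read the ID back. The helper names are hypothetical, and the reader assumes that `Task.get_parameters()` returns a flat dict keyed as "section/name" (so "Datasets/dataset_id" here); treat that key format as an assumption to verify against your clearml version.

```python
def record_output_dataset(task, dataset_id, section="Datasets"):
    """Store the dataset ID on `task` under a dedicated configuration section."""
    params = {"dataset_id": dataset_id}
    task.connect(params, name=section)
    return params


def read_output_dataset_id(producer_task, section="Datasets"):
    """Read the dataset ID recorded on another Task.

    Assumption: parameters come back flattened as "<section>/<name>".
    """
    return producer_task.get_parameters()["{}/dataset_id".format(section)]


def main(producer_task_id):
    # Needs a configured ClearML server; not called automatically.
    from clearml import Dataset, Task

    producer = Task.get_task(task_id=producer_task_id)
    dataset = Dataset.get(dataset_id=read_output_dataset_id(producer))
    return dataset.get_local_copy()  # path to a local read-only copy
```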
Regarding the first direction, this was just pushed:
https://github.com/allegroai/clearml/commit/597a7ed05e2376ec48604465cf5ebd752cebae9c
Regarding the opposite direction:
That is a good question, I really like the idea of just adding another section named Datasets
SucculentBeetle7 should we do that automatically?
Nice!
I can't really think of a reason not to do it automatically, at least for my use case. What name would you give the dataset(s) in the Configuration? Also, the IDs as an entry in the Configuration will not be clickable in the web interface, right?
Also, the IDs as an entry in the Configuration will not be clickable in the web interface, right?
No, but on the other hand, it will be editable if you clone the Task.
Which brings me to a different scenario,
In the original one, the Main Task created the Dataset, i.e. Output Dataset (and stored it both ways).
I could think of a situation where the Task is using the Dataset as input (say preprocessing or training), then we might want to enable users to clone the Task and change the input Dataset. wdyt?
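One way the input-Dataset scenario could look, as a sketch only: the Task connects a default dataset ID, and a cloned Task with an edited value would pick up the new ID when executed by an agent. The helper name and the "input_dataset_id" key are hypothetical; the code follows the usual `params = task.connect(params)` pattern and uses the returned object.

```python
def resolve_input_dataset_id(task, default_dataset_id, section="Datasets"):
    """Return the input dataset ID, allowing an edited clone to override the default."""
    params = {"input_dataset_id": default_dataset_id}
    # Use the object returned by connect(): when a cloned Task runs remotely,
    # it is expected to carry the values edited in the UI.
    connected = task.connect(params, name=section)
    return connected["input_dataset_id"]


def main(default_dataset_id):
    # Needs a configured ClearML server; not called automatically.
    from clearml import Dataset, Task

    task = Task.init(project_name="examples", task_name="train")  # hypothetical names
    dataset_id = resolve_input_dataset_id(task, default_dataset_id)
    return Dataset.get(dataset_id=dataset_id).get_local_copy()
```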