That is awesome!
If you feel like writing a bit about the use-case and how you solved it, I think AnxiousSeal95 will be more than happy to publish something like that 🙂
Yes, for datasets where we need GDPR compliance
Ephemeral Dataset, I like that! Is this like splitting a dataset, for example, then training/testing, and deleting it when done? Making sure the entire pipeline is reproducible, but without storing the data long term?
I am essentially creating an EphemeralDataset abstraction with a controlled lifecycle, such that the data is removed after a day in experiments. Additionally, and optionally, data created during a step in a pipeline can be cleared once the pipeline completes.
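Roughly, a sketch of what I have in mind on top of the ClearML Dataset API (EphemeralDataset, its ttl argument, and the cleanup call are my own wrapper names, not part of ClearML):
```python
from datetime import datetime, timedelta
from clearml import Dataset

class EphemeralDataset:
    # my own wrapper around a ClearML Dataset with a limited lifetime (not a ClearML API)
    def __init__(self, name, project, ttl=timedelta(days=1)):
        self.expires_at = datetime.utcnow() + ttl
        self._ds = Dataset.create(dataset_name=name, dataset_project=project)

    def add_files(self, path):
        self._ds.add_files(path=path)

    def finalize(self):
        # upload and close the dataset, return its ID for downstream steps
        self._ds.upload()
        self._ds.finalize()
        return self._ds.id

    def cleanup_if_expired(self):
        # called from a scheduled job, or at the end of the pipeline
        if datetime.utcnow() >= self.expires_at:
            Dataset.delete(dataset_id=self._ds.id, force=True)
```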
Is task.parent something that could help?
Exactly 🙂 something like:
from clearml import Task
# my step is running here
task = Task.current_task()
the_pipeline_task = Task.get_task(task_id=task.parent)
AgitatedDove14 is it possible, from within a step, to get the pipeline task that is running the step? Is task.parent something that could help?
AgitatedDove14 - I mean this - it says name=None, but the text says the default is General.
Ok, the doc needs a fix
suggestion?
The description says this though
A section name associated with the connected object. Default: 'General'
So General would have created a General instead of Args?
yes,
This is a must: you have to specify the hyper-parameter section you are referencing.
https://github.com/allegroai/clearml/blob/5a9155b2039413280f13dfded1121470c4c4323d/examples/pipeline/step2_data_processing.py#L21
This is actually:
task.connect(args, name='General')
Basically there is no "random_state" only "General/random_state"
Make sense?
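For illustration, a minimal sketch of how the section name prefixes the parameter (project/task names and values here are placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="data_processing_step")

args = {"random_state": 42, "test_size": 0.2}
# connect the dict under the "General" section
task.connect(args, name="General")

# the parameter is now addressed as "General/random_state",
# e.g. when overriding it from a pipeline:
#   parameter_override={"General/random_state": 1337}
```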
What’s the point of saying General?
So General would have created a General instead of Args?
See Args section in the screenshot
"Args/counter"
Think of multiple hyper-parameter sections that we need to reference
(under the Task's Configuration tab, the Hyper-parameters can have multiple sections)
"General" is the parameter section name (like Args)
In params:
parameter_override={'General/dataset_url
What’s the General for?
Notice that the pipeline step/Task, at execution time, is not aware of the pipeline context.
Sure, you can pass ${stage_data.id} as an argument, and the actual Task will get the referenced step's Task ID for the current execution.
Make sense?
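Something along these lines, in the spirit of the pipeline controller examples (project names, base task names and the override keys are placeholders here):
```python
from clearml import PipelineController

pipe = PipelineController(name="pipeline demo", project="examples", version="0.0.1")

pipe.add_step(
    name="stage_data",
    base_task_project="examples",
    base_task_name="pipeline step 1 dataset artifact",
)
pipe.add_step(
    name="stage_process",
    parents=["stage_data"],
    base_task_project="examples",
    base_task_name="pipeline step 2 process dataset",
    # the step's Task only sees plain values; the pipeline resolves
    # the ${stage_data...} references at execution time
    parameter_override={
        "General/dataset_task_id": "${stage_data.id}",
        "General/dataset_url": "${stage_data.artifacts.dataset.url}",
    },
)
pipe.start()
```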
Barebones: can a step in a pipeline refer to a previous step by name and get it?
On a related note - is it possible to get things like ${stage_data.artifacts.dataset.url} from within a task, rather than passing params in add_step?