That is awesome!
If you feel like writing a bit about the use-case and how you solved it, I think AnxiousSeal95 will be more than happy to publish something like that 🙂
So General would have created a General instead of Args?
yes,
This is a must, you have to specify the hyperparameters section you are referencing.
https://github.com/allegroai/clearml/blob/5a9155b2039413280f13dfded1121470c4c4323d/examples/pipeline/step2_data_processing.py#L21
This is actually: task.connect(args, name='General')
Basically there is no "random_state", only "General/random_state"
Makes sense?
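To illustrate the naming only, here is a minimal sketch of how section-qualified keys look. The `qualify` helper and the parameter values are hypothetical and are not ClearML code; the sketch just mimics the "section/name" convention described above.

```python
def qualify(params, section="General"):
    """Prefix every key with its section name, e.g. 'random_state' -> 'General/random_state'.

    Hypothetical helper: it only mimics the naming convention, it does not call ClearML.
    """
    return {f"{section}/{key}": value for key, value in params.items()}


# Illustrative values, assumed for this sketch.
args = {"random_state": 42, "test_size": 0.2}

# Roughly what the UI shows after task.connect(args, name='General')
general = qualify(args)

# Parameters that live under the "Args" section get the "Args/" prefix instead
argparse_like = qualify({"counter": 1}, section="Args")

print(sorted(general))        # section-qualified names only
print(sorted(argparse_like))
```

Under this convention a bare "random_state" key never appears, which is why the override has to name the section.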
AgitatedDove14 - I mean this - the signature says name=None but the text says the default is General.
I am essentially creating an EphemeralDataset abstraction with a controlled lifecycle, such that the data is removed after a day in experiments. Additionally and optionally, data created during a step in a pipeline can be cleared once the pipeline completes
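A rough sketch of what such a lifecycle check could look like. The class below is hypothetical (the real EphemeralDataset is not shown in this thread), the one-day TTL comes from the description above, and the actual deletion of data is deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class EphemeralDataset:
    """Hypothetical record of a dataset that should only live for a limited time."""
    name: str
    created_at: datetime
    ttl: timedelta = timedelta(days=1)  # "removed after a day"

    def is_expired(self, now=None):
        # Compare against the supplied time, or the current UTC time by default.
        now = now or datetime.now(timezone.utc)
        return now - self.created_at >= self.ttl


def expired(datasets, now=None):
    """Return the datasets whose TTL has elapsed; the caller decides how to delete them."""
    return [d for d in datasets if d.is_expired(now)]
```

A cleanup job (or a final pipeline step) could call `expired(...)` and remove whatever it returns.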
Ok, the doc needs a fix
suggestion?
Yes, for datasets where we need GDPR compliance
AgitatedDove14 is it possible, from inside a step, to get the pipeline task that is running it? Is task.parent something that could help?
Exactly 🙂 something like:
# my step is running here
the_pipeline_task = Task.get_task(task_id=task.parent)
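A slightly fuller sketch of the same idea, assuming (as suggested above) that a step's parent is the pipeline task. The `get_pipeline_task` wrapper and its `get_task` argument are hypothetical; `get_task` exists only so the sketch can be tried without a ClearML server.

```python
def get_pipeline_task(step_task, get_task=None):
    """Return the parent (pipeline) task of a step, or None if the step has no parent.

    Hypothetical wrapper around Task.get_task(task_id=task.parent).
    """
    if get_task is None:
        # Imported lazily so the function can be exercised with a stand-in getter.
        from clearml import Task
        get_task = Task.get_task

    parent_id = step_task.parent
    if not parent_id:
        # Not launched by a pipeline (e.g. run manually)
        return None
    return get_task(task_id=parent_id)


# Inside a running step one would call, for example:
#   from clearml import Task
#   the_pipeline_task = get_pipeline_task(Task.current_task())
```

The None branch matters when the same step script is also run standalone for debugging.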
Sure, you can pass ${stage_data.id} as an argument and the actual Task will get the referenced step's Task ID for the current execution.
Makes sense?
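As a sketch of how those references could be written, the snippet below builds the placeholder strings for a parameter_override dict. The `step_ref` helper and the "General/dataset_task_id" parameter name are hypothetical, and pairing "General/dataset_url" with the artifact URL reference is an assumption.

```python
def step_ref(step_name, *path):
    """Build a '${step.a.b}' placeholder string (hypothetical convenience helper)."""
    return "${" + ".".join((step_name,) + path) + "}"


# Keys are section-qualified ("General/..."); values are references to a previous step.
parameter_override = {
    "General/dataset_task_id": step_ref("stage_data", "id"),
    "General/dataset_url": step_ref("stage_data", "artifacts", "dataset", "url"),
}

print(parameter_override)
```

A dict like this would be passed as the parameter_override argument of add_step, so the receiving step sees ordinary parameter values rather than the placeholders.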
What’s the point of saying General?
Notice the pipeline step/Task at execution is not aware of the pipeline context
In params:
parameter_override={'General/dataset_url
What’s the General for?
Think multiple hyper-parameter sections that we need to reference
(under the Tasks Configuration Tab, the Hyper parameters can have multiple sections)
barebones, can a step in a pipeline refer to a previous step by name and get it?
On a related note - is it possible to get things like ${stage_data.artifacts.dataset.url} from within a task rather than passing params in add_step ?
"General" is the parameter section name (like Args)
See Args section in the screenshot
"Args/counter"
The description says this though
A section name associated with the connected object. Default: 'General'
Ephemeral Dataset, I like that! Is this like splitting a dataset, for example, then training/testing, and deleting it when done? Making sure the entire pipeline is reproducible, but without storing the data long term?