On a related note - is it possible to get things like ${stage_data.artifacts.dataset.url} from within a task, rather than passing params in add_step?
Bare bones: can a step in a pipeline refer to a previous step by name and get it?
Sure, you can pass ${stage_data.id} as an argument, and the actual Task will get the referenced step's Task ID for the current execution.
Make sense?
Notice that at execution time the pipeline step/Task is not aware of the pipeline context.
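For reference, a minimal sketch of the controller side (project/step/task names here are placeholders, loosely following the clearml pipeline example layout):
```python
from clearml import PipelineController

pipe = PipelineController(name='pipeline demo', project='examples', version='1.0.0')
pipe.add_step(
    name='stage_data',
    base_task_project='examples',
    base_task_name='pipeline step 1 dataset artifact',
)
pipe.add_step(
    name='stage_process',
    parents=['stage_data'],
    base_task_project='examples',
    base_task_name='pipeline step 2 process dataset',
    parameter_override={
        # resolved by the controller at runtime and written into the step's
        # hyperparameters; the executed Task only sees the resolved values
        'General/dataset_url': '${stage_data.artifacts.dataset.url}',
        'General/dataset_task_id': '${stage_data.id}',
    },
)
pipe.start()
```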
In params:
parameter_override={'General/dataset_url
What’s the General for?
"General" is the parameter section name (like Args)
Think multiple hyper-parameter sections that we need to reference
(under the Task's Configuration tab, the hyperparameters can have multiple sections)
See the Args section in the screenshot
"Args/counter"
So General would have created a General instead of Args?
What’s the point of saying General?
So General would have created a General instead of Args?
Yes.
This is a must; you have to specify the hyperparameter section you are referencing.
https://github.com/allegroai/clearml/blob/5a9155b2039413280f13dfded1121470c4c4323d/examples/pipeline/step2_data_processing.py#L21
This is actually: task.connect(args, name='General')
Basically there is no "random_state", only "General/random_state".
Make sense?
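In other words, a minimal sketch of how the section name lines up with the override keys (project/task names are placeholders):
```python
from clearml import Task

task = Task.init(project_name='examples', task_name='process dataset')
args = {'dataset_url': '', 'random_state': 42}
# connecting under the 'General' section means the parameters appear (and can
# be overridden) as 'General/dataset_url' and 'General/random_state'
task.connect(args, name='General')
```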
The description says this though
A section name associated with the connected object. Default: 'General'
Ok, the doc needs a fix.
suggestion?
AgitatedDove14 - I mean this - it says name=None, but the text says the default is 'General'.
AgitatedDove14 is it possible to get the pipeline Task that is running a step, from within the step? Is task.parent something that could help?
Is task.parent something that could help?
Exactly 🙂 something like:
# my step is running here
the_pipeline_task = Task.get_task(task_id=task.parent)
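As a self-contained sketch (assuming this runs inside a step that the pipeline controller launched):
```python
from clearml import Task

# inside the running pipeline step
task = Task.current_task()
# the controller Task that launched this step is recorded as its parent
the_pipeline_task = Task.get_task(task_id=task.parent)
print(the_pipeline_task.name)
```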
I am essentially creating an EphemeralDataset abstraction with a controlled lifecycle, such that the data is removed after a day in experiments. Additionally and optionally, data created during a step in a pipeline can be cleared once the pipeline completes.
Ephemeral Dataset, I like that! Is this like splitting a dataset, for example, then training/testing, and deleting when done? Making sure the entire pipeline is reproducible, but without storing the data long term?
Yes, for datasets where we need GDPR compliance
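Not the actual implementation, just a rough sketch of the idea using the public Dataset API (all names here are hypothetical):
```python
from clearml import Dataset

def create_ephemeral_dataset(name, project, files_path):
    # a pipeline step creates a short-lived dataset
    ds = Dataset.create(dataset_name=name, dataset_project=project)
    ds.add_files(files_path)
    ds.upload()
    ds.finalize()
    return ds.id

def delete_ephemeral_dataset(dataset_id):
    # e.g. called from the pipeline's completion callback, or by a daily
    # cleanup job enforcing the one-day retention window
    Dataset.delete(dataset_id=dataset_id)
```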
That is awesome!
If you feel like writing a bit about the use case and how you solved it, I think AnxiousSeal95 will be more than happy to publish something like that 🙂