` pipe.add_function_step(
    name='step_one',
    function=step_one,
    function_kwargs=dict(pickle_data_url='${pipeline.url}'),
    function_return=['data_frame'],
    cache_executed_step=True,
    docker='${pipeline.url}',  # !!! this does not work
) `
TimelyMouse69 the main problem is the arguments. Here is the code snippet:
` pipeline = PipelineController(name="Awesome Pipeline")
pipeline.add_parameter("docker_image", default="DEFAULT_DOCKER") `
And then I have a function step where I want to use the argument:
` pipeline.add_function_step(name="best_step", docker="${pipeline.docker_image}" `
And I also tried
` parameters = pipeline.get_parameters()
pipeline.add_function_step(name="best_step", docker=parameters["docker_image"...
I agree, a lot of packages should be installed before I can execute any command, but having something like "sub nodes" inside a pipeline, in my opinion, makes pipelines much more useful, in the sense that all the steps are visible. I haven't used pipelines before, and when I saw this UI I thought it would be very cool to highlight the execution steps.
When I add a sleep to process_data, it works, provided there was enough time to upload the data:
` def process_data(inputs):
    import time
    import pandas as pd
    from clearml import PipelineController
    _logger = PipelineController.get_logger()
    df = pd.DataFrame(inputs)  # pandas has no DataFrame.create; use the constructor
    _logger.report_table('Awesome', 'Details', table_plot=df)
    time.sleep(10) `
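Rather than sleeping for a fixed time, one option is to flush the logger explicitly before the step returns. This is a minimal sketch, assuming that `flush(wait=True)` on the ClearML logger blocks until pending reports are sent. `report_and_flush` is a hypothetical helper name, not part of ClearML, and its body would need to live inside the step function (or be shipped with it), since a function step runs standalone.

```python
def report_and_flush(logger, rows, title="Awesome", series="Details"):
    """Report a table, then block until the report has been sent.

    `logger` is expected to behave like a clearml Logger (report_table + flush).
    `rows` may be a pandas DataFrame or a list of rows (header row first).
    """
    logger.report_table(title, series, table_plot=rows)
    # Assumption: wait=True waits for outstanding events instead of a fixed sleep.
    logger.flush(wait=True)


# Inside a pipeline step this would be called roughly as:
#   from clearml import PipelineController
#   report_and_flush(PipelineController.get_logger(), df)
```

Taking the logger as an argument keeps the helper independent of how the logger was obtained.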
CostlyOstrich36 Yes, SDK 🙂
I see that this is not possible, but I also see that report_histogram is there (which does its reporting through plotly),
and I was wondering whether there is any way to report a custom plotly figure when I have my own layout.
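One possible route for a custom layout, as far as I know the SDK, is `Logger.report_plotly`, which takes a complete figure (a plotly `Figure` or the equivalent dict). This is a sketch under that assumption. `build_figure` and `report_figure` are hypothetical names, and the layout values are placeholders.

```python
def build_figure(values, title):
    """Build a Plotly figure as a plain dict, including a custom layout."""
    return {
        "data": [{"type": "bar", "y": list(values)}],
        "layout": {
            "title": {"text": title},
            "bargap": 0.4,
            "yaxis": {"title": {"text": "count"}},
        },
    }


def report_figure(figure, title="Awesome", series="Details", iteration=0):
    # Imported here so the module loads even where clearml is not installed.
    from clearml import Logger

    # Logger.current_logger() needs an initialised Task, e.g. inside a step.
    Logger.current_logger().report_plotly(
        title=title, series=series, iteration=iteration, figure=figure
    )


figure = build_figure([3, 1, 4, 1, 5], title="My own layout")
# report_figure(figure)  # call this from code that runs under a ClearML Task
```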
My reasoning is that pipelines can give me a good visual overview of what is going on, and I want to have a lot of small steps. My dataset is 2 GB of images, and I want to have a step where I download it with StorageManager.get_local_copy(), save it, and pass only the path to this dataset to the next steps. But every agent is a different pod, so I do not know how to properly share the folder with images.
AnxiousSeal95 Thank you so much! I will use it.
AnxiousSeal95 here
The intuition is: I care about the step result, and I also care what the sub-steps in the step are.
Example: the step "evaluate model" consists of dataset + model. I need the substeps:
download dataset, download models, evaluate.
I do not really care what will be in the substeps' metrics, but I care what is stored in the evaluate model step. It will make everything compact and easily accessible.
AgitatedDove14 Yeah, you are right: since a sub-component is not a task, the caching won't work. But it is the step result that's important, so if the step cache is available, I think it should cover the majority of pipeline use cases.
Yes, I think it is absolutely fine. Here is the pseudocode of my understanding with ClearML syntax:
`
def complex_steps(args):
    # As far as I can see, the functions should be implemented inside the step
    # for ClearML to be able to see them
    @sub_node
    def action_1(params):
        ....
        return result

    @sub_node
    def action_2(params):
        ....
        return result

    @sub_node
    def action_3(params_1, params_2):
        ....
        return result

    act1_result = action_1(args.param1)
...
Hi AgitatedDove14 storage.
Step 1 of the pipeline – generate a file
Step 2 of the pipeline – read the file generated at step 1
no, because I want to use pipe.add_parameter in the docker field of the pipe.add_function_step not in the function_kwargs
Yes, that's what I found; otherwise clearml won't be able to see this function at execution time. I think it would be great to have such a possibility, because a step can be constructed from multiple sub-components, but not all of them need to be added to the UI graph. Some of them are just helper functions which make the code more readable.
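For plain helper functions that should not appear in the graph, `add_function_step` has, if I remember the SDK correctly, a `helper_functions` argument that ships extra functions with the standalone step. Treat the parameter name as an assumption to check against your installed version. `parse_rows` and `process` are made-up examples.

```python
def parse_rows(raw):
    """Helper: split a comma-separated string into stripped fields."""
    return [field.strip() for field in raw.split(",")]


def process(raw):
    """Step body: uses parse_rows without it becoming its own graph node."""
    return [field.upper() for field in parse_rows(raw)]


# Keyword arguments for the step; helper_functions is the assumed parameter name.
step_kwargs = dict(
    name="process",
    function=process,
    function_kwargs=dict(raw="a, b, c"),
    function_return=["fields"],
    helper_functions=[parse_rows],
)

# With a PipelineController instance named `pipeline` this would be:
#   pipeline.add_function_step(**step_kwargs)
```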
I'm running 1.7.0 (latest docker available).
Your example did work for me, but I'm gonna try the flush() method now
From my experience with the pipeline so far and the "sub-node" idea, I would say:
Keep the pipeline controller, with the possibility to define where to run the whole pipeline (same node/pod).
Every step can be pushed to be executed on a different pod.
Every step is a Task, but a step can consist of multiple functions which are "sub-nodes", and they must be executed on the same pod/node where the functional_step is defined.
As a result if the pipeline requires sharing large files select the pipeline to ru...
CostlyOstrich36 maybe you have any idea why this code might not work for me?
But I want to use value from the arguments.
Yes, but I'm not sure that they need to have a separate task. In my opinion, it would be better if they are visible in the UI but all the metrics/artifacts are reported to the step Task.
function_kwargs work, but the docker parameter does not
I have another issue with pipelines. I have described it in another thread; would you mind if I tag you there? Because there is no solution yet 😞
Yes it did work! Thank you!
AgitatedDove14 maybe you have an idea how to deal with the second issue? Because this is exactly what I want to get 🙂
AgitatedDove14 thanks for the link, but I need a different thing.
Step 1 of the pipeline: I download images from S3 (many of them) and want to return the paths
Step 2 of the pipeline: read the images from those paths
Here is some pseudocode
` def step_one():
    download_dataset = StorageManager.get_local_copy()
    paths = collect_paths_as_strings()
    return paths

def step_two(paths):
    image_1 = read_image(paths[0]) `
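Since each step may run on a different pod, a path on the local disk of step one means nothing to step two. One common pattern is to return a reference into storage that both pods can reach and let step two fetch its own copy. The sketch below only simulates that with the standard library: a temporary directory stands in for the bucket, the `.txt` files stand in for images, and the function bodies are assumptions rather than ClearML calls.

```python
import shutil
import tempfile
from pathlib import Path


def step_one(source_dir, shared_store):
    """Pack the dataset and return a reference into the shared store."""
    base_name = str(Path(shared_store) / "dataset")
    # make_archive returns the full path of the created .zip file
    return shutil.make_archive(base_name, "zip", root_dir=str(source_dir))


def step_two(archive_ref):
    """Fetch the archive by reference and unpack it on this worker."""
    local_dir = Path(tempfile.mkdtemp(prefix="dataset_"))
    shutil.unpack_archive(archive_ref, str(local_dir))
    return sorted(
        str(p.relative_to(local_dir)) for p in local_dir.rglob("*") if p.is_file()
    )


# Demo: a temp directory plays the role of shared storage (e.g. a bucket).
source = Path(tempfile.mkdtemp(prefix="images_"))
(source / "img_0.txt").write_text("fake image 0")
(source / "img_1.txt").write_text("fake image 1")
store = tempfile.mkdtemp(prefix="store_")

files = step_two(step_one(source, store))
```

With real object storage, step two would take the remote URL and download it with whatever client the pod has, so only a short string travels between steps.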
you can do it with docker_bash_setup_script where you run conda install what you need
Hi CostlyOstrich36
Here is the code example which does not work for me
` def process_data(inputs):
    import pandas as pd
    from clearml import PipelineController
    _logger = PipelineController.get_logger()
    df = pd.DataFrame(inputs)  # pandas has no DataFrame.create; use the constructor
    _logger.report_table('Awesome', 'Details', table_plot=df)

pipeline = PipelineController(name='best_pipeline', project='test')
pipeline.add_function_step(name='process_data', function=process_data,
function_kw...
but how to pass the file as an argument, idk
Tried it, and as output clearml-agent is trying to pull the image '${pipeline.docker_image}'; it cannot convert it to the value
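Since the placeholder reaches the agent unresolved, one workaround is to substitute it in plain Python before the step is added and pass the resulting string as `docker=`. This is a sketch only: `resolve_pipeline_refs` is a hypothetical helper, not a ClearML function, and I am not sure whether a value overridden from the UI would be visible at this point.

```python
import re

# Matches placeholders of the form ${pipeline.<parameter_name>}
_PIPELINE_REF = re.compile(r"\$\{pipeline\.([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_pipeline_refs(value, parameters):
    """Replace every ${pipeline.<name>} in `value` using the `parameters` dict."""

    def _substitute(match):
        name = match.group(1)
        if name not in parameters:
            raise KeyError(f"unknown pipeline parameter: {name}")
        return str(parameters[name])

    return _PIPELINE_REF.sub(_substitute, value)


# The dict mirrors pipeline.add_parameter("docker_image", default="DEFAULT_DOCKER")
params = {"docker_image": "DEFAULT_DOCKER"}
docker_image = resolve_pipeline_refs("${pipeline.docker_image}", params)

# docker_image is now a plain string and could be passed as docker=docker_image
```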