
CostlyOstrich36 maybe you have any idea why this code might not work for me?
Pass this to the func_step
docker_args="--env CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1",
I'm running 1.7.0 (latest docker available).
Your example did work for me, but I'm gonna try the flush() method now
My reasoning is that pipelines can give me a good visual overview of what is going on, and I want to have a lot of small steps. My dataset is 2 GB of images, and I want to have a step where I download it with StorageManager.get_local_copy(), save it, and pass only the path to this dataset to the next steps. But every agent is a different pod, so I do not know how to properly share the folder with images.
AnxiousSeal95 here
AgitatedDove14 maybe you have an idea how to deal with the second issue? Because this is exactly what I want to get 🙂
From my experience with the pipeline so far and "sub-node" idea, I would say:
Keep the pipeline controller with the possibility to define where to run the whole pipeline (same node/pod).
Every step can be pushed to be executed on a different pod.
Every step is a Task, but a step can consist of multiple functions which are "sub-nodes", and they must be executed on the same pod/node where the functional_step is defined.
As a result if the pipeline requires sharing large files select the pipeline to ru...
Yes, I think it is absolutely fine. Here is the pseudocode of my understanding with ClearML syntax:
`
def complex_steps(args):
    # As far as I see, the functions should be implemented inside the step for ClearML to be able to see them
    @sub_node
    def action_1(params):
        ....
        return result

    @sub_node
    def action_2(params):
        ....
        return result

    @sub_node
    def action_3(params_1, params_2):
        ....
        return result

    act1_result = action_1(args.param1)
...
Hi AgitatedDove14 storage.
Step 1 of the pipeline: generate a file.
Step 2 of the pipeline: read the file generated at step 1.
Hi CostlyOstrich36
Here is the code example which does not work for me
` def process_data(inputs):
    import pandas as pd
    from clearml import PipelineController
    _logger = PipelineController.get_logger()
    df = pd.DataFrame(inputs)
    _logger.report_table('Awesome', 'Details', table_plot=df)

pipeline = PipelineController(name='best_pipeline', project='test')
pipeline.add_function_step(name='process_data', function=process_data,
function_kw...
AgitatedDove14 Yeah, you are right: since a sub-component is not a task, the caching won't work. But it is the step result that's important, so if the step cache is available I think it should cover the majority of pipeline use cases.
The intuition is: I care about the step result, and I also care what the sub-steps in the step are.
Example: the step "evaluate model" consists of dataset + model. I need the substeps:
download dataset
download models
evaluate
I do not really care what will be in the substeps' metrics, but I care what is stored in the "evaluate model" step. It will make everything compact and easily accessible.
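The idea above can be sketched in plain Python: one step function that defines its helper sub-steps inside itself and records which of them ran. The `sub_node` decorator here is hypothetical (it is the feature being proposed, not an existing ClearML API), and the download and evaluate helpers are stand-ins that do no real work.

```python
from functools import wraps


def evaluate_model_step(dataset_url, model_url):
    """One pipeline step whose helper sub-steps are defined inside it."""
    executed = []  # names of sub-steps, in the order they ran

    def sub_node(func):
        # Hypothetical decorator: it only records that the sub-step ran.
        @wraps(func)
        def wrapper(*args, **kwargs):
            executed.append(func.__name__)
            return func(*args, **kwargs)
        return wrapper

    @sub_node
    def download_dataset(url):
        return f"dataset from {url}"  # stand-in for a real download

    @sub_node
    def download_model(url):
        return f"model from {url}"  # stand-in for a real download

    @sub_node
    def evaluate(dataset, model):
        # Placeholder result; a real step would compute metrics here.
        return {"dataset": dataset, "model": model, "score": None}

    result = evaluate(download_dataset(dataset_url), download_model(model_url))
    return result, executed
```

Only the outer function would be registered as a step; the list of executed sub-steps is what a UI could display, while all results stay attached to the step itself.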
function_kwargs: they work, but the docker parameter does not
But I want to use value from the arguments.
AgitatedDove14 thanks for the link, but I need a different thing.
Step 1 of the pipeline: I download images from S3 (many of them) and want to return paths.
Step 2 of the pipeline: read images from those paths.
Here is the pseudocode:
` def step_one():
    download_dataset = StorageManager.get_local_copy()
    paths = collect_paths_as_strings()
    return paths

def step_two(paths):
    image_1 = read_image(paths[0]) `
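A minimal sketch of the two steps, assuming both run where the same folder is visible (the same pod or a shared mount). The `dataset_dir` argument stands in for whatever `StorageManager.get_local_copy()` returns; `collect_paths`, `read_first_image`, and the list of image suffixes are hypothetical names chosen for illustration.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed image types


def collect_paths(dataset_dir):
    """Return the image paths under dataset_dir as sorted strings."""
    root = Path(dataset_dir)
    return sorted(
        str(p)
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def read_first_image(paths):
    """Read the raw bytes of the first image in the list."""
    first = Path(paths[0])
    if not first.exists():
        # This is what happens when step two runs on a pod that
        # does not share the folder used by step one.
        raise FileNotFoundError(f"{first} is not visible from this machine")
    return first.read_bytes()
```

Passing strings between steps is cheap, but the second function shows the catch: the paths are only meaningful where the files actually exist.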
Yes, but I'm not sure that they need to have a separate task. In my opinion, it would be better if they are visible in the UI but all the metrics/artifacts are reported to the step Task
you can do it with docker_bash_setup_script, where you run conda install for what you need, but how to pass the file as an argument, idk
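As a rough sketch of the suggestion above, the docker-related keyword arguments for a function step could be collected in one place. `docker`, `docker_args`, and `docker_bash_setup_script` are the parameter names used in this thread; the helper name, the image name, and the package list are placeholders, and the script assumes `conda` is available inside the image.

```python
def docker_step_kwargs(image, conda_packages):
    """Build docker-related kwargs for a function step (illustrative helper)."""
    if conda_packages:
        # Assumption: conda exists in the image and -y is enough to run unattended.
        setup_script = "conda install -y " + " ".join(conda_packages)
    else:
        setup_script = ""
    return {
        "docker": image,
        "docker_args": "--env CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1",
        "docker_bash_setup_script": setup_script,
    }
```

The result could then be unpacked into the step definition, for example `pipeline.add_function_step(name="best_step", function=step, **docker_step_kwargs("my/image", ["pillow"]))`.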
CostlyOstrich36 Yes SDK 🙂
I see that that is not possible, but I also see that report_histogram is there (which does the plotly reporting), and I was wondering whether there is any way to report a custom plotly figure when I have my own layout
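One option worth trying is `Logger.report_plotly`, which takes a complete figure, so the layout stays under your control. A minimal sketch, assuming the figure is passed as a plain plotly-style dictionary; the helper names, titles, and layout values are illustrative only.

```python
def build_histogram_figure(values, title):
    """Return a plotly-style figure dict with a custom layout."""
    return {
        "data": [{"type": "histogram", "x": list(values)}],
        "layout": {
            "title": {"text": title},
            "bargap": 0.1,
            "xaxis": {"title": {"text": "value"}},
            "yaxis": {"title": {"text": "count"}},
        },
    }


def report_custom_plot(logger, values):
    """Send the custom figure through a ClearML-style logger object."""
    figure = build_histogram_figure(values, "Custom layout")
    logger.report_plotly(
        title="Custom plot", series="histogram", figure=figure, iteration=0
    )
    return figure
```

Inside a task, `logger` would be the object returned by `Logger.current_logger()`; a figure built with `plotly.graph_objects` could be passed in the same way.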
Yes, that's what I found, otherwise clearml won't be able to see this function during execution time. I think it would be great to have such a possibility, because a step can be constructed from multiple sub-components, but not all of them might be added to the UI graph. Some of them are just helper functions which will make the code more readable
I agree, a lot of packages should be installed before I can execute any command, but having something like "sub nodes" inside a pipeline, in my opinion, makes them much more useful, in the sense that all the steps are visible. I haven't used pipelines before, and when I saw this UI I was thinking it would be very cool to highlight the execution steps.
AnxiousSeal95 Thank you so much! I will use it.
TimelyMouse69 the main problem is the arguments. Here is the code snippet:
pipeline = PipelineController(name="Awesome Pipeline")
pipeline.add_parameter("docker_image", default="DEFAULT_DOCKER")
And then I have a functional step, where I want to use the argument:
pipeline.add_function_step(name="best_step", docker="${pipeline.docker_image}"
And also I tried
` parameters = pipeline.get_parameters()
pipeline.add_function_step(name="best_step", docker=parameters["docker_image"...
When I add sleep to process_data, it works, provided there was enough time to upload the data:
def process_data(inputs):
    import time
    import pandas as pd
    from clearml import PipelineController
    _logger = PipelineController.get_logger()
    df = pd.DataFrame(inputs)
    _logger.report_table('Awesome', 'Details', table_plot=df)
    time.sleep(10)
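An alternative to sleeping, mentioned earlier in this thread, is to flush the logger before the step returns. A minimal sketch, assuming `flush()` sends the pending reports; `report_and_flush` is a hypothetical helper, and whether it fully replaces the sleep is not confirmed here.

```python
def report_and_flush(logger, table, title="Awesome", series="Details"):
    """Report a table, then flush the logger instead of sleeping."""
    logger.report_table(title, series, table_plot=table)
    # Assumption: flush() pushes queued reports out before the step exits.
    logger.flush()
```

Inside the step this would be called as `report_and_flush(PipelineController.get_logger(), df)`.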
pipe.add_function_step(
    name='step_one',
    function=step_one,
    function_kwargs=dict(pickle_data_url='${pipeline.url}'),
    function_return=['data_frame'],
    cache_executed_step=True,
    #############
    docker='${pipeline.url}'  # !!!!!! this does not work
) ``
Tried it, and according to the output clearml-agent is trying to pull the image '${pipeline.docker_image}'; it cannot convert it to the value
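The output above suggests the `docker` argument reaches the agent with the placeholder still unresolved. One possible workaround is to resolve it in Python before calling `add_function_step`. A minimal sketch, assuming a plain dict of parameter values (for example, the one returned by `pipeline.get_parameters()`); `resolve_pipeline_placeholders` is a hypothetical helper, not part of ClearML.

```python
import re

# Matches placeholders of the form ${pipeline.some_name}
_PLACEHOLDER = re.compile(r"\$\{pipeline\.([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_pipeline_placeholders(text, parameters):
    """Replace ${pipeline.<name>} with values from the parameters dict."""

    def substitute(match):
        name = match.group(1)
        # Unknown names are left untouched so the problem stays visible.
        return str(parameters[name]) if name in parameters else match.group(0)

    return _PLACEHOLDER.sub(substitute, text)
```

Note that this fixes the value when the pipeline is built, so changing the parameter later (for example, from the UI) would not change the image used by the step.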
Yes it did work! Thank you!
I have another issue with pipelines. I have described it in another thread; would you mind if I tag you there? Because there is no solution 😞