Answered
Hi!
I am trying to build and run a pipeline. I pass my dataset as a parameter of the pipeline:

pipe.add_parameter(name='dataset_df',
                   description='Initial dataset .parquet file',
                   default=dataset_df,
                   param_type="pd.DataFrame")

Then I reference this parameter and try to use it in the first step of the pipeline, which is based on another, previously generated task (one that was created with an empty dataset):

pipe.add_step(
    name=f'{step_one_name}_{n_predict}',
    base_task_id=base_task.task_id,
    execution_queue=pipeline_steps_execution_queue,
    cache_executed_step=cache,
    parameter_override={"General/dataset_df": "${pipeline.dataset_df}",
                        "General/n_predict": n_predict,
                        "General/period_size": "${pipeline.period_size}",
                        "General/preprocessing_kwargs_params": "${pipeline.preprocessing_kwargs_params}"})

But I receive an error stating that my dataset is empty, although it is not. I guess ClearML doesn't use my dataset in the task and doesn't override the parameter.

Could you please give me any ideas on how to pass my dataset into the task properly?

  
  
Posted one year ago

Answers 3


Thank you, guys. I've figured out the solution with your help! @AgitatedDove14 @EnthusiasticShrimp49

  
  
Posted one year ago

I pass my dataset as a parameter of the pipeline:

@MysteriousWalrus11 I think you were expecting the dataset_df dataframe to be automatically serialized and passed, is that correct?
If you are using add_step, all arguments must be simple types (i.e. str, int, etc.).
If you want to pass complex types, your code should upload them as artifacts, and then you can pass the artifact URL (or name) to the next step.
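
Roughly something like this (a sketch only — the General/dataset_task_id and General/dataset_artifact_name parameter names are made up for illustration, and I'm assuming the controller script runs as a ClearML task of its own):

from clearml import Task

# Controller side: upload the dataframe as an artifact instead of a parameter
controller_task = Task.current_task()
controller_task.upload_artifact(name='dataset_df', artifact_object=dataset_df)

# Pass only simple types to the step: the producing task's id and the artifact name
pipe.add_step(
    name=f'{step_one_name}_{n_predict}',
    base_task_id=base_task.task_id,
    parameter_override={"General/dataset_task_id": controller_task.id,
                        "General/dataset_artifact_name": "dataset_df"})

# Step side: fetch the artifact back into a dataframe
# (dataset_task_id / dataset_artifact_name come from the step's connected parameters)
source_task = Task.get_task(task_id=dataset_task_id)
dataset_df = source_task.artifacts[dataset_artifact_name].get()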

Another option is to use a pipeline from decorators, where the data is passed transparently between the components (as you would expect from Python code).
Check this example: the pipeline_from_decorator.py script in the clearml examples.
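
The decorator version would look roughly like this (the component, pipeline, and parameter names here are just placeholders):

from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=['dataset_df'])
def load_dataset(dataset_path: str):
    # imports live inside the component, since each component runs as a standalone task
    import pandas as pd
    return pd.read_parquet(dataset_path)

@PipelineDecorator.component(return_values=['predictions'])
def predict_step(dataset_df, n_predict: int):
    # dataset_df arrives as a real DataFrame; ClearML serializes it between steps
    return dataset_df.head(n_predict)

@PipelineDecorator.pipeline(name='df_pipeline', project='examples', version='0.1')
def run_pipeline(dataset_path: str, n_predict: int):
    df = load_dataset(dataset_path)
    return predict_step(df, n_predict)

if __name__ == '__main__':
    PipelineDecorator.run_locally()  # or set an execution queue for remote runs
    run_pipeline(dataset_path='data/dataset.parquet', n_predict=10)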

  
  
Posted one year ago

Hey @MysteriousWalrus11, given your use case, did you consider passing the path to the dataset instead? For example, an address to an S3 bucket.
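
Something along these lines (the bucket URL and the dataset_path parameter name are placeholders, just to show the idea):

# Controller: pass the dataset location as a plain string
pipe.add_parameter(name='dataset_path',
                   description='Initial dataset .parquet file location',
                   default='s3://my-bucket/datasets/dataset.parquet')

pipe.add_step(
    name=f'{step_one_name}_{n_predict}',
    base_task_id=base_task.task_id,
    parameter_override={"General/dataset_path": "${pipeline.dataset_path}"})

# Inside the step: resolve the remote path to a local copy and load the dataframe
import pandas as pd
from clearml import StorageManager

local_copy = StorageManager.get_local_copy(remote_url=dataset_path)
dataset_df = pd.read_parquet(local_copy)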

  
  
Posted one year ago