Answered

Hi 🙂 Is there a way to extract all the imports outside the step? For example, instead of:

from clearml.automation.controller import PipelineDecorator
from clearml import TaskTypes
import sklearn  # noqa
import pickle
import pandas as pd
from clearml import StorageManager

# Make the following function an independent pipeline component step
# notice all package imports inside the function will be automatically logged as
# required packages for the pipeline execution step
@PipelineDecorator.component(return_values=["data_frame"], cache=True, task_type=TaskTypes.data_processing)
def step_one(pickle_data_url: str, extra: int = 43):
    print("step_one")
    # make sure we have scikit-learn for this step, we need it to unpickle the object

    local_iris_pkl = StorageManager.get_local_copy(remote_url=pickle_data_url)
    with open(local_iris_pkl, "rb") as f:
        iris = pickle.load(f)
    data_frame = pd.DataFrame(iris["data"], columns=iris["feature_names"])
    data_frame["target"] = iris["target"]
    return data_frame

I currently have to run:

from clearml.automation.controller import PipelineDecorator
from clearml import TaskTypes


# Make the following function an independent pipeline component step
# notice all package imports inside the function will be automatically logged as
# required packages for the pipeline execution step
@PipelineDecorator.component(return_values=["data_frame"], cache=True, task_type=TaskTypes.data_processing)
def step_one(pickle_data_url: str, extra: int = 43):
    print("step_one")
    # make sure we have scikit-learn for this step, we need it to unpickle the object
    import sklearn  # noqa
    import pickle
    import pandas as pd
    from clearml import StorageManager

    local_iris_pkl = StorageManager.get_local_copy(remote_url=pickle_data_url)
    with open(local_iris_pkl, "rb") as f:
        iris = pickle.load(f)
    data_frame = pd.DataFrame(iris["data"], columns=iris["feature_names"])
    data_frame["target"] = iris["target"]
    return data_frame
  
  
Posted 9 months ago

Answers 4


Thank you

  
  
Posted 9 months ago

Hey @JitteryOwl13, just to make sure I understand: you want to put your imports inside the pipeline step function, and you're asking whether this will work correctly?

If so, then the answer is yes: it will work fine if you move the imports inside the pipeline step function.
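
For reference, here is a minimal sketch of that pattern, condensed from the example above (the URL parameter and package choices are just illustrative): any package imported inside the decorated function body is logged as a required package for that step.

from clearml.automation.controller import PipelineDecorator


@PipelineDecorator.component(return_values=["data_frame"], cache=True)
def step_one(pickle_data_url: str):
    # imports placed inside the component body are captured as
    # required packages when this step is executed remotely
    import pickle
    import pandas as pd
    from clearml import StorageManager

    local_pkl = StorageManager.get_local_copy(remote_url=pickle_data_url)
    with open(local_pkl, "rb") as f:
        iris = pickle.load(f)
    return pd.DataFrame(iris["data"], columns=iris["feature_names"])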

  
  
Posted 9 months ago

No, I want to put them inside the pipeline.py file where I configure all the steps, like this:

from clearml import PipelineDecorator
from train_helpers.common import params
from dataset import DataModule
from train import Trainer


@PipelineDecorator.component(return_values=['_args'], cache=True)
def init_experiment():
    _args = params.parse_args()
    return _args


@PipelineDecorator.component(return_values=['data'], cache=False)
def data_preparation(args):
    data = DataModule(args)
    return data


@PipelineDecorator.component(cache=False)
def train_model(args, data):
    Trainer(args).train()


@PipelineDecorator.pipeline(name='Pipeline_decorator', project='Pipeline_decorator', version='0.1', pipeline_execution_queue=None)
def main():
    args = init_experiment()
    data = data_preparation(args)
    train_model(args, data)


if __name__ == '__main__':
    # PipelineDecorator.debug_pipeline()
    PipelineDecorator.run_locally()
    main()
  
  
Posted 9 months ago

Ah, I see now. There are a couple of ways to achieve this.

  • You can enforce that the pipeline steps execute within a predefined Docker image that has all these submodules - this is not very flexible, but doesn't require your clearml-agents to have access to your Git repository
  • You can enforce that the pipeline steps execute within a predefined Git repository where you have all the code for these submodules - this is more flexible than option 1, but requires the clearml-agents to have access to your Git repository

For the agents to be able to access your Git repository, you must either specify agent.git_user and agent.git_pass in the clearml.conf files on the worker machines, or register SSH keys for those machines with your Git hosting server (like Bitbucket or GitHub) and add agent.force_git_ssh_protocol=true to those same clearml.conf files (a sketch follows below).
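
For concreteness, a minimal sketch of those worker-side clearml.conf additions for option 2 (the values are placeholders, and you would use either the credentials pair or the SSH route, not necessarily both):

# clearml.conf on each worker machine (placeholder values)
agent {
    # HTTPS access: git credentials (git_pass is typically a personal access token)
    git_user: "my-git-user"
    git_pass: "my-git-token"

    # ...or, if SSH keys are registered for the machine on the Git server,
    # force the agent to clone over SSH instead of HTTPS
    force_git_ssh_protocol: true
}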
  
  
Posted 9 months ago