
Hi 🙂 Is there a way to extract all the imports outside the step? For example, can I write:

from clearml.automation.controller import PipelineDecorator
from clearml import TaskTypes
import sklearn  # noqa
import pickle
import pandas as pd
from clearml import StorageManager

# Make the following function an independent pipeline component step
# notice all package imports inside the function will be automatically logged as
# required packages for the pipeline execution step
@PipelineDecorator.component(return_values=["data_frame"], cache=True, task_type=TaskTypes.data_processing)
def step_one(pickle_data_url: str, extra: int = 43):
    print("step_one")
    # make sure we have scikit-learn for this step, we need it to use to unpickle the object


    local_iris_pkl = StorageManager.get_local_copy(remote_url=pickle_data_url)
    with open(local_iris_pkl, "rb") as f:
        iris = pickle.load(f)
    data_frame = pd.DataFrame(iris["data"], columns=iris["feature_names"])
    data_frame.columns += ["target"]
    data_frame["target"] = iris["target"]
    return data_frame

instead of:

from clearml.automation.controller import PipelineDecorator
from clearml import TaskTypes


# Make the following function an independent pipeline component step
# notice all package imports inside the function will be automatically logged as
# required packages for the pipeline execution step
@PipelineDecorator.component(return_values=["data_frame"], cache=True, task_type=TaskTypes.data_processing)
def step_one(pickle_data_url: str, extra: int = 43):
    print("step_one")
    # make sure we have scikit-learn for this step, we need it to use to unpickle the object
    import sklearn  # noqa
    import pickle
    import pandas as pd
    from clearml import StorageManager

    local_iris_pkl = StorageManager.get_local_copy(remote_url=pickle_data_url)
    with open(local_iris_pkl, "rb") as f:
        iris = pickle.load(f)
    data_frame = pd.DataFrame(iris["data"], columns=iris["feature_names"])
    data_frame.columns += ["target"]
    data_frame["target"] = iris["target"]
    return data_frame
  
  
Posted 4 months ago

Answers 4


Thank you

  
  
Posted 4 months ago

Hey @JitteryOwl13, just to make sure I understand: you want to put your imports inside the pipeline step function, and you're asking whether this will work correctly?

If so, then the answer is yes, it will work fine if you move the imports inside the pipeline step function.
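For reference, a minimal sketch of that pattern (a made-up step, just to show where the imports live, not your actual code): anything imported inside the decorated function is auto-detected and logged as a required package for that step.

from clearml.automation.controller import PipelineDecorator


@PipelineDecorator.component(return_values=["result"], cache=True)
def my_step(value: int):
    # imports made inside the decorated function are detected by ClearML and
    # recorded as required packages for this step's remote execution
    import numpy as np

    return int(np.square(value))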

  
  
Posted 4 months ago

No, I want to put them inside the pipeline.py file where I configure all the steps, like this:

from clearml import PipelineDecorator
from train_helpers.common import params
from dataset import DataModule
from train import Trainer


@PipelineDecorator.component(return_values=['_args'], cache=True)
def init_experiment():
    _args = params.parse_args()
    return _args


@PipelineDecorator.component(return_values=['data'], cache=False)
def data_preparation(args):
    data = DataModule(args)
    return data


@PipelineDecorator.component(cache=False)
def train_model(args, data):
    Trainer(args).train()


@PipelineDecorator.pipeline(name='Pipeline_decorator', project='Pipeline_decorator', version='0.1', pipeline_execution_queue=None)
def main():
    args = init_experiment()
    data = data_preparation(args)
    train_model(args, data)


if __name__ == '__main__':
    # PipelineDecorator.debug_pipeline()
    PipelineDecorator.run_locally()
    main()
  
  
Posted 4 months ago

Ah, I see now. There are a couple of ways to achieve this.

  • You can enforce that the pipeline steps execute within a predefined Docker image that has all these submodules installed. This is not very flexible, but it doesn't require your clearml-agents to have access to your Git repository.
  • You can enforce that the pipeline steps execute within a predefined Git repository, where you have all the code for these submodules. This is more flexible than option 1, but it requires the clearml-agents to have access to your Git repository.

For the agents to be able to access your Git repository, you must either specify agent.git_user and agent.git_pass in the clearml.conf files on the worker machines, or register SSH keys for those machines with your Git hosting server (like Bitbucket or GitHub) and add agent.force_git_ssh_protocol=true to those clearml.conf files I mentioned previously. A sketch of both approaches follows.
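For illustration, here is a rough sketch of how the two options can be set on a component. The image name, repository URL, and branch are placeholders, and the exact decorator arguments may vary between ClearML versions, so double-check against your installed version:

from clearml.automation.controller import PipelineDecorator


# Option 1 (sketch): pin the step to a Docker image that already ships the submodules
@PipelineDecorator.component(return_values=["data"], cache=False, docker="python:3.10")
def data_preparation(args):
    from dataset import DataModule  # resolved from the environment baked into the image
    return DataModule(args)


# Option 2 (sketch): attach the step to the Git repository that holds the submodules
@PipelineDecorator.component(
    return_values=["data"],
    cache=False,
    repo="https://github.com/your-org/your-repo.git",  # placeholder URL
    repo_branch="main",
)
def data_preparation_from_repo(args):
    from dataset import DataModule  # imported from the cloned repository
    return DataModule(args)

And for the Git-access part, the corresponding clearml.conf entries on the worker machines would look roughly like this (values are placeholders):

agent {
    # either HTTPS credentials ...
    git_user: "your-git-username"
    git_pass: "your-git-password-or-token"

    # ... or, if the machines have SSH keys registered with your Git server:
    # force_git_ssh_protocol: true
}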
  
  
Posted 4 months ago