Hi, I Have An Issue When Running A Pipeline Controller Remotely In Docker. Basically I Have A Module That Reads A Config File Into A Dict And Calls The Pipeline Controller, Like

Answered

Hi, I have an issue when running a pipeline controller remotely in Docker. I have a module that reads a config file into a dict and calls the pipeline controller, invoked like python -m my_pipeline --config ./config.yml. The pipeline controller then passes the config dict to the other pipeline components.

If I set start_controller_locally=True, everything works fine: the steps run in the Docker container on the remote machine with the correct config. However, if I set start_controller_locally=False, the pipeline fails because it runs python -m my_pipeline --config ./config.yml instead of just the controller function, and so it tries to read ./config.yml, which is not available in the Docker container.

Is that the correct behavior? I would expect it to run only the controller function with the config dict, as it does for the components.

  				
Posted 2 years ago by SlipperySheep79


Answers 11

Hi @SlipperySheep79, I think it depends on your code. Can you provide a self-contained code snippet that reproduces this?

  				
Posted 2 years ago by CostlyOstrich36

Hi @SuccessfulKoala55, I think the issue is where to put the connect_configuration call. I can't put it inside run_pipeline, because that function only runs remotely and has no access to the file, and I can't put it in the script before the call to run_pipeline, since the task has not been initialized yet.

  				
Posted 2 years ago by SlipperySheep79

Hi @SmugDolphin23, I just tried it, but Task.current_task() returns None even when running remotely.

  				
Posted 2 years ago by SlipperySheep79

Also, what's the purpose of storing the pipeline arguments as artifacts, then? When it runs remotely, it still runs the main script as the entry point and not the pipeline function directly, so all the arguments will be replaced by whatever is passed to the function during the remote execution, right?

  				
Posted 2 years ago by SlipperySheep79

Basically, I think that the pipeline run starts from __main__ and not from the pipeline function, which causes the file to be read.

  				
Posted 2 years ago by SmugDolphin23

@SlipperySheep79 depending on a local file is always an issue. I would try to connect a configuration based on this file, so that it is loaded when running locally and then retrieved from the backend when running remotely.
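A minimal sketch of that load-locally, retrieve-remotely pattern, which is not from the original post and deliberately does not call ClearML. The names resolve_config, fetch_stored and parse are hypothetical: fetch_stored stands in for whatever returns the stored copy of the configuration (for example, the value that a connect_configuration call returns on a remote run), and parse would be a YAML loader in the setup described here, with JSON used below only to keep the sketch dependency-free. It does not settle the question of where in the script a task is available to connect to.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, Optional

ConfigDict = Dict[str, Any]


def resolve_config(
    path: Optional[str],
    running_locally: bool,
    fetch_stored: Callable[[], ConfigDict],
    parse: Callable[[str], ConfigDict] = json.loads,
) -> ConfigDict:
    """Read the config file only on the local launch; otherwise use the stored copy.

    fetch_stored is a hypothetical stand-in for the backend lookup,
    and parse is an assumption (swap in a YAML loader for .yml files).
    """
    if running_locally:
        if path is None:
            raise ValueError("a config path is required when running locally")
        # Local launch: the file exists on this machine, so parse it here.
        return parse(Path(path).read_text())
    # Remote run: the file is not in the container, so the path is never opened.
    return fetch_stored()
```

On the launching machine this would be called with running_locally=True and the --config path; on the agent it would be called with running_locally=False, so a missing ./config.yml no longer matters as long as fetch_stored can supply the dict.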

  				
Posted 2 years ago by SuccessfulKoala55

How about if Task.running_locally(): ?

  				
Posted 2 years ago by SmugDolphin23

I've uploaded an example here for simplicity: None

  				
Posted 2 years ago by SlipperySheep79

@SmugDolphin23 then the issue is that config is not set. I also tried with:

import yaml
import argparse
from my_pipeline.pipeline import run_pipeline
from clearml import Task

parser = argparse.ArgumentParser()
parser.add_argument('--config', type=str, required=True)

if __name__ == '__main__':
    if Task.running_locally():
        # Local launch: the config file is available, so read it.
        args = parser.parse_args()
        with open(args.config) as f:
            config = yaml.load(f, yaml.FullLoader)
    else:
        # Remote run: the file is not in the container.
        config = None
    run_pipeline(config)

But then it prints None, so the pipeline parameters are completely ignored.

  				
Posted 2 years ago by SlipperySheep79

Hi @SlipperySheep79! What happens if you do this:

import yaml
import argparse
from my_pipeline.pipeline import run_pipeline
from clearml import Task

parser = argparse.ArgumentParser()
parser.add_argument('--config', type=str, required=True)

if __name__ == '__main__':
    # Default so that config is defined when a task already exists.
    config = None
    if not Task.current_task():
        # No task yet: read the config from the local file.
        args = parser.parse_args()
        with open(args.config) as f:
            config = yaml.load(f, yaml.FullLoader)
    run_pipeline(config)

  				
Posted 2 years ago by SmugDolphin23

For instance, I have in my_pipeline/__main__.py :

import yaml
import argparse
from my_pipeline.pipeline import run_pipeline

parser = argparse.ArgumentParser()
parser.add_argument('--config', type=str, required=True)

if __name__ == '__main__':
    args = parser.parse_args()
    with open(args.config) as f:
        config = yaml.load(f, yaml.FullLoader)
    run_pipeline(config)

and in my_pipeline/pipeline.py :

from typing import Dict

from clearml import PipelineDecorator


@PipelineDecorator.pipeline(
    name='Main',
    project=None,
    default_queue='default',
    pipeline_execution_queue='default',
    start_controller_locally=False,
    repo='',  # repository URL is missing from the post
    add_run_number=False)
def run_pipeline(config: Dict):
    print(config)

I'm running this on an agent in docker mode

  				
Posted 2 years ago by SlipperySheep79
