Hi team, I am trying to run a pipeline remotely using ClearML pipeline and I’m encountering some issues. Could anyone please assist me in resolving them?

Answered

Hi Team,

I am trying to run a pipeline remotely using ClearML pipeline and I’m encountering some issues. Could anyone please assist me in resolving them?

Issue 1: After executing the code, the pipeline is initiated on the “queue_remote_start” queue and the tasks of the pipeline are initiated on the “queue_remote” queue. However, the creation of the dataset failed because it couldn’t find the Python modules from the current directory.

Issue 2: I also attempted to use the same queue for both pipe.start and pipe.set_default_execution_queue. However, the tasks of the pipeline remained in the pending and queued state and didn’t proceed to the next step.

To run the pipeline remotely, I have created two different queues and assigned a worker to each using the following commands:

clearml-agent daemon --detached --create-queue --queue queue_remote
clearml-agent daemon --detached --create-queue --queue queue_remote_start

I then executed the following command to run the pipeline remotely:

python3 pipeline.py

The code for the Pipeline from Functions is as follows:

from clearml import PipelineController, Task

# Create the PipelineController object
pipe = PipelineController(
    name="pipeline",
    project=project_name,
    version="0.0.2",
    add_pipeline_tags=True,
)

# Queue on which the pipeline steps are executed
pipe.set_default_execution_queue("queue_remote")

pipe.add_function_step(
    name="step_one",
    function=step_one,
    function_kwargs={
        "train_file": constants.TRAINING_DATASET_PATH,
        "validation_file": constants.VALIDATAION_DATASET_PATH,
        "s3_output_uri": constants.CLEARML_DATASET_OUTPUT_URI,
        "dataset_project": project_name,
        "dataset_name": constants.CLEARML_TASK_NAME,
        "use_dummy_dataset": use_dummy_model_dataset,
    },
    project_name=project_name,
    task_name=create_dataset_task_name,
    task_type=Task.TaskTypes.data_processing,
)

# Queue on which the pipeline controller itself is executed
pipe.start(queue="queue_remote_start")

Could anyone please provide a solution on how to successfully run the pipeline remotely? Any help would be greatly appreciated.

  				
Posted one year ago by FreshFly37


Answers 39

It prints "True"

  				
Posted one year ago by ManiacalSeaturtle63

@<1523701435869433856:profile|SmugDolphin23> I have tried the same method as suggested by you and the pipeline still failed, as it couldn't find "modules". Could you please help me here?

I would like to describe the process again, which I was following:

1. I created a queue and assigned 2 workers to the queue.
2. In the pipeline.py file, to start the pipeline I used pipe.start(queue="queue_remote"), and for the tasks I used pipe.set_default_execution_queue('queue_remote').
3. In the working_dir = ev_xxxx_xxtion/clearml I executed the code using python3 pipeline.py.
4. The pipeline was initiated on queue "queue_remote" on worker 01, and the next tasks were initiated on queue "queue_remote" on worker 02. It failed because it couldn't find the modules on worker 02.

  				
Posted one year ago by FreshFly37

@<1523701435869433856:profile|SmugDolphin23> I have attached two screenshots: one shows the pipeline initialization and the other shows the task of the pipeline.

The project's directory is as follows.
pipeline.py includes the code to run the pipeline and the tasks of the pipeline.

├── Makefile
├── README.md
├── ev_xxxxxx_detection
│   ├── __init__.py
│   ├── __pycache__
│   │   └── __init__.cpython-311.pyc
│   ├── clearml
│   │   ├── __pycache__
│   │   ├── clearml_wrapper.py
│   │   ├── constants.py
│   │   ├── data_loader.py
│   │   ├── ev_trainer.py
│   │   ├── pipeline.py
│   │   └── util.py
├── poetry.lock
├── pyproject.toml

  				
Posted one year ago by FreshFly37

For the clearml-server installation, I followed the documentation steps one by one.

  				
Posted one year ago by ManiacalSeaturtle63

  				

@<1523701435869433856:profile|SmugDolphin23> Can you please help me out here?

  				
Posted one year ago by FreshFly37

Can you please screenshot the INFO tab on the pipeline controller task?

  				
Posted one year ago by SmugDolphin23

It prints "1.13.3rc0"

  				
Posted one year ago by ManiacalSeaturtle63

Oh I see. I think there is a mismatch between some clearml versions on your machine. How did you run these scripts exactly? (From the CLI, for example python test.py?)

Or if you ran it via an IDE, what is the interpreter path?
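One way to check for such a mismatch is to print the interpreter path and the installed package version from the same interpreter that runs the pipeline script. The following is a minimal sketch using only the standard library; the function name describe_environment is made up for illustration and is not part of ClearML.

```python
import sys
from importlib import metadata


def describe_environment(package="clearml"):
    """Report which interpreter is running and which version of `package` it sees."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed for this interpreter.
        version = None
    return {"interpreter": sys.executable, "package": package, "version": version}


print(describe_environment())
```

Running this once from the terminal and once from the IDE makes it easy to see whether both use the same environment.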

  				
Posted one year ago by SmugDolphin23

how did you install clearml?

  				
Posted one year ago by SmugDolphin23

@<1523701435869433856:profile|SmugDolphin23> Sure, Thank you for the suggestion. I'll try to add imports as mentioned by you and execute the pipeline & check the functionality.

Locally, I'm running it using python3 pipeline.py with pipe.start_locally(run_pipeline_steps_locally=True) in the pipeline to initialize it, and it's working fine.

  				
Posted one year ago by FreshFly37

@<1523701435869433856:profile|SmugDolphin23> I ran the scripts in order: step1, step2 and step3. Then I ran the "pipeline_from_task.py" script. I followed the ClearML documentation, so all of the code is taken from the GitHub repo.

  				
Posted one year ago by ManiacalSeaturtle63

@<1626028578648887296:profile|FreshFly37> can you also share the logs of the task? They may give an idea.

  				
Posted one year ago by ManiacalSeaturtle63

I have attached the screenshot of logs earlier

  				
Posted one year ago by FreshFly37

Thank you @<1523701435869433856:profile|SmugDolphin23> It is working now after the addition of repo details into each task. It seems that we need to specify repo details in each task to pull the code & execute the tasks on the worker.

  				
Posted one year ago by FreshFly37

@<1523701435869433856:profile|SmugDolphin23> I have tried another way by including pipeline.py in the root directory of the code and executed “python3 pipeline.py” & still faced same issue

  				
Posted one year ago by FreshFly37

@<1523701435869433856:profile|SmugDolphin23> I retried the same scenario with the clearml==1.14.1 package, but the pipelines are still not showing in the UI :(

  				
Posted one year ago by ManiacalSeaturtle63

what about import clearml; print(clearml.__version__)

  				
Posted one year ago by SmugDolphin23

I just used the "pip install clearml" command for the SDK.

  				
Posted one year ago by ManiacalSeaturtle63

sure, I'll add those details & check. Thank you

  				
Posted one year ago by FreshFly37

ok, that is very useful actually

  				
Posted one year ago by SmugDolphin23

  				

@<1523701435869433856:profile|SmugDolphin23> I used clearml==1.13.2 and am now upgrading to clearml==1.14.1. As extra information, the image versions in the ClearML server docker-compose file are currently set to latest.

  				
Posted one year ago by ManiacalSeaturtle63

@<1523701435869433856:profile|SmugDolphin23> , I’ve updated both the ClearML server and client to the latest version, 1.14.0, as per our previous conversation. However, I’m still encountering the same issue as described earlier.
WebApp: 1.14.0-431
Server: 1.14.0-431
API: 2.28

I attempted to use the same queue for both the controller and the steps, and assigned two workers to this queue. Upon executing the code, the pipeline was initiated on the “queue_remote” queue, and the tasks of the pipeline were also initiated on another worker in the “queue_remote” queue. However, the dataset creation failed because it was unable to locate the Python modules from the current directory as shown in the below screenshot.

Note: I stored the code and its dependencies in a GitHub repository when I executed the pipeline.

Please refer to the attached error screenshot and the code I used to run the pipeline for more details.

  				
Posted one year ago by FreshFly37

@<1626028578648887296:profile|FreshFly37> how are you running this locally in the first place?
If you are running pipeline.py with cwd as ev_xx_detection/clearml, then I would not expect you to be able to do from ev_xx_detection.clearml import constants (for example), but import constants directly would work (as constants.py is in the same directory as pipeline.py). The reason your remote run doesn't work is basically this:
cwd is ev_xx_detection/clearml and ev_xx_detection.clearml.constants is imported, but the module that should be imported is actually constants.
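The import behaviour described here can be reproduced without ClearML. The sketch below builds a throwaway layout in a temporary directory and probes which import style resolves from which directory. All names in it (demo_detection_pkg, demo_steps, demo_constants, can_import, run_demo) are hypothetical stand-ins, not the real project or any ClearML API.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def can_import(module_name, search_dir):
    """Try importing module_name with search_dir placed first on sys.path."""
    saved_path = list(sys.path)
    saved_modules = set(sys.modules)
    sys.path.insert(0, str(search_dir))
    importlib.invalidate_caches()
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False
    finally:
        # Undo the path change and forget anything imported during the probe.
        sys.path[:] = saved_path
        for name in set(sys.modules) - saved_modules:
            del sys.modules[name]


def run_demo():
    """Build <root>/demo_detection_pkg/demo_steps/demo_constants.py and probe imports."""
    with tempfile.TemporaryDirectory() as tmp:
        repo_root = Path(tmp)
        inner = repo_root / "demo_detection_pkg" / "demo_steps"
        inner.mkdir(parents=True)
        (repo_root / "demo_detection_pkg" / "__init__.py").write_text("")
        (inner / "demo_constants.py").write_text("VALUE = 1\n")
        dotted = "demo_detection_pkg.demo_steps.demo_constants"
        return {
            "package_import_from_inner_dir": can_import(dotted, inner),
            "plain_import_from_inner_dir": can_import("demo_constants", inner),
            "package_import_from_repo_root": can_import(dotted, repo_root),
        }


results = run_demo()
print(results)
```

With only the inner directory on the search path, the plain import resolves and the package-qualified one does not. The package-qualified import needs the repository root on the path, or the package installed in the environment.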

  				
Posted one year ago by SmugDolphin23

what do you get when you run this code?

from clearml.backend_api import Session
print(Session.check_min_api_server_version("2.17"))

  				
Posted one year ago by SmugDolphin23

Hi!
It is possible to use the same queue for the controller and the steps, but there need to be at least 2 agents that pull tasks from that queue. Otherwise, if there is only 1 agent, that agent will be busy running the controller and won't be able to fetch the steps.

Regarding missing local packages: the step is run in a temporary directory that is different from the directory the script is originally in. To solve this, you could add all the modules/files you are interested in to a git repository. If you do, that repository will be cloned by the agent when running the steps, which will make the packages accessible.
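The single-agent situation can be illustrated with a toy model. This is only a sketch of the reasoning above, not how ClearML schedules work internally: it assumes the controller holds its agent for the whole run and that each step takes one tick. The function name simulate_single_queue is made up for this example.

```python
from collections import deque


def simulate_single_queue(num_agents, num_steps):
    """Return how many steps finish when controller and steps share one queue."""
    queue = deque(["controller"])
    idle_agents = num_agents
    finished_steps = 0

    for _ in range(num_steps + 2):  # enough ticks for the successful case
        running_steps = 0
        while idle_agents > 0 and queue:
            task = queue.popleft()
            idle_agents -= 1
            if task == "controller":
                # The controller enqueues its steps and then keeps its agent
                # busy while it waits for them.
                queue.extend(f"step_{i}" for i in range(num_steps))
            else:
                running_steps += 1
        # Steps finish at the end of the tick and free their agents;
        # the controller's agent is never freed in this model.
        finished_steps += running_steps
        idle_agents += running_steps
    return finished_steps


print(simulate_single_queue(num_agents=1, num_steps=3))
print(simulate_single_queue(num_agents=2, num_steps=3))
```

In this model, one agent leaves every step queued forever, which matches the "pending and queued" symptom from the question, while two agents let all steps complete.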

  				
Posted one year ago by SmugDolphin23

It prints a string like the one below:
"""
if not self._task:
    task_name = name or project or '{}'.format(datetime.now())
    if self._pipeline_as_sub_project:
        parent_project = (project + "/" if project else "") + self._pipeline_section
        project_name = "{}/{}".format(parent_project, task_name)
    else:
        parent_project = None
        project_name = project or 'Pipelines'
    # if user disabled the auto-repo, we force local script storage (repo="" or repo=False)
"""

  				
Posted one year ago by ManiacalSeaturtle63

@<1626028578648887296:profile|FreshFly37> I see that create_dataset doesn't have a repo set. Can you try setting it manually via the repo, repo_branch and repo_commit arguments of the add_function_step method?
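As a rough sketch of what that could look like, the hypothetical helper below forwards the repository details to add_function_step on whatever pipeline object it is given. The helper name add_step_with_repo, the placeholder URL and the branch name are assumptions for illustration; only the repo, repo_branch and repo_commit keyword names come from the suggestion above.

```python
def add_step_with_repo(pipe, name, function, function_kwargs,
                       repo_url, branch, commit=None):
    """Register a function step and tell the agent which repository to clone.

    `pipe` is expected to be a PipelineController (or any object exposing
    add_function_step with the same keyword arguments).
    """
    pipe.add_function_step(
        name=name,
        function=function,
        function_kwargs=function_kwargs,
        repo=repo_url,          # repository containing the project modules
        repo_branch=branch,     # branch the agent should check out
        repo_commit=commit,     # optional: pin an exact commit
    )


# Hypothetical usage (placeholder URL and branch, not the real project's):
# pipe = PipelineController(name="pipeline", project=project_name, version="0.0.2")
# add_step_with_repo(
#     pipe, "step_one", step_one, {"dataset_project": project_name},
#     repo_url="https://github.com/<org>/<repo>.git", branch="main",
# )
```

Each step gets the repository details explicitly, so the agent running that step can clone the code before executing it.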

  				
Posted one year ago by SmugDolphin23

I ran it via an IDE. I am using a conda environment, and when I list the clearml packages it looks like the output below. The interpreter matches the base environment.

  				
Posted one year ago by ManiacalSeaturtle63
