Examples: query, "exact match", wildcard*, wild?ard, wild*rd
Fuzzy search: cake~ (finds cakes, bake)
Term boost: "red velvet"^4, chocolate^2
Field grouping: tags:(+work -"fun-stuff")
Escaping: Escape characters +-&|!(){}[]^"~*?:\ with \, e.g. \+
Range search: properties.timestamp:[1587729413488 TO *] (inclusive), properties.title:{A TO Z}(excluding A and Z)
Combinations: chocolate AND vanilla, chocolate OR vanilla, (chocolate OR vanilla) NOT "vanilla pudding"
Field search: properties.title:"The Title" AND text
Answered
Hi Team, I Am Trying To Run A Pipeline Remotely Using Clearml Pipeline And I’M Encountering Some Issues. Could Anyone Please Assist Me In Resolving Them?

Hi Team,

I am trying to run a pipeline remotely using ClearML pipeline and I’m encountering some issues. Could anyone please assist me in resolving them?

Issue 1 : After executing the code, the pipeline is initiated on the “queue_remote_start” queue and the tasks of the pipeline are initiated on the “queue_remote” queue. However, the creation of the dataset failed because it couldn’t find the Python modules from the current directory.

Issue 2 : I also attempted to use the same queue for both pipe.start and pipe.set_default_execution_queue . However, the tasks of the pipeline remained in the pending and queued state and didn’t proceed to the next step.

To run the pipeline remotely, I have created two different queues and assigned a worker to each using the following commands:

clearml-agent daemon --detached --create-queue --queue queue_remote
clearml-agent daemon --detached --create-queue --queue queue_remote_start

I then executed the following command to run the pipeline remotely:

python3 pipeline.py

The code for the Pipeline from Functions is as follows:

# Create the PipelineController object
    pipe = PipelineController(
        name="pipeline",
        project=project_name,
        version="0.0.2",
        add_pipeline_tags=True,
    )

pipe.set_default_execution_queue('queue_remote')

pipe.add_function_step(
    name='step_one',
    function=step_one,
    function_kwargs={
            "train_file": constants.TRAINING_DATASET_PATH,
            "validation_file": constants.VALIDATAION_DATASET_PATH,
            "s3_output_uri": constants.CLEARML_DATASET_OUTPUT_URI,
            "dataset_project": project_name,
            "dataset_name": constants.CLEARML_TASK_NAME,
            "use_dummy_dataset": use_dummy_model_dataset,
        },
        project_name=project_name,
        task_name=create_dataset_task_name,
        task_type=Task.TaskTypes.data_processing,
    )

pipe.start(queue="queue_remote_start")

Could anyone please provide a solution on how to successfully run the pipeline remotely? Any help would be greatly appreciated.

  
  
Posted one year ago
Votes Newest

Answers 39


how about this one?

import clearml
import os
print("\n".join(open(os.path.join(clearml.__path__[0], "automation/controller.py")).read().split("\n")[310:320]))
  
  
Posted one year ago

Hi!
It is possible to use the same queue for the controller and the steps, but there needs to be at least 2 agents that pull tasks from that queue. Otherwise, if there is only 1 agent, then that agent will be busy running the controller and it won't be able to fetch the steps.

Regarding missing local packages: the step is ran in a temporary directory that is different than the directory the script is originally in. To solve this, you could add all the modules/files you are interested in in a git repository. If you do, that repository will be cloned by the agent when running the steps, which will make the packages accessible.

  
  
Posted one year ago

@<1523701435869433856:profile|SmugDolphin23> I have tried the same method as suggested by you and the pipeline still failed, as it couldn't find "modules". Could you please help me here?

I would like to describe the process again, which I was following:

  • I created a queue and assigned 2 workers to the queue.
  • In the pipeline.py file, to start the pipeline I used pipe.start(queue="queue_remote") and for the tasks I used pipe.set_default_execution_queue('queue_remote')
  • In the working_dir = ev_xxxx_xxtion/clearml I executed the code using python3 pipeline.py
  • The pipeline was initiated on queue " queue_remote " on worker 01 & the next tasks were initiated on queue " queue_remote " on worker 02 and it failed, as it couldn't find the modules in worker 02.
  
  
Posted one year ago

I ran it via IDE. I am using conda environment and when I list the clearml packages it looks like in the below. The interpreter match with base environment.
image

  
  
Posted one year ago

sure, I'll add those details & check. Thank you

  
  
Posted one year ago

This print string like in below. """
if not self._task:
task_name = name or project or '{}'.format(datetime.now())
if self._pipeline_as_sub_project:
parent_project = (project + "/" if project else "") + self._pipeline_section
project_name = "{}/{}".format(parent_project, task_name)
else:
parent_project = None
project_name = project or 'Pipelines'
# if user disabled the auto-repo, we force local script storage (repo="" or repo=False) """

  
  
Posted one year ago

@<1523701435869433856:profile|SmugDolphin23> I used clearml==1.13.2 and now I am upgrading to clearml=1.14.1 version.Also I would give extra information about Clearml-server docker-compose file images versions is latest right now.

  
  
Posted one year ago

I just use "pip install clearml" command for sdk.

  
  
Posted one year ago

@<1523701435869433856:profile|SmugDolphin23> Sure, Thank you for the suggestion. I'll try to add imports as mentioned by you and execute the pipeline & check the functionality.

In Local I'm running using python3 pipelin.py and used pipe.start_locally(run_pipeline_steps_locally=True) in the pipeline to initialize & it's working fine.

  
  
Posted one year ago
124K Views
39 Answers
one year ago
one year ago
Tags
Similar posts