Hi Team,
I am trying to run a ClearML pipeline remotely and I'm running into two issues. Could anyone please help me resolve them?
Issue 1: After executing the code, the pipeline controller starts on the “queue_remote_start” queue and the pipeline steps start on the “queue_remote” queue. However, the dataset-creation step fails because the remote worker cannot find the Python modules that live in my current working directory (a sketch of what I suspect might fix this follows the pipeline code below).
Issue 2: I also attempted to use the same queue for both pipe.start and pipe.set_default_execution_queue. However, the pipeline steps remained in the pending/queued state and never proceeded to the next step.
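My guess is that this happens because each clearml-agent daemon pulls one task at a time: the controller task occupies the only worker on the shared queue, so the step tasks stay queued forever. Assuming that is the cause, either of the following should free things up (these commands are my assumption of a fix, not something I have verified):

# Option A: spawn a second worker on the same queue, so the controller
# and the steps can run concurrently
clearml-agent daemon --detached --queue queue_remote

# Option B: run the controller on a services-mode worker, which can run
# several lightweight tasks (such as pipeline controllers) at once
clearml-agent daemon --detached --services-mode --queue queue_remote_start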
To run the pipeline remotely, I have created two different queues and assigned a worker to each using the following commands:
clearml-agent daemon --detached --create-queue --queue queue_remote
clearml-agent daemon --detached --create-queue --queue queue_remote_start
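For reference, the registered workers and the queues they listen to can be checked with:

clearml-agent list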
I then executed the following command to run the pipeline remotely:
python3 pipeline.py
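As a sanity check, I understand the controller can also be debugged entirely in the local process before enqueueing anything, by replacing the pipe.start(...) call at the end of the script with (assuming I am reading the PipelineController API correctly):

# Debug run: execute the controller and every step locally, no agents needed
pipe.start_locally(run_pipeline_steps_locally=True)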
The code for the Pipeline from Functions is as follows:
from clearml import PipelineController, Task

# step_one, constants, project_name, use_dummy_model_dataset and
# create_dataset_task_name are defined earlier in pipeline.py

# Create the PipelineController object
pipe = PipelineController(
    name="pipeline",
    project=project_name,
    version="0.0.2",
    add_pipeline_tags=True,
)

# Every step runs on this queue unless overridden per step
pipe.set_default_execution_queue('queue_remote')

pipe.add_function_step(
    name='step_one',
    function=step_one,
    function_kwargs={
        "train_file": constants.TRAINING_DATASET_PATH,
        "validation_file": constants.VALIDATAION_DATASET_PATH,
        "s3_output_uri": constants.CLEARML_DATASET_OUTPUT_URI,
        "dataset_project": project_name,
        "dataset_name": constants.CLEARML_TASK_NAME,
        "use_dummy_dataset": use_dummy_model_dataset,
    },
    project_name=project_name,
    task_name=create_dataset_task_name,
    task_type=Task.TaskTypes.data_processing,
)

# The controller task itself is enqueued on a separate queue
pipe.start(queue="queue_remote_start")
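Regarding Issue 1, my understanding (an assumption on my side, based on the pipeline-from-functions docs) is that add_function_step only serializes the step function itself, so modules imported from the current directory are never shipped to the remote worker. Here is a minimal sketch of what I think might fix it, assuming my clearml version supports the repo/repo_branch arguments on add_function_step, and with a hypothetical repository URL:

pipe.add_function_step(
    name='step_one',
    function=step_one,
    function_kwargs={
        "train_file": constants.TRAINING_DATASET_PATH,
        "validation_file": constants.VALIDATAION_DATASET_PATH,
        "s3_output_uri": constants.CLEARML_DATASET_OUTPUT_URI,
        "dataset_project": project_name,
        "dataset_name": constants.CLEARML_TASK_NAME,
        "use_dummy_dataset": use_dummy_model_dataset,
    },
    # Assumption: have the agent clone the repository that contains the
    # local modules, so the step can import them on the remote worker
    # (the URL and branch below are placeholders, not my real repo)
    repo="https://github.com/my-org/my-repo.git",
    repo_branch="main",
    project_name=project_name,
    task_name=create_dataset_task_name,
    task_type=Task.TaskTypes.data_processing,
)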
Could anyone please advise on how to get the pipeline running remotely end to end? Any help would be greatly appreciated.