///[Please note, all the below was executed on the command line of the compute node, not the server head node]///

I've been following the Keras example, but using a PyTorch model.

I have set up a serving instance with the following command:
clearml-serving triton --project "Caltech Birds/Deployment" --name "ResNet34 Serving"

I then added the model endpoint and the model ID of the model to be served:
clearml-serving triton --endpoint "resent34_cub200" --model-id "57ed24c1011346d292ecc9e797ccb47e"

The model was trained using an experiment script that generates a config.pbtxt configuration file when training completes. This file was connected to the experiment configuration as per the Keras example, which resulted in the following configuration being added to the experiment:

platform: "pytorch_libtorch"
input [
  {
    name: "input_layer"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "fc"
    data_type: TYPE_FP32
    dims: [ 200 ]
  }
]

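For reference, a configuration like the one above can be rendered from the tensor names and shapes at the end of training. This is only a rough sketch: `make_triton_config` is a hypothetical helper, not part of ClearML or Triton, and it covers only the fields shown above.

```python
def make_triton_config(platform, inputs, outputs):
    """Render a minimal Triton config.pbtxt as text.

    `inputs` and `outputs` are lists of (name, data_type, dims) tuples.
    Hypothetical helper for illustration; only these fields are handled.
    """
    def render(section, tensors):
        entries = []
        for name, data_type, dims in tensors:
            dims_text = ", ".join(str(d) for d in dims)
            entries.append(
                "  {\n"
                f'    name: "{name}"\n'
                f"    data_type: {data_type}\n"
                f"    dims: [ {dims_text} ]\n"
                "  }"
            )
        # Entries of a repeated field are separated by commas.
        return f"{section} [\n" + ",\n".join(entries) + "\n]"

    lines = [
        f'platform: "{platform}"',
        render("input", inputs),
        render("output", outputs),
    ]
    return "\n".join(lines) + "\n"


# Values taken from the configuration in this post.
config_text = make_triton_config(
    platform="pytorch_libtorch",
    inputs=[("input_layer", "TYPE_FP32", (3, 224, 224))],
    outputs=[("fc", "TYPE_FP32", (200,))],
)
print(config_text)
```

The resulting text can then be written to config.pbtxt and attached to the experiment in whatever way the training script already does.
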
I then created a queue on a GPU compute node (as the model requires GPU resource):
clearml-agent daemon --queue gpu_serving --gpus all --detached --docker

The serving endpoint is then started with the following command:
clearml-serving launch --queue gpu_serving

I can see two items in my deployment sub-project, the service I created, and a triton serving engine inference object.

On execution, the Triton serving engine inference task fails with the following errors:

2021-06-07 16:43:15 task df3fbe15a88d400db222d99b7e6ceea1 pulled from 69fd217bb5f743be83f1400fbe394d86 by worker ecm-clearml-compute-gpu-001:gpuall
2021-06-07 16:43:15 Running Task df3fbe15a88d400db222d99b7e6ceea1 inside docker: nvcr.io/nvidia/tritonserver:21.03-py3 arguments: ['--ipc=host', '-p', '8000:8000', '-p', '8001:8001', '-p', '8002:8002']
2021-06-07 16:43:15 Executing: ['docker', 'run', '-t', '--gpus', 'all', '--ipc=host', '-p', '8000:8000', '-p', '8001:8001', '-p', '8002:8002', '-e', 'CLEARML_WORKER_ID=ecm-clearml-compute-gpu-001:gpuall', '-e', 'CLEARML_DOCKER_IMAGE=nvcr.io/nvidia/tritonserver:21.03-py3 --ipc=host -p 8000:8000 -p 8001:8001 -p 8002:8002', '-v', '/tmp/.clearml_agent.wcz3vwg0.cfg:/root/clearml.conf', '-v', '/tmp/clearml_agent.ssh.41w35nmr:/root/.ssh', '-v', '/home/edmorris/.clearml/apt-cache:/var/cache/apt/archives', '-v', '/home/edmorris/.clearml/pip-cache:/root/.cache/pip', '-v', '/home/edmorris/.clearml/pip-download-cache:/root/.clearml/pip-download-cache', '-v', '/home/edmorris/.clearml/cache:/clearml_agent_cache', '-v', '/home/edmorris/.clearml/vcs-cache:/root/.clearml/vcs-cache', '--rm', 'nvcr.io/nvidia/tritonserver:21.03-py3', 'bash', '-c', 'apt-get update ; apt-get install -y git ; . /opt/conda/etc/profile.d/conda.sh ; conda activate base ; [ ! -z $LOCAL_PYTHON ] || export LOCAL_PYTHON=python3 ; $LOCAL_PYTHON -m pip install -U "pip<20.2" ; $LOCAL_PYTHON -m pip install -U clearml-agent ; cp /root/clearml.conf /root/default_clearml.conf ; NVIDIA_VISIBLE_DEVICES=all $LOCAL_PYTHON -u -m clearml_agent execute --disable-monitoring --id df3fbe15a88d400db222d99b7e6ceea1']
2021-06-07 16:43:20 docker: Error response from daemon: driver failed programming external connectivity on endpoint pedantic_hellman (05037e65bcb949267b0708a35cfa931e3268bf16fef8be13c71d0614f5144954): Error starting userland proxy: listen tcp4 0.0.0.0:8001: bind: address already in use.
2021-06-07 16:43:20 Process failed, exit code 125

I am not sure whether I have done something incorrectly, for instance in how I provisioned the queue for the GPU inference compute resource. Perhaps I caused a port conflict?

Or, is this a ports issue on the compute resource?
Do I need to open up port 8001 on the GPU inference compute resource to allow access to the model endpoint?
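
The Docker error in the log already names the port it failed to bind. As a small sketch, the conflicting port can be pulled out of an agent log as follows. The helper name `find_port_conflicts` is made up for illustration, and the pattern is written only against the message format shown in the log above.

```python
import re

# Matches the Docker userland-proxy bind error as it appears in the log above.
BIND_ERROR = re.compile(
    r"listen (?P<proto>tcp[46]?) (?P<host>[0-9.]+):(?P<port>\d+): "
    r"bind: address already in use"
)


def find_port_conflicts(log_text):
    """Return the sorted host ports that Docker reported as already in use."""
    return sorted({int(m.group("port")) for m in BIND_ERROR.finditer(log_text)})


# Tail of the error line from the log in this post.
sample = (
    "docker: Error response from daemon: driver failed programming external "
    "connectivity: Error starting userland proxy: "
    "listen tcp4 0.0.0.0:8001: bind: address already in use."
)
print(find_port_conflicts(sample))  # prints [8001]
```

Here only 8001 is reported, which suggests that something on the compute node was already bound to that one port when the container was started.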

I executed the clearml-serving commands on the compute resource, but should this have been executed on the server?
Or even a client machine?

  				
Posted 4 years ago by VivaciousPenguin66

Answers 7

This might be a silly question, but am I right in assuming that no specific inference script has to be written to handle the model in order to get inference working?

This is what the clearml-serving package takes care of, correct?

  				
Posted 4 years ago by VivaciousPenguin66

I have rerun the serving example with my PyTorch job, this time following the MNIST Keras example.
I added a GPU compute resource to the default queue and then launched the service on the default queue.
This spun up a Triton serving engine container on the compute resource, but it failed with the same port conflict as before:

2021-06-08 16:28:49 task f2fbb3218e8243be9f6ab37badbb4856 pulled from 2c28e5db27e24f348e1ff06ba93e80c5 by worker ecm-clearml-compute-gpu-002:0
2021-06-08 16:28:49 Running Task f2fbb3218e8243be9f6ab37badbb4856 inside docker: nvcr.io/nvidia/tritonserver:21.03-py3 arguments: ['--ipc=host', '-p', '8000:8000', '-p', '8001:8001', '-p', '8002:8002']
2021-06-08 16:28:50 Executing: ['docker', 'run', '-t', '--gpus', 'all', '--ipc=host', '-p', '8000:8000', '-p', '8001:8001', '-p', '8002:8002', '-e', 'CLEARML_WORKER_ID=ecm-clearml-compute-gpu-002:0', '-e', 'CLEARML_DOCKER_IMAGE=nvcr.io/nvidia/tritonserver:21.03-py3 --ipc=host -p 8000:8000 -p 8001:8001 -p 8002:8002', '-v', '/tmp/.clearml_agent.ft8vulpe.cfg:/root/clearml.conf', '-v', '/tmp/clearml_agent.ssh.j9b8arhf:/root/.ssh', '-v', '/home/edmorris/.clearml/apt-cache:/var/cache/apt/archives', '-v', '/home/edmorris/.clearml/pip-cache:/root/.cache/pip', '-v', '/home/edmorris/.clearml/pip-download-cache:/root/.clearml/pip-download-cache', '-v', '/home/edmorris/.clearml/cache:/clearml_agent_cache', '-v', '/home/edmorris/.clearml/vcs-cache:/root/.clearml/vcs-cache', '--rm', 'nvcr.io/nvidia/tritonserver:21.03-py3', 'bash', '-c', 'apt-get update ; apt-get install -y git ; . /opt/conda/etc/profile.d/conda.sh ; conda activate base ; [ ! -z $LOCAL_PYTHON ] || export LOCAL_PYTHON=python3 ; $LOCAL_PYTHON -m pip install -U "pip<20.2" ; $LOCAL_PYTHON -m pip install -U clearml-agent ; cp /root/clearml.conf /root/default_clearml.conf ; NVIDIA_VISIBLE_DEVICES=all $LOCAL_PYTHON -u -m clearml_agent execute --disable-monitoring --id f2fbb3218e8243be9f6ab37badbb4856']
2021-06-08 16:28:55 docker: Error response from daemon: driver failed programming external connectivity on endpoint wonderful_galileo (0c2feca5684f2f71b11fa1e8da4550d42b23c456e52ba0069d0aae64cd75f55b): Error starting userland proxy: listen tcp4 0.0.0.0:8001: bind: address already in use.
2021-06-08 16:28:55 Process failed, exit code 125

  				
Posted 4 years ago by VivaciousPenguin66

SuccessfulKoala55 I can see the issue you are referring to regarding the execution of the Triton docker image. However, as far as I am aware, this was not something I explicitly specified. The ServingService.launch_engine() method in the clearml-serving package appears to specify both --ipc=host and the -p port mappings:

` def launch_engine(self, queue_name, queue_id=None, verbose=True):
    # type: (Optional[str], Optional[str], bool) -> None
    """
    Launch serving engine on a specific queue
    :param queue_name: Queue name to launch the engine service running the inference on.
    :param queue_id: specify queue id (unique stand stable) instead of queue_name
    :param verbose: If True print progress to console
    """

    # todo: add more engines
    if self._engine_type == 'triton':
        # create the serving engine Task
        engine_task = Task.create(
            project_name=self._task.get_project_name(),
            task_name="triton serving engine",
            task_type=Task.TaskTypes.inference,
            repo=" ` ` ",
            branch="main",
            commit="ad049c51c146e9b7852f87e2f040e97d88848a1f",
            script="clearml_serving/triton_helper.py",
            working_directory=".",
            docker="nvcr.io/nvidia/tritonserver:21.03-py3 --ipc=host -p 8000:8000 -p 8001:8001 -p 8002:8002",
            argparse_args=[('serving_id', self._task.id), ],
            add_task_init_call=False,
        )
        if verbose:
            print('Launching engine {} on queue {}'.format(self._engine_type, queue_id or queue_name))
        engine_task.enqueue(task=engine_task, queue_name=queue_name, queue_id=queue_id) `
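
If whatever is holding the host ports cannot be stopped, one workaround worth trying is to publish Triton on different host ports. The sketch below only builds the docker argument string in the same format as the hard-coded one above. `triton_docker_args` is a hypothetical helper, not part of clearml-serving, and whether the package exposes a setting for this is not covered here.

```python
def triton_docker_args(image="nvcr.io/nvidia/tritonserver:21.03-py3",
                       host_ports=(8000, 8001, 8002),
                       ipc_host=True):
    """Build a docker argument string like the one passed to Task.create.

    Hypothetical helper: `host_ports` are the ports published on the host,
    mapped in order to Triton's HTTP, gRPC and metrics ports in the container.
    """
    container_ports = (8000, 8001, 8002)  # HTTP, gRPC, metrics inside the container
    parts = [image]
    if ipc_host:
        parts.append("--ipc=host")
    for host_port, container_port in zip(host_ports, container_ports):
        # docker's -p flag takes HOST_PORT:CONTAINER_PORT
        parts.append(f"-p {host_port}:{container_port}")
    return " ".join(parts)


# Publish on 9000-9002 on the host instead of 8000-8002.
docker_arg = triton_docker_args(host_ports=(9000, 9001, 9002))
print(docker_arg)
```

With the defaults the helper reproduces the string from launch_engine exactly. Any client would then have to connect to the remapped host ports rather than 8000-8002.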

  				
Posted 4 years ago by VivaciousPenguin66

SuccessfulKoala55 I may have made some progress with this bug, but have stumbled onto another issue in getting the Triton service up and running.

See the comments in the GitHub issue.

  				
Posted 4 years ago by VivaciousPenguin66

Thanks VivaciousPenguin66 , we'll take a look 🙂

  				
Posted 4 years ago by SuccessfulKoala55

I have created GitHub issue #3 on the clearml-serving repo.

https://github.com/allegroai/clearml-serving/issues/3

  				
Posted 4 years ago by VivaciousPenguin66

Hi VivaciousPenguin66 , this is actually a docker/OS issue: basically, the port (8001) is already in use. I suspect this is because you've used --ipc=host, which means this port is probably in use on the host machine, and since the docker container is sharing the host system's IPC namespace, you get this error. I'm actually not sure --ipc=host can be used in conjunction with the -p ... directive. I'd start by using either the IPC option or the port mapping, but not both, and see how it works.
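
A quick way to test whether the ports are already taken on the compute node is to try binding them directly. Note that -p binds the host port regardless of the IPC setting, so a leftover container or another service listening on 8001 would be enough to trigger the error. The snippet below is a rough check only, and `port_is_free` is a hypothetical helper name.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" (or a permissions error).
            return False
        return True


# The three ports that the Triton container tries to publish.
for port in (8000, 8001, 8002):
    print(port, "free" if port_is_free(port) else "in use")
```

Run it on the GPU compute node itself, before launching the engine. If a port shows as in use, `docker ps` will show whether an earlier Triton container is still running and holding it.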

  				
Posted 4 years ago by SuccessfulKoala55
