This might be a silly question, but to get inference working, I am assuming that no dedicated inference script has to be written to handle the model?
This is what the clearml-serving package takes care of, correct?
SuccessfulKoala55 I may have made some progress with this bug, but have stumbled onto another issue in getting the Triton service up and running.
See my comments in the GitHub issue.
Thanks VivaciousPenguin66 , we'll take a look 🙂
I have created GitHub issue 3 on the clearml-serving repo.
I have rerun the serving example with my PyTorch job, but this time I have followed the MNIST Keras example.
I added a GPU compute resource to the default queue and then launched the service on the default queue.
This spun up a Triton serving engine container on the compute resource; however, it failed because of the earlier port-conflict issue:
2021-06-08 16:28:49 task f2fbb3218e8243be9f6ab37badbb4856 pulled from 2c28e5db27e24f348e1ff06ba93e80c5 by worker ecm-clearml-compute-gpu-002:0
2021-06-08 16:28:49 Running Task f2fbb3218e8243be9f6ab37badbb4856 inside docker: nvcr.io/nvidia/tritonserver:21.03-py3 arguments: ['--ipc=host', '-p', '8000:8000', '-p', '8001:8001', '-p', '8002:8002']
2021-06-08 16:28:50 Executing: ['docker', 'run', '-t', '--gpus', 'all', '--ipc=host', '-p', '8000:8000', '-p', '8001:8001', '-p', '8002:8002', '-e', 'CLEARML_WORKER_ID=ecm-clearml-compute-gpu-002:0', '-e', 'CLEARML_DOCKER_IMAGE=nvcr.io/nvidia/tritonserver:21.03-py3 --ipc=host -p 8000:8000 -p 8001:8001 -p 8002:8002', '-v', '/tmp/.clearml_agent.ft8vulpe.cfg:/root/clearml.conf', '-v', '/tmp/clearml_agent.ssh.j9b8arhf:/root/.ssh', '-v', '/home/edmorris/.clearml/apt-cache:/var/cache/apt/archives', '-v', '/home/edmorris/.clearml/pip-cache:/root/.cache/pip', '-v', '/home/edmorris/.clearml/pip-download-cache:/root/.clearml/pip-download-cache', '-v', '/home/edmorris/.clearml/cache:/clearml_agent_cache', '-v', '/home/edmorris/.clearml/vcs-cache:/root/.clearml/vcs-cache', '--rm', 'nvcr.io/nvidia/tritonserver:21.03-py3', 'bash', '-c', 'apt-get update ; apt-get install -y git ; . /opt/conda/etc/profile.d/conda.sh ; conda activate base ; [ ! -z $LOCAL_PYTHON ] || export LOCAL_PYTHON=python3 ; $LOCAL_PYTHON -m pip install -U "pip<20.2" ; $LOCAL_PYTHON -m pip install -U clearml-agent ; cp /root/clearml.conf /root/default_clearml.conf ; NVIDIA_VISIBLE_DEVICES=all $LOCAL_PYTHON -u -m clearml_agent execute --disable-monitoring --id f2fbb3218e8243be9f6ab37badbb4856']
2021-06-08 16:28:55 docker: Error response from daemon: driver failed programming external connectivity on endpoint wonderful_galileo (0c2feca5684f2f71b11fa1e8da4550d42b23c456e52ba0069d0aae64cd75f55b): Error starting userland proxy: listen tcp4 0.0.0.0:8001: bind: address already in use.
2021-06-08 16:28:55 Process failed, exit code 125
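The step that fails in this log is Docker's userland proxy trying to bind 0.0.0.0:8001 on the host. A minimal sketch for confirming that on the worker itself (ecm-clearml-compute-gpu-002 in the log) before relaunching: it tries to bind each of the three ports the container publishes and reports the ones that are already taken. The helper names `port_is_free` and `busy_ports` are hypothetical, and the check only covers IPv4 TCP.

```python
import socket

# Ports the Triton container publishes via "-p 8000:8000 -p 8001:8001 -p 8002:8002"
TRITON_PORTS = (8000, 8001, 8002)


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if this machine can currently bind host:port over TCP (IPv4)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Same condition Docker reported: "bind: address already in use"
            return False
    return True


def busy_ports(ports=TRITON_PORTS, host: str = "0.0.0.0") -> list:
    """Return the subset of `ports` that is already bound on this host."""
    return [p for p in ports if not port_is_free(p, host)]


if __name__ == "__main__":
    taken = busy_ports()
    if taken:
        print("Ports already in use on this host:", taken)
    else:
        print("All Triton ports are free:", list(TRITON_PORTS))
```

If 8001 shows up as taken, something on the host (possibly a leftover Triton container from an earlier run) is still holding it, and `docker ps` or `ss -ltnp` on the worker should show what.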
SuccessfulKoala55 I can see the issue you are referring to regarding how the Triton docker image is executed; however, as far as I am aware, I did not specify these options explicitly. The ServingService.launch_engine() method of the ServingService class in the clearml-serving package appears to specify both --ipc=host and the -p port mappings itself:
```python
def launch_engine(self, queue_name, queue_id=None, verbose=True):
    # type: (Optional[str], Optional[str], bool) -> None
    """
    Launch serving engine on a specific queue
    :param queue_name: Queue name to launch the engine service running the inference on.
    :param queue_id: specify queue id (unique stand stable) instead of queue_name
    :param verbose: If True print progress to console
    """
    # todo: add more engines
    if self._engine_type == 'triton':
        # create the serving engine Task
        engine_task = Task.create(
            project_name=self._task.get_project_name(),
            task_name="triton serving engine",
            task_type=Task.TaskTypes.inference,
            repo="…",  # repository URL not preserved in the original message
            branch="main",
            commit="ad049c51c146e9b7852f87e2f040e97d88848a1f",
            script="clearml_serving/triton_helper.py",
            working_directory=".",
            # image followed by extra "docker run" arguments: IPC sharing plus three port mappings
            docker="nvcr.io/nvidia/tritonserver:21.03-py3 --ipc=host -p 8000:8000 -p 8001:8001 -p 8002:8002",
            argparse_args=[('serving_id', self._task.id), ],
            add_task_init_call=False,
        )
        if verbose:
            print('Launching engine {} on queue {}'.format(self._engine_type, queue_id or queue_name))
        engine_task.enqueue(task=engine_task, queue_name=queue_name, queue_id=queue_id)
```
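To see exactly which `docker run` options launch_engine() hard-codes, the `docker=` string from the snippet above can be split the way a shell would. This is only an illustration of what the string contains; it is not how clearml-agent itself parses it, and the helper names are hypothetical.

```python
import shlex

# The docker string passed to Task.create() in launch_engine()
TRITON_DOCKER = "nvcr.io/nvidia/tritonserver:21.03-py3 --ipc=host -p 8000:8000 -p 8001:8001 -p 8002:8002"


def split_docker_string(docker: str):
    """Split "image [docker run args...]" into the image name and the remaining arguments."""
    image, *args = shlex.split(docker)
    return image, args


def published_ports(args):
    """Collect the HOST:CONTAINER mappings given with -p / --publish."""
    mappings = []
    it = iter(args)
    for arg in it:
        if arg in ("-p", "--publish"):
            value = next(it, None)  # the mapping is the following token
            if value is not None:
                mappings.append(value)
        elif arg.startswith("--publish="):
            mappings.append(arg.split("=", 1)[1])
    return mappings


image, args = split_docker_string(TRITON_DOCKER)
print("image:     ", image)
print("--ipc=host:", "--ipc=host" in args)
print("ports:     ", published_ports(args))
```

This matches the "arguments:" list the agent printed in the log: --ipc=host plus the 8000, 8001 and 8002 mappings, none of which came from the user's own code.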
Hi VivaciousPenguin66, this is actually a Docker/OS issue: port 8001 is already in use. I suspect this is because you've used --ipc=host, which means this port is probably already in use on the host machine, and since the container shares the host system's IPC namespace, you get this error. I'm actually not sure --ipc=host can be used in conjunction with the -p ... directive (I'd start by using either the IPC sharing or the port mapping, but not both, and see how it works).
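One way to try that suggestion is to build the engine's docker string with only one of the two options, for example by editing the `docker=` argument in a local copy of launch_engine(). A minimal sketch, assuming a hypothetical helper `triton_docker_string`; the image tag and ports are the ones used above (Triton's defaults: 8000 HTTP, 8001 gRPC, 8002 metrics).

```python
TRITON_IMAGE = "nvcr.io/nvidia/tritonserver:21.03-py3"
TRITON_PORTS = (8000, 8001, 8002)  # HTTP, gRPC, metrics


def triton_docker_string(mode: str, image: str = TRITON_IMAGE, ports=TRITON_PORTS) -> str:
    """Hypothetical helper: build an "image [args]" string with either IPC sharing or port mappings.

    mode="ipc"   -> only --ipc=host
    mode="ports" -> only the -p HOST:CONTAINER mappings
    """
    if mode == "ipc":
        extra = ["--ipc=host"]
    elif mode == "ports":
        extra = [arg for port in ports for arg in ("-p", f"{port}:{port}")]
    else:
        raise ValueError(f"unknown mode {mode!r}; expected 'ipc' or 'ports'")
    return " ".join([image, *extra])


print(triton_docker_string("ipc"))
print(triton_docker_string("ports"))
```

Keeping the two variants behind one switch makes it easy to run the experiment twice and compare, rather than editing the string by hand each time.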