Hi @<1523701205467926528:profile|AgitatedDove14> ,
Of course! The output of the curl -X POST command is at least reassuring, as it shows that the automatic endpoint works. As you say, the RPC error when sending a request seems to be returned from the GPU backend.
Nothing gets printed in the docker compose log when sending the curl -X POST, but beforehand the following log is displayed for the clearml-serving-triton container, including WARNING: [Torch-TensorRT] - Unable to read CUDA capable devices. Return status: 35, skipping model configuration auto-complete for 'test_model_pytorch_auto_1': not supported for pytorch backend, and Inference Mode is disabled for model instance 'test_model_pytorch_auto_1':
clearml-serving-triton | W1123 14:37:08.885296 53 pinned_memory_manager.cc:236] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
clearml-serving-triton | I1123 14:37:08.885337 53 cuda_memory_manager.cc:115] CUDA memory pool disabled
clearml-serving-triton | I1123 14:37:08.886498 53 model_lifecycle.cc:459] loading: test_model_pytorch_auto_1:1
clearml-serving-triton | WARNING: [Torch-TensorRT] - Unable to read CUDA capable devices. Return status: 35
clearml-serving-triton | I1123 14:37:09.081600 53 libtorch.cc:1983] TRITONBACKEND_Initialize: pytorch
clearml-serving-triton | I1123 14:37:09.081607 53 libtorch.cc:1993] Triton TRITONBACKEND API version: 1.10
clearml-serving-triton | I1123 14:37:09.081609 53 libtorch.cc:1999] 'pytorch' TRITONBACKEND API version: 1.10
clearml-serving-triton | I1123 14:37:09.081618 53 libtorch.cc:2032] TRITONBACKEND_ModelInitialize: test_model_pytorch_auto_1 (version 1)
clearml-serving-triton | W1123 14:37:09.081897 53 libtorch.cc:284] skipping model configuration auto-complete for 'test_model_pytorch_auto_1': not supported for pytorch backend
clearml-serving-triton | I1123 14:37:09.082100 53 libtorch.cc:313] Optimized execution is enabled for model instance 'test_model_pytorch_auto_1'
clearml-serving-triton | I1123 14:37:09.082103 53 libtorch.cc:332] Cache Cleaning is disabled for model instance 'test_model_pytorch_auto_1'
clearml-serving-triton | I1123 14:37:09.082104 53 libtorch.cc:349] Inference Mode is disabled for model instance 'test_model_pytorch_auto_1'
clearml-serving-triton | I1123 14:37:09.082106 53 libtorch.cc:444] NvFuser is not specified for model instance 'test_model_pytorch_auto_1'
clearml-serving-triton | I1123 14:37:09.082126 53 libtorch.cc:2076] TRITONBACKEND_ModelInstanceInitialize: test_model_pytorch_auto_1 (CPU device 0)
clearml-serving-triton | I1123 14:37:09.091582 53 model_lifecycle.cc:693] successfully loaded 'test_model_pytorch_auto_1' version 1
clearml-serving-triton | I1123 14:37:09.091667 53 server.cc:561]
clearml-serving-triton | +------------------+------+
clearml-serving-triton | | Repository Agent | Path |
clearml-serving-triton | +------------------+------+
clearml-serving-triton | +------------------+------+
clearml-serving-triton |
clearml-serving-triton | I1123 14:37:09.091684 53 server.cc:588]
clearml-serving-triton | +---------+---------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
clearml-serving-triton | | Backend | Path | Config |
clearml-serving-triton | +---------+---------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
clearml-serving-triton | | pytorch | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so | {"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} |
clearml-serving-triton | +---------+---------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
clearml-serving-triton |
clearml-serving-triton | I1123 14:37:09.091701 53 server.cc:631]
clearml-serving-triton | +---------------------------+---------+--------+
clearml-serving-triton | | Model | Version | Status |
clearml-serving-triton | +---------------------------+---------+--------+
clearml-serving-triton | | test_model_pytorch_auto_1 | 1 | READY |
clearml-serving-triton | +---------------------------+---------+--------+
clearml-serving-triton |
clearml-serving-triton | Error: Failed to initialize NVML
clearml-serving-triton | W1123 14:37:09.092288 53 metrics.cc:571] DCGM unable to start: DCGM initialization error
clearml-serving-triton | I1123 14:37:09.092338 53 tritonserver.cc:2214]
clearml-serving-triton | +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
clearml-serving-triton | | Option | Value |
clearml-serving-triton | +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
clearml-serving-triton | | server_id | triton |
clearml-serving-triton | | server_version | 2.25.0 |
clearml-serving-triton | | server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
clearml-serving-triton | | model_repository_path[0] | /models |
clearml-serving-triton | | model_control_mode | MODE_POLL |
clearml-serving-triton | | strict_model_config | 0 |
clearml-serving-triton | | rate_limit | OFF |
clearml-serving-triton | | pinned_memory_pool_byte_size | 268435456 |
clearml-serving-triton | | response_cache_byte_size | 0 |
clearml-serving-triton | | min_supported_compute_capability | 6.0 |
clearml-serving-triton | | strict_readiness | 1 |
clearml-serving-triton | | exit_timeout | 30 |
clearml-serving-triton | +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
clearml-serving-triton |
clearml-serving-triton | I1123 14:37:09.092982 53 grpc_server.cc:4610] Started GRPCInferenceService at 0.0.0.0:8001
clearml-serving-triton | I1123 14:37:09.093096 53 http_server.cc:3316] Started HTTPService at 0.0.0.0:8000
clearml-serving-triton | I1123 14:37:09.133947 53 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002
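Given the NVML / DCGM / pinned-memory errors above, my next step is to verify whether the container can actually see the GPU at all. This is the minimal check I intend to run (assuming the NVIDIA container toolkit is set up on the host and that nvidia-smi is made available inside the container by the NVIDIA runtime; the CUDA image tag is only an example):

# 1) Check that Docker itself can expose the GPU on the host
#    (any CUDA base image will do for this test).
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

# 2) Check whether the running Triton service sees the GPU; if the device
#    is not passed through in the compose file, this fails or lists no devices.
docker-compose --env-file example.env -f docker-compose-triton.yml \
    exec clearml-serving-triton nvidia-smi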
Additionally, you can find attached the full docker compose log from the moment I ran docker-compose --env-file example.env -f docker-compose-triton.yml --verbose up ⤵
I am not really sure why this happens; maybe it is related to my GPU (nvidia-smi -L indicates I have an NVIDIA GeForce RTX 4070 Ti :man-shrugging:).
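In case it is relevant, this is my rough understanding of how the GPU would need to be requested for the triton service in the compose file (only a sketch using the standard Docker Compose device-reservation syntax; I have not checked whether docker-compose-triton.yml already contains something like this, or whether a dedicated GPU variant of the compose file is supposed to be used instead):

# Hypothetical fragment for the clearml-serving-triton service: request the
# NVIDIA GPU via the standard Compose device-reservation syntax (older
# docker-compose versions may need runtime: nvidia instead).
services:
  clearml-serving-triton:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]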
Thank you again for your valuable insight! 😉
Best regards.