@<1523701205467926528:profile|AgitatedDove14> config.pbtxt in the triton container (inside /models/conformer_joint) - after merge:
default_model_filename: "model.bin"
max_batch_size: 16
dynamic_batching {
  max_queue_delay_microseconds: 100
}
input: [
  {
    name: "encoder_outputs"
    data_type: TYPE_FP32
    dims: [ 1, 640 ]
  },
  {
    name: "decoder_outputs"
    data_type: TYPE_FP3...
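One way to double-check what Triton actually merged is to query its KServe v2 REST endpoint for the model config; "<triton-host>" below is a placeholder:
import requests

# query Triton's model-config REST endpoint; "<triton-host>" is a placeholder
r = requests.get("http://<triton-host>:8000/v2/models/conformer_joint/config")
print(r.json())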
@<1523701205467926528:profile|AgitatedDove14> this error appears before the postprocess part.
Today I redeployed the existing entrypoint with --aux-config "./config.pbtxt"
and got the same error.
Before:
!clearml-serving --id "<>" model add --engine triton --endpoint 'conformer_joint' --model-id '<>' --preprocess 'preprocess_joint.py' --input-size '[1, 640]' '[640, 1]' --input-name 'encoder_outputs' 'decoder_outputs' --input-type float32 float32 --output-size '[129]' --output-name 'outpu...
@<1523701205467926528:profile|AgitatedDove14> I think there is no chance to pass config.pbtxt as is.
In this line, the function uses self.model_endpoint.input_name
(and after that input_name, input_type and input_size), but there are no such att...
Hi @<1523701087100473344:profile|SuccessfulKoala55> Turns out that if I delete the
platform: ...
string from config.pbtxt, it will deploy the model on tritonserver (serving v1.3.0 adds the "platform" string at the end of the config file when the ClearML model has a "framework" attribute). But when I try to check the endpoint with random data (with the right shape according to the config), I am getting the
{'detail': "Error processing request: object of type 'NoneType' has no len()"}
error. Do you know how...
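My guess at the mechanism: if the endpoint was registered from the aux config alone, self.model_endpoint.input_name is left as None, and any len() call on it raises exactly this error. A tiny illustration:
input_name = None  # what self.model_endpoint.input_name seems to hold in this case
len(input_name)    # TypeError: object of type 'NoneType' has no len()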
I am getting this error in the request response:
import numpy as np
import requests

body = {
    "encoder_outputs": [np.random.randn(1, 640).tolist()],
    "decoder_outputs": [np.random.randn(640, 1).tolist()]
}
# the endpoint URL was stripped here; "<endpoint-url>" is a placeholder
response = requests.post("<endpoint-url>", json=body)
response.json()
Unfortunately, I see nothing related to this problem in either the inference or the triton pods/deployments (we use Kubernetes to spin up ClearML Serving).
Hi @<1523701205467926528:profile|AgitatedDove14>
https://github.com/allegroai/clearml-serving/issues/62
I have an issue based on that case. Could you tell me if I am missing something in it?
Hi @<1523701205467926528:profile|AgitatedDove14>
My preprocess file:
from typing import Any, Union, Optional, Callable


class Preprocess(object):
    def __init__(self):
        pass

    def preprocess(
        self,
        body: Union[bytes, dict],
        state: dict,
        collect_custom_statistics_fn: Optional[Callable[[dict], None]]
    ) -> Any:
        return body["length"], body["audio_signal"]

    def postprocess(
        self,
        data: An...
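The truncated part is the postprocess signature; assuming the standard clearml-serving Preprocess interface, it continues roughly like this (the pass-through body is my placeholder):
from typing import Any, Optional, Callable

class Preprocess(object):
    # assumed continuation of the file above
    def postprocess(
        self,
        data: Any,
        state: dict,
        collect_custom_statistics_fn: Optional[Callable[[dict], None]]
    ) -> Any:
        # placeholder body; the real file reshapes/decodes the model output here
        return data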
Hi @<1523701205467926528:profile|AgitatedDove14>
Are there any updates on the issue?
Hi @<1523701070390366208:profile|CostlyOstrich36> Via the UI
@<1523701118159294464:profile|ExasperatedCrab78>
We use variables from the .env file inside the clearml-serving-triton
image, because we use a Helm chart to spin up clearml-serving. And we still face the error.
@<1523701118159294464:profile|ExasperatedCrab78> Thank you! It has solved the problem!
Hi John @<1569133676640342016:profile|MammothPigeon75>! How do you queue up SLURM jobs for a task with distributed computation (like PyTorch Lightning)?
Please give me some help.
Thank you in advance!
What do you mean?
We added env vars to the configuration vault and expected to use them in the remote task with os.getenv,
but it didn't work.
Hi @<1523701205467926528:profile|AgitatedDove14>
Yes, we use the configuration vault for that.
I checked the env vars in the remote task's Python env, and none of the vars mentioned in the vault were there.
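For reference, this is roughly the check I ran inside the remote task (the variable name is a placeholder for one actually defined in the vault):
import os

# "MY_VAULT_VAR" stands in for a variable defined in the configuration vault
print(os.getenv("MY_VAULT_VAR"))  # returned None in the remote task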
"After" version in logs is the same as config above. There is no "before" version in logs((
Endpoint config from ClearML triton task:
conformer_joint {
  engine_type = "triton"
  serving_url = "conformer_joint"
  model_id = "<>"
  version = ""
  preprocess_artifact = "py_code_conformer_joint"
  auxiliary_cfg = """default_model_filename: "model.bin"
max_batch_size: 16
dynamic_batching {
  max_queue_delay_microseconds: 100
}
input: [
  {
    name: "encoder_outputs"
...
@<1523701070390366208:profile|CostlyOstrich36>
I spin up the endpoint with:
`!clearml-serving --id "<>" model add --engine triton --endpoint 'modelname' --model-id '<>' --preprocess 'preprocess.py' --input-size '[-1, -1]' '[-1, -1]' '[-1, -1]' --input-name 'input_ids' 'token_type_ids' 'attention_mask' --input-type int64 int64 int64 --output-size '[-1, -1]' --output-name 'logits' --output-type float32 --aux-config name="modelname" platform="onnxruntime_onnx" default_model_filename="model.bin...
Hi @<1523701070390366208:profile|CostlyOstrich36> Yes
Or run the clearml-serving Python code without the CLI wrapper.
@<1523701205467926528:profile|AgitatedDove14> I think the question is how we can override 40 pipeline parameters without using the UI window, while still starting the pipeline from the UI.
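A sketch of what I mean, assuming the plain clearml SDK (the task id, queue name, and parameter names are placeholders):
from clearml import Task

# clone the pipeline controller task and set its parameters in code,
# instead of typing 40 overrides into the UI dialog
template = Task.get_task(task_id="<pipeline-task-id>")
cloned = Task.clone(source_task=template, name="pipeline run with overrides")
cloned.set_parameters({"Args/param_1": "value_1", "Args/param_2": "value_2"})
Task.enqueue(cloned, queue_name="<queue-name>")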
@<1523701205467926528:profile|AgitatedDove14> in this case I get AttributeError: 'NoneType' object has no attribute 'report_scalar'
on trainer.fit(...)
and on Logger.current_logger()
- I think the non-master processes are trying to log something, but have no Logger instance because they have no Task instance.
What am I supposed to do to log the training correctly? Do the logs in the master process include all of the training history, or do I need to concatenate the logs from the different nodes somehow?
@<1523701205467926528:profile|AgitatedDove14> Yes, I have some Logger.current_logger()
calls in the model class.
If I turn off logging on non-master nodes with a RANK check, won't I lose the training logs from the non-master nodes (I mean, all the training logs are on the master node, aren't they)?
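Something like this is what I mean by the RANK check (a sketch; the RANK env var and the scalar names are placeholders for whatever the launcher and the model actually use):
import os
from clearml import Logger

# report only from the master process (global rank 0); non-master processes
# have no Task, so Logger.current_logger() returns None there
if int(os.getenv("RANK", "0")) == 0:
    logger = Logger.current_logger()
    if logger is not None:
        logger.report_scalar(title="loss", series="train", value=0.1, iteration=1)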
UPD: If I use --ntasks-per-node=2,
then ClearML creates 2 tasks, but I need only 1.
@<1523701205467926528:profile|AgitatedDove14> Okay, thank you so much for your help!