@<1523701205467926528:profile|AgitatedDove14> this error appears before the postprocess part.
Today I redeployed the existing endpoint with --aux-config "./config.pbtxt"
and got the same error.
Before:
!clearml-serving --id "<>" model add --engine triton --endpoint 'conformer_joint' --model-id '<>' --preprocess 'preprocess_joint.py' --input-size '[1, 640]' '[640, 1]' --input-name 'encoder_outputs' 'decoder_outputs' --input-type float32 float32 --output-size '[129]' --output-name 'outpu...
@<1523701118159294464:profile|ExasperatedCrab78>
We use variables from the .env file inside the clearml-serving-triton
image, because we use a helm chart to spin up clearml-serving. And we still face the error.
@<1523701118159294464:profile|ExasperatedCrab78> Thank you! It has solved the problem!
@<1523701205467926528:profile|AgitatedDove14> in this case I get AttributeError: 'NoneType' object has no attribute 'report_scalar'
on trainer.fit(...)
and Logger.current_logger().
I think the non-master processes are trying to log something, but they have no Logger instance because they have no Task instance.
What am I supposed to do to log training correctly? Do the logs in the master process include the whole training history, or do I need to concatenate logs from different nodes somehow?
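For reference, a minimal sketch of the kind of guard I have in mind (assuming the launcher sets the RANK env var and that Logger.current_logger() returns None where no Task exists):

import os

from clearml import Logger


def report_scalar_safe(title: str, series: str, value: float, iteration: int):
    # Only the master process (rank 0) has a Task and therefore a Logger;
    # Logger.current_logger() returns None elsewhere, so guard against it.
    if int(os.environ.get("RANK", "0")) != 0:
        return
    logger = Logger.current_logger()
    if logger is not None:
        logger.report_scalar(title, series, value, iteration)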
@<1523701205467926528:profile|AgitatedDove14> Yes, I have some Logger.current_logger()
calls in the model class.
If I turn off logging on non-master nodes with a RANK check, won't I lose the training logs from the non-master nodes? (I mean, all the training logs are on the master node, aren't they?)
@<1523701205467926528:profile|AgitatedDove14> I think there is no way to pass config.pbtxt as-is.
In this line, the function uses self.model_endpoint.input_name
(and after that input_name, input_type and input_size), but there are no such att...
Hi @<1523701205467926528:profile|AgitatedDove14>
https://github.com/allegroai/clearml-serving/issues/62
I have an issue based on that case. Could you tell me if I'm missing something in it?
UPD: If I use --ntasks-per-node=2
then ClearML creates 2 tasks, but I need only 1.
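The workaround I'm considering (just a sketch, assuming SLURM sets SLURM_PROCID and that skipping Task.init on the second process is acceptable; the project/task names are hypothetical):

import os

from clearml import Task

# Create the ClearML task only in the first SLURM process,
# so that --ntasks-per-node=2 does not produce two tasks.
task = None
if int(os.environ.get("SLURM_PROCID", "0")) == 0:
    task = Task.init(project_name="my_project", task_name="ddp_training")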
I am getting this error in the request response:
import numpy as np
import requests

body = {
    "encoder_outputs": [np.random.randn(1, 640).tolist()],
    "decoder_outputs": [np.random.randn(640, 1).tolist()],
}
# the serving endpoint URL was elided in the original message
response = requests.post("<serving endpoint URL>", json=body)
response.json()
Unfortunately, I see nothing related to this problem in either the inference or the triton pods/deployments (we use Kubernetes to spin up ClearML-serving).
"After" version in logs is the same as config above. There is no "before" version in logs((
Endpoint config from ClearML triton task:
conformer_joint {
    engine_type = "triton"
    serving_url = "conformer_joint"
    model_id = "<>"
    version = ""
    preprocess_artifact = "py_code_conformer_joint"
    auxiliary_cfg = """default_model_filename: "model.bin"
max_batch_size: 16
dynamic_batching {
  max_queue_delay_microseconds: 100
}
input: [
  {
    name: "encoder_outputs"
...
Hi @<1523701205467926528:profile|AgitatedDove14>
Are there any questions or updates about the issue?
@<1523701205467926528:profile|AgitatedDove14> config.pbtxt in the triton container (inside /models/conformer_joint) - after merge:
default_model_filename: "model.bin"
max_batch_size: 16
dynamic_batching {
  max_queue_delay_microseconds: 100
}
input: [
  {
    name: "encoder_outputs"
    data_type: TYPE_FP32
    dims: [
      1,
      640
    ]
  },
  {
    name: "decoder_outputs"
    data_type: TYPE_FP3...
@<1523701205467926528:profile|AgitatedDove14> I think the question is how we can override 40 pipeline parameters without using the UI edit window, while still starting the pipeline from the UI.
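For example, something like this rough sketch (assuming the pipeline parameters live under the Args section of the controller task; the task id and parameter names are hypothetical):

from clearml import Task

# Clone the existing pipeline controller task, override its parameters
# in code, and enqueue the clone instead of editing 40 fields in the UI dialog.
pipeline_task = Task.clone(source_task="<pipeline-controller-task-id>")
pipeline_task.set_parameters({
    "Args/learning_rate": 0.001,  # hypothetical parameter names
    "Args/batch_size": 64,
})
Task.enqueue(pipeline_task, queue_name="services")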
@<1523701205467926528:profile|AgitatedDove14> Okay, thank you so much for your help!
I checked the env vars in the remote task's Python environment, and none of the vars mentioned in the vault were there
@<1523701070390366208:profile|CostlyOstrich36>
I spin up the endpoint with:
`!clearml-serving --id "<>" model add --engine triton --endpoint 'modelname' --model-id '<>' --preprocess 'preprocess.py' --input-size '[-1, -1]' '[-1, -1]' '[-1, -1]' --input-name 'input_ids' 'token_type_ids' 'attention_mask' --input-type int64 int64 int64 --output-size '[-1, -1]' --output-name 'logits' --output-type float32 --aux-config name="modelname" platform="onnxruntime_onnx" default_model_filename="model.bin...
Hi @<1523701070390366208:profile|CostlyOstrich36> Yes
Or run the clearml-serving Python code directly, without the CLI wrapper
Hi @<1523701205467926528:profile|AgitatedDove14>
My preprocess file:
from typing import Any, Union, Optional, Callable


class Preprocess(object):
    def __init__(self):
        pass

    def preprocess(
        self,
        body: Union[bytes, dict],
        state: dict,
        collect_custom_statistics_fn: Optional[Callable[[dict], None]],
    ) -> Any:
        return body["length"], body["audio_signal"]

    def postprocess(
        self,
        data: An...
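For reference, a request body this preprocess() would accept could look like this (the field names come from the code above; the values are made up):

import numpy as np

body = {
    "length": [640],
    "audio_signal": np.random.randn(1, 640).tolist(),
}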
Hi @<1523701087100473344:profile|SuccessfulKoala55> Turns out that if I delete the
platform: ...
line from config.pbtxt, the model gets deployed on tritonserver (serving v1.3.0 adds the "platform" line at the end of the config file when the clearml-model has a "framework" attribute). But when I try to check the endpoint with random data (with the right shape according to the config), I get the
{'detail': "Error processing request: object of type 'NoneType' has no len()"}
error. Do you know how...
Hi @<1523701205467926528:profile|AgitatedDove14>
Yes, we use the configuration vault for that
Hi John @<1569133676640342016:profile|MammothPigeon75>! How do you queue up SLURM jobs for a task with distributed computation (like PyTorch Lightning)?
Please give me some help.
Thank you in advance!
Hi @<1523701070390366208:profile|CostlyOstrich36> Via the UI
What do you mean?
We added env vars to the configuration vault and expected to be able to use them in the remote task with os.getenv,
but it didn't work.
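i.e., roughly this inside the remote task (VAR_FROM_VAULT is a hypothetical name we set in the vault):

import os

# We expected a variable defined in the configuration vault to be
# visible in the remote task's environment, but this returns None.
value = os.getenv("VAR_FROM_VAULT")
print(value)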