Hi @<1523701070390366208:profile|CostlyOstrich36> Via the UI
@<1523701205467926528:profile|AgitatedDove14> Okay, thank you so much for your help!
What do you mean?
We added env vars to the configuration vault and expected to use them in the remote task with os.getenv
But it didn't work
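For reference, this is roughly how we try to read them in the remote task (a minimal sketch; MY_VAULT_VAR is a hypothetical variable name, not our real one):
import os

# hypothetical variable name configured in the ClearML configuration vault
value = os.getenv("MY_VAULT_VAR")
print("MY_VAULT_VAR =", value)  # prints None when the vault vars are not injected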
Hi @<1523701205467926528:profile|AgitatedDove14>
https://github.com/allegroai/clearml-serving/issues/62
I have an issue based on that case. Could you tell me if I'm missing something in it?
Hi @<1523701205467926528:profile|AgitatedDove14>
Are there any questions or updates about the issue?
@<1523701205467926528:profile|AgitatedDove14> this error appears before the postprocess step.
Today I redeployed the existing endpoint with --aux-config "./config.pbtxt"
and got the same error
Before:
!clearml-serving --id "<>" model add --engine triton --endpoint 'conformer_joint' --model-id '<>' --preprocess 'preprocess_joint.py' --input-size '[1, 640]' '[640, 1]' --input-name 'encoder_outputs' 'decoder_outputs' --input-type float32 float32 --output-size '[129]' --output-name 'outpu...
I checked the env vars in the remote task's Python environment and none of the vars from the vault were there
@<1523701205467926528:profile|AgitatedDove14> I think the question is how we can override 40 pipeline parameters without going through the UI edit window, while still starting the pipeline from the UI, e.g. something like the sketch below
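A rough sketch with the ClearML SDK of what we'd like instead of editing 40 fields by hand (the task id, parameter name and queue here are placeholders, not our real values):
from clearml import Task

# clone the pipeline controller task and override its parameters programmatically
template = Task.get_task(task_id="<pipeline-controller-id>")
cloned = Task.clone(source_task=template, name="pipeline run with overrides")
cloned.set_parameters({"Args/param_1": "new value"})  # one of the ~40 parameters
Task.enqueue(cloned, queue_name="services")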
@<1523701118159294464:profile|ExasperatedCrab78>
We use variables from the .env file inside the clearml-serving-triton
image, because we use the helm chart to spin up clearml-serving. And we still face the error
Hi @<1523701205467926528:profile|AgitatedDove14>
My preprocess file:
from typing import Any, Union, Optional, Callable


class Preprocess(object):
    def __init__(self):
        pass

    def preprocess(
        self,
        body: Union[bytes, dict],
        state: dict,
        collect_custom_statistics_fn: Optional[Callable[[dict], None]]
    ) -> Any:
        return body["length"], body["audio_signal"]

    def postprocess(
        self,
        data: An...
Hi @<1523701205467926528:profile|AgitatedDove14>
Yes, we use the configuration vault for that
@<1523701070390366208:profile|CostlyOstrich36>
I spin up the endpoint with:
!clearml-serving --id "<>" model add --engine triton --endpoint 'modelname' --model-id '<>' --preprocess 'preprocess.py' --input-size '[-1, -1]' '[-1, -1]' '[-1, -1]' --input-name 'input_ids' 'token_type_ids' 'attention_mask' --input-type int64 int64 int64 --output-size '[-1, -1]' --output-name 'logits' --output-type float32 --aux-config name="modelname" platform="onnxruntime_onnx" default_model_filename="model.bin...
"After" version in logs is the same as config above. There is no "before" version in logs((
Endpoint config from ClearML triton task:
conformer_joint {
    engine_type = "triton"
    serving_url = "conformer_joint"
    model_id = "<>"
    version = ""
    preprocess_artifact = "py_code_conformer_joint"
    auxiliary_cfg = """default_model_filename: "model.bin"
max_batch_size: 16
dynamic_batching {
    max_queue_delay_microseconds: 100
}
input: [
    {
        name: "encoder_outputs"
        ...
@<1523701205467926528:profile|AgitatedDove14> Yes, I have some Logger.current_logger()
calls in the model class.
If I turn off logging on non-master nodes with a RANK check, won't I lose the training logs from the non-master nodes (I mean, are all the training logs on the master node)?
@<1523701205467926528:profile|AgitatedDove14> config.pbtxt in triton container (inside /models/conformer_joint) - after merge:
default_model_filename: "model.bin"
max_batch_size: 16
dynamic_batching {
    max_queue_delay_microseconds: 100
}
input: [
    {
        name: "encoder_outputs"
        data_type: TYPE_FP32
        dims: [
            1,
            640
        ]
    },
    {
        name: "decoder_outputs"
        data_type: TYPE_FP3...
@<1523701205467926528:profile|AgitatedDove14> I think there is no chance to pass config.pbtxt as is.
In this line, the function uses self.model_endpoint.input_name
(and after that input_name, input_type and input_size), but there are no such att...
Hi @<1523701087100473344:profile|SuccessfulKoala55> Turns out that if I delete the
platform: ...
line from config.pbtxt, the model deploys on tritonserver (serving v1.3.0 adds the "platform" line at the end of the config file when the ClearML model has a "framework" attribute). But when I try to check the endpoint with random data (with the right shape according to the config), I get the
{'detail': "Error processing request: object of type 'NoneType' has no len()"}
error. Do you know how...
Hi John @<1569133676640342016:profile|MammothPigeon75> ! How do you queue up SLURM jobs for a task with distributed computation (like PyTorch Lightning)?
Please give me some help.
Thank you in advance!
I am getting this error in the response to the following request:
import numpy as np
import requests

body = {
    "encoder_outputs": [np.random.randn(1, 640).tolist()],
    "decoder_outputs": [np.random.randn(640, 1).tolist()]
}
response = requests.post("<serving-endpoint-url>", json=body)
response.json()
Unfortunately, I see nothing related to this problem in either the inference or the triton pods/deployments (we use Kubernetes to spin up ClearML-serving).
@<1523701205467926528:profile|AgitatedDove14> in this case I get AttributeError: 'NoneType' object has no attribute 'report_scalar'
on trainer.fit(...)
and on Logger.current_logger()
- I think the non-master processes are trying to log something, but they have no Logger instance because they have no Task instance.
What am I supposed to do to log training correctly? Do the logs in the master process include the full training history, or do I need to concatenate logs from different nodes somehow?
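For context, the guard I tried looks roughly like this (a minimal sketch; I'm assuming RANK is the env var set by the distributed launcher):
import os
from clearml import Logger

logger = Logger.current_logger()  # returns None in processes that have no Task
if logger is not None and int(os.environ.get("RANK", "0")) == 0:
    logger.report_scalar(title="loss", series="train", value=0.1, iteration=0)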
Hi @<1523701070390366208:profile|CostlyOstrich36> Yes
Or run the clearml-serving Python code without the CLI wrapper
@<1523701118159294464:profile|ExasperatedCrab78> Thank you! It has solved the problem!
UPD: If I use --ntasks-per-node=2
then ClearML creates 2 tasks, but I need only 1.
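What I effectively want is something like this sketch (assuming SLURM_PROCID identifies the process rank; the project and task names are placeholders):
import os
from clearml import Task

# initialize a ClearML task only in the first SLURM process,
# so that --ntasks-per-node=2 does not create two tasks
task = None
if int(os.environ.get("SLURM_PROCID", "0")) == 0:
    task = Task.init(project_name="<project>", task_name="<experiment>")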