@<1523701205467926528:profile|AgitatedDove14> this file is not getting mounted when using the docker-compose file for the clearml-serving pipeline. Do we also have to mount it somehow?
The only place I can see this file being used is in the README, like so:
Spin the inference container:
docker run -v ~/clearml.conf:/root/clearml.conf -p 8080:8080 -e CLEARML_SERVING_TASK_ID=<service_id> -e CLEARML_SERVING_POLL_FREQ=5 clearml-serving-inference:latest
Thank you for your prompt response. As I installed ClearML using pip, I don't have direct access to the config file. Is there any other way to increase this timeout?
using the docker-compose file for the clearml-serving pipeline, do we also have to mount it somehow?
oh yes, you are correct, the values are passed using environment variables (easier when using docker-compose).
You can additionally add a mount from the host machine to a conf file:
volumes:
- ${PWD}/clearml.conf:/root/clearml.conf
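For context, in the compose file that mount would sit under the serving container’s service entry, roughly like this (a sketch; the service name, image tag, and env values are assumptions carried over from the README’s docker run line):
services:
  clearml-serving-inference:
    image: clearml-serving-inference:latest
    ports:
      - "8080:8080"
    environment:
      - CLEARML_SERVING_TASK_ID=<service_id>
      - CLEARML_SERVING_POLL_FREQ=5
    volumes:
      - ${PWD}/clearml.conf:/root/clearml.conf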
wdyt?
Or rather, any pointers to debug the problem further? Our GCP instances have a pretty fast internet connection, and we haven’t faced that problem on those instances. It’s only on this specific local machine that we’re facing this truncated download.
I say truncated because we checked the model.onnx size on the container, and it was for example 110MB whereas the original one is around 160MB.
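A quick way to reproduce that check (a sketch; the container name and the model path inside the container are placeholders you’d need to adjust):
# size of the model as downloaded inside the serving container
docker exec <serving-container> ls -lh /path/to/model.onnx
# size of the original model on the host, for comparison
ls -lh model.onnx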
It’s only on this specific local machine that we’re facing this truncated download.
Yes, that’s what the log says, makes sense
Seems like this still doesn’t solve the problem, how can we verify this setting has been applied correctly?
hmm exec into the container? what did you put in clearml.conf?
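For example (a sketch; <serving-container> is a placeholder for the actual container name or ID):
docker exec -it <serving-container> cat /root/clearml.conf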
Okay we got to the bottom of this. This was actually because of the load balancer timeout settings we had, which was also 30 seconds and confusing us.
Nice!
btw:
in the clearml.conf we put this:
for future reference, you are missing the sdk section:
sdk.http.timeout.total: 300
the . notation works as well as {}
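i.e. the same setting in the equivalent nested form:
sdk {
  http {
    timeout {
      total: 300
    }
  }
}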
@<1523701205467926528:profile|AgitatedDove14> Okay we got to the bottom of this. This was actually because of the load balancer timeout settings we had, which was also 30 seconds and confusing us.
We didn’t end up needing the above configs after all.
As I installed ClearML using pip,
Where does clearml-serving run? Usually your configuration file is in ~/clearml.conf
Notice that if it is not there, it means the defaults are being used, so just create a new one and add that line.
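A minimal sketch of doing that, assuming no ~/clearml.conf exists yet on the machine running clearml-serving:
# create a fresh ~/clearml.conf containing just the timeout override
cat > ~/clearml.conf <<'EOF'
sdk.http.timeout.total: 300
EOF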
in the clearml.conf we put this:
http {
timeout {
total: 300
}
}
is that correct?
Seems like this still doesn’t solve the problem, how can we verify this setting has been applied correctly? Other than checking the clearml.conf file on the container, that is.
Hi @<1671689437261598720:profile|FranticWhale40>
You mean the download just fails on the remote serving node because it takes too long to download the model?
(basically not a serving issue per se but a download issue)
Oh...
try to add to your config file:
sdk.http.timeout.total = 300
Yep, that makes sense. @<1671689437261598720:profile|FranticWhale40> plz give that a try