Answered
Hello everyone! I'm encountering an issue when trying to deploy an endpoint for a large-sized model or get inference on a large dataset (both exceeding ~100MB). It seems that they can only be downloaded up to about 100MB. Is there a way to increase a time...

Hello everyone! I'm encountering an issue when trying to deploy an endpoint for a large-sized model or get inference on a large dataset (both exceeding ~100MB). It seems that they can only be downloaded up to about 100MB. Is there a way to increase a timeout variable somewhere to address this problem?

Here's an example log of downloading the dataset:

2024-04-02 17:56:29,932 - clearml.storage - INFO - Downloading: 5.00MB / 483.18MB @ 2.94MBs from 
2024-04-02 17:56:31,380 - clearml.storage - INFO - Downloading: 10.00MB / 483.18MB @ 3.45MBs from 
2024-04-02 17:56:32,907 - clearml.storage - INFO - Downloading: 15.00MB / 483.18MB @ 3.27MBs from 
2024-04-02 17:56:34,492 - clearml.storage - INFO - Downloading: 20.00MB / 483.18MB @ 3.16MBs from 
2024-04-02 17:56:35,989 - clearml.storage - INFO - Downloading: 25.00MB / 483.18MB @ 3.34MBs from 
2024-04-02 17:56:37,476 - clearml.storage - INFO - Downloading: 30.00MB / 483.18MB @ 3.36MBs from 
2024-04-02 17:56:39,032 - clearml.storage - INFO - Downloading: 35.00MB / 483.18MB @ 3.21MBs from 
2024-04-02 17:56:40,685 - clearml.storage - INFO - Downloading: 40.00MB / 483.18MB @ 3.03MBs from 
2024-04-02 17:56:42,150 - clearml.storage - INFO - Downloading: 45.00MB / 483.18MB @ 3.41MBs from 
2024-04-02 17:56:43,674 - clearml.storage - INFO - Downloading: 50.00MB / 483.18MB @ 3.28MBs from 
2024-04-02 17:56:45,301 - clearml.storage - INFO - Downloading: 55.00MB / 483.18MB @ 3.07MBs from 
2024-04-02 17:56:46,770 - clearml.storage - INFO - Downloading: 60.00MB / 483.18MB @ 3.40MBs from 
2024-04-02 17:56:48,248 - clearml.storage - INFO - Downloading: 65.00MB / 483.18MB @ 3.38MBs from 
2024-04-02 17:56:49,810 - clearml.storage - INFO - Downloading: 70.00MB / 483.18MB @ 3.20MBs from 
2024-04-02 17:56:51,257 - clearml.storage - INFO - Downloading: 75.00MB / 483.18MB @ 3.46MBs from 
2024-04-02 17:56:52,724 - clearml.storage - INFO - Downloading: 80.00MB / 483.18MB @ 3.41MBs from 
2024-04-02 17:56:54,404 - clearml.storage - INFO - Downloading: 85.00MB / 483.18MB @ 2.98MBs from 
2024-04-02 17:56:55,830 - clearml.storage - INFO - Downloading: 90.00MB / 483.18MB @ 3.51MBs from 
2024-04-02 17:56:57,318 - clearml.storage - INFO - Downloading: 95.00MB / 483.18MB @ 3.36MBs from 
2024-04-02 17:56:58,846 - clearml.storage - INFO - Downloading: 100.00MB / 483.18MB @ 3.27MBs from 
2024-04-02 17:56:59,679 - clearml.storage - INFO - Downloaded 100.75 MB successfully from 
 , saved to /root/.clearml/cache/storage_manager/datasets/d3609f172946c9c4bd22e31631bd42af.dataset.a09c036283be4cd7835d64ba874a212c.9qj9j2m_.zip
2024-04-02 17:56:59,681 - clearml - WARNING - Exception File is not a zip file
Failed extracting zip file /root/.clearml/cache/storage_manager/datasets/d3609f172946c9c4bd22e31631bd42af.dataset.a09c036283be4cd7835d64ba874a212c.9qj9j2m_.zip
  
  
Posted 28 days ago

Answers 14


Oh...
None
try to add to your config file:

sdk.http.timeout.total = 300
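
For reference, a minimal sketch of applying that suggestion on the machine running the download, assuming the config file lives at the default ~/clearml.conf (create it if it does not exist):

    # append the timeout override (in seconds) to the SDK config, dot notation
    echo 'sdk.http.timeout.total = 300' >> ~/clearml.conf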
  
  
Posted 28 days ago

Hi @<1671689437261598720:profile|FranticWhale40>
You mean the download just fails on the remote serving node because it takes too long to download the model?
(basically not a serving issue per se, but a download issue)

  
  
Posted 28 days ago

Yes exactly!

  
  
Posted 28 days ago

Thank you for your prompt response. As I installed ClearML using pip, I don't have direct access to the config file. Is there any other way to increase this timeout?

  
  
Posted 28 days ago

As I installed ClearML using pip,

Where does clearml-serving run? Usually your configuration file is in ~/clearml.conf
Notice that if it is not there, it means the defaults are being used, so just create a new one and add that line.

  
  
Posted 28 days ago

@<1523701205467926528:profile|AgitatedDove14> this file is not getting mounted when using the docker-compose file for the clearml-serving pipeline. Do we also have to mount it somehow?

The only place I can see this file being used is in the README, like so:

Spin the inference container:

docker run -v ~/clearml.conf:/root/clearml.conf -p 8080:8080 -e CLEARML_SERVING_TASK_ID=<service_id> -e CLEARML_SERVING_POLL_FREQ=5 clearml-serving-inference:latest

  
  
Posted 28 days ago

using the docker-compose file for the clearml-serving pipeline, do we also have to mount it somehow?

Oh yes, you are correct: the values are passed using environment variables (easier when using docker compose).
You can in addition add a mount from the host machine to a conf file:

    volumes:
      - ${PWD}/clearml.conf:/root/clearml.conf

wdyt?
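
For context, a sketch of how that mount might sit in the serving service definition of the docker-compose file. The service name clearml-serving-inference and the surrounding keys are assumptions for illustration; the image and environment variables follow the docker run example above:

    services:
      clearml-serving-inference:   # assumed service name
        image: clearml-serving-inference:latest
        environment:
          - CLEARML_SERVING_TASK_ID=<service_id>
          - CLEARML_SERVING_POLL_FREQ=5
        volumes:
          # mount the host clearml.conf so the container picks up sdk.http.timeout.total
          - ${PWD}/clearml.conf:/root/clearml.conf
        ports:
          - "8080:8080"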

  
  
Posted 28 days ago

Seems like this still doesn’t solve the problem. How can we verify this setting has been applied correctly, other than checking the clearml.conf file on the container?

  
  
Posted 28 days ago

Yep, that makes sense. @<1671689437261598720:profile|FranticWhale40> plz give that a try

  
  
Posted 28 days ago

Or rather any pointers to debug the problem further? Our GCP instances have a pretty fast internet connection, and we haven’t faced that problem on those instances. It’s only on this specific local machine that we’re facing this truncated download.

I say truncated because we checked the model.onnx size on the container, and it was for example 110MB whereas the original one is around 160MB.

  
  
Posted 28 days ago

It’s only on this specific local machine that we’re facing this truncated download.

Yes, that's what the log says; makes sense.

Seems like this still doesn’t solve the problem, how can we verify this setting has been applied correctly?

Hmm, exec into the container? What did you put in clearml.conf?
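
One quick way to check, sketched here with an assumed service name for the inference container:

    # exec into the running serving container and inspect the mounted config
    docker compose exec clearml-serving-inference cat /root/clearml.conf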

  
  
Posted 28 days ago

in the clearml.conf we put this:

http {
  timeout {
     total: 300
  }
}

is that correct?

  
  
Posted 28 days ago

@<1523701205467926528:profile|AgitatedDove14> Okay we got to the bottom of this. This was actually because of the load balancer timeout setting we had, which was also 30 seconds and was confusing us.

We didn’t end up needing the above configs after all.

  
  
Posted 28 days ago

Okay we got to the bottom of this. This was actually because of the load balancer timeout setting we had, which was also 30 seconds and was confusing us.

Nice!
btw:

in the clearml.conf we put this:

for future reference, you are missing the sdk section:

sdk.http.timeout: 300

Dot notation works as well as the {} (nested-section) form.
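
For future readers, the two equivalent ways to express the setting suggested earlier in the thread, including the sdk section that was missing from the snippet above:

    # dot notation
    sdk.http.timeout.total = 300

    # equivalent nested form
    sdk {
      http {
        timeout {
          total: 300
        }
      }
    }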

  
  
Posted 28 days ago