I'm having trouble just verifying the bucket via boto3 directly, so something else might be amiss causing this issue.
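For reference, this is roughly the check I'm attempting - a minimal boto3 sketch assuming the on-prem endpoint from my clearml.conf; the bucket name and creds here are just placeholders:
```python
import boto3

# Placeholders: endpoint is our on-prem S3 gateway, bucket/keys are not the real values
s3 = boto3.client(
    "s3",
    endpoint_url="https://e3-storage-grid-gateway.aero.org:443",
    aws_access_key_id="****",
    aws_secret_access_key="****",
)

# List a few keys to confirm the bucket is reachable with these credentials
resp = s3.list_objects_v2(Bucket="my-clearml-bucket", MaxKeys=5)
for obj in resp.get("Contents", []):
    print(obj["Key"])
```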
Also - I don't think I'm saving the model correctly. It appears it needs to be converted into TorchScript?
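If Triton's libtorch backend does need TorchScript, I assume the conversion would look roughly like this (the model class below is a stand-in, not my actual network; the input shape matches the 4 128 1 I pass to clearml-serving later):
```python
import torch
import torch.nn as nn

# Stand-in for the real model architecture (assumption)
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4 * 128 * 1, 2)

    def forward(self, x):
        return self.fc(x.flatten(1))

model = MyModel().eval()

# Trace with an example input of the serving shape (batch of 1, 4 x 128 x 1)
example_input = torch.randn(1, 4, 128, 1)
scripted = torch.jit.trace(model, example_input)
scripted.save("model_ts.pt")  # TorchScript file Triton can load
```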
AgitatedDove14 - fantastic support! I was indeed missing the output_uri, I evidently commented it out with a "FIXME - why is this here?" comment. So now I see the model on the S3 server and the Web UI properly shows its path:...
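For anyone following along, the bit I had commented out was the output_uri on Task.init - roughly this (the bucket path is a placeholder for our on-prem bucket):
```python
from clearml import Task

task = Task.init(
    project_name="DeepSatcom",
    task_name="train",  # placeholder - not the real task name
    # output_uri tells ClearML where to upload model checkpoints/artifacts
    output_uri="s3://e3-storage-grid-gateway.aero.org:443/my-clearml-bucket",
)
```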
I've removed the model from the serving instance to start fresh, and the clearml-serving docker containers all come up happy. However, when I run clearml-serving model add ...
it uses the wrong URL - an https:// instead of an s3:// - so it can't upload the ...
Wait - adding the output_uri seems to work.
Ok - it's the URL in the files_server that was wrong. It needs to be s3 and not https.
Nope - bucket_name in clearml.conf didn't work. Maybe default_uri somewhere?
Yes - the files_server is set to s3://... and the credentials are in the clearml.conf in the sdk/aws/s3 section. I'm trying to debug my way through the code to see where it fails and see if I can tell what's wrong.
clearml-serving --id ${SERVE_TASK_ID} model add --engine triton --preprocess "preprocess.py" --endpoint "deep_satcom_test" --project "DeepSatcom" --published --input-size 4 128 1 --input-name INPUT__0 --input-type float32 --output-size 2 --output-name OUTPUT__0 --output-type float32
I've added that to the example.env. Same creds/etc from the clearml.conf and I can see the metrics/artifacts on the S3 server. I can't find any actual model files on the server though. However, I swear it worked once.
Next time online I'll attach to the containers and verify the AWS creds are present. I guess there's a way to request the model from the CLI/Python environment that I can test as well.
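For the Python-side test, I think something like this should pull the model back down using the clearml.conf credentials (the model ID copied from the Web UI is a placeholder):
```python
from clearml import Model

m = Model(model_id="<model-id-from-web-ui>")  # placeholder ID
print(m.url)  # should be an s3:// path if the upload actually went to S3

local_path = m.get_local_copy()  # downloads via the sdk/aws/s3 credentials
print(local_path)
```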
Is that because I didn't list the bucket name in the clearml.conf?
Same response. Should I change that in the fileserver section too?
Check the subnets of your VPN machines and the clearml docker subnet. I've had issues where the VPN uses 172.*, which matches the Docker bridge network, so all responses from the Docker containers get routed internally and dropped - i.e., they never make it back out of the bridge network to the VPN machines.
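A quick way to check for the clash (the subnets below are examples - substitute your VPN subnet and whatever `docker network inspect bridge` reports):
```python
import ipaddress

vpn_subnet = ipaddress.ip_network("172.16.0.0/16")     # example VPN subnet
docker_bridge = ipaddress.ip_network("172.17.0.0/16")  # Docker's default bridge

# True means container replies can get routed onto the bridge instead of back to the VPN
print(vpn_subnet.overlaps(docker_bridge))
```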
No worries. I probably should have revisited the examples. Too much cutting/pasting on my part. Thanks so much for helping!
Great! Now to tell our IT that I need more space on S3 🙂
Now to get clearml-data to use S3... 🙂
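My rough plan for that - a sketch using the Dataset API, with names and paths as placeholders:
```python
from clearml import Dataset

ds = Dataset.create(dataset_name="satcom-data", dataset_project="DeepSatcom")  # placeholder name
ds.add_files("/path/to/local/data")  # placeholder path
ds.upload(output_url="s3://e3-storage-grid-gateway.aero.org:443/my-clearml-bucket")
ds.finalize()
```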
I added secure and region - didn't change the behavior.
Also - I'm not specifying the URI when I create the Task.
All this mess was caused by a docker bridge IP subnet clash with our VPN subnet. 😞
Thanks for all the help, AgitatedDove14!
No, it's a real S3 server that we have on-prem.
```
aws {
    s3 {
        region: ""
        key: ""
        secret: ""
        use_credentials_chain: false
        credentials: [
            {
                host: "e3-storage-grid-gateway.aero.org:443"
                key: "****"
                secret: "****"
                multipart: false
                secure: true
            }
        ]
    }
    boto3 {
        ...
```
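A quick sanity check I can run against that section - uploading a throwaway file through ClearML's StorageManager (bucket and path are placeholders):
```python
from clearml import StorageManager

remote = StorageManager.upload_file(
    local_file="/tmp/hello.txt",  # any small local file
    remote_url="s3://e3-storage-grid-gateway.aero.org:443/my-clearml-bucket/hello.txt",
)
print(remote)  # prints the remote URL if the sdk/aws/s3 credentials were picked up
```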
Ah, that's what I'm missing. Will test tomorrow. I should have started with the example instead of my existing experiment. Thanks, AgitatedDove14!
The model URL (shown above) looks invalid:
file:///tmp/tmpl_bxlmu2/model/data/model.pth
I was expecting something like s3://...
I'm now stuck on the actual request. I took a guess at the triton config with input/output names, etc., so I think I'm doing something wrong there. I can't figure out what the names should be from the pytorch example - where did INPUT__0 come from?