Answered
Hey Hey, I'm Having Trouble With ClearML And ALBs In AWS. Could Someone Help Me?

Hey hey, I'm having trouble with ClearML and ALBs in AWS. Could someone help me? 🙂

I am currently trying to deploy ClearML in AWS. The basic infrastructure has an Application Load Balancer (ALB) and an Autoscaling Group that launches a ClearML AMI. I followed the instructions described in https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config/#sub-domains-and-load-balancers for setting up the ALB.

The steps I did are:
- Edited /opt/clearml/config/apiserver.conf and added the domain associated with my ALB via Route53
- Created one HTTPS listener with host_header conditions ( app. , api. , files. ) that point to the respective target groups
- Created 3 HTTP Target Groups ( app. , api. , files. ) for the appropriate ports (8080, 8008, 8081) that target the ClearML server in the Autoscaling Group
- Double checked that the security groups allow access (web -> ALB and ALB -> Instance)
- Restarted the ClearML Server
Calling the app.<mydomain>.com (or api. or files. ) address results in a 504 Gateway Time-out error after 10 seconds, and all default health checks in the Target Groups are failing due to Request timed out .

Any idea how I could debug where the problem is? Thanks a lot 🙂
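One way to narrow this down is to probe each service directly from a machine inside the VPC, bypassing the ALB, and see whether the result is a timeout or an unexpected HTTP code. The following is a minimal sketch using only the Python standard library. The ports are the ones listed in the steps above, and `/debug.ping` for the API server is the path suggested in the replies; the `/` paths for the web and file servers, and the names `SERVICES`, `probe_url`, and `probe`, are assumptions made for illustration.

```python
import socket
import urllib.error
import urllib.request

# Ports come from the post. "/debug.ping" for the API server is the path
# suggested in the replies; "/" for the other two services is an assumption.
SERVICES = {
    "app": (8080, "/"),
    "api": (8008, "/debug.ping"),
    "files": (8081, "/"),
}


def probe_url(host, service):
    """Build the direct (non-ALB) URL for one ClearML service."""
    port, path = SERVICES[service]
    return f"http://{host}:{port}{path}"


def probe(url, timeout=5.0):
    """Return 'timeout', 'unreachable', or the HTTP status code as a string."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return str(resp.status)
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (e.g. 404 or 405).
        return str(err.code)
    except (socket.timeout, TimeoutError):
        return "timeout"
    except urllib.error.URLError as err:
        if isinstance(err.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        return "unreachable"
    except OSError:
        return "unreachable"
```

Calling `probe(probe_url(host, name))` for each name in `SERVICES`, with `host` set to the instance's private IP, shows which layer is failing. A status code from the instance but a timeout through the ALB points at the load balancer or its security groups rather than at ClearML itself.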

  
  
Posted one year ago

Answers 30


Currently I'm "cheating" and counting a 405 as the success code for the healthcheck.

  
  
Posted one year ago

Yes!

  
  
Posted one year ago

And it's still unhealthy. I am starting to suspect that somehow the Autoscaling Part in between the ALB and the ClearML server could be causing the problem.

  
  
Posted one year ago

These are the settings for the health check now

  
  
Posted one year ago

ops

  
  
Posted one year ago

look also at the monitoring tab

  
  
Posted one year ago

it can help debugging

  
  
Posted one year ago

the goal is to get the health checks green so the ALB is able to work

  
  
Posted one year ago

it's alongside the health checks tab

  
  
Posted one year ago

API

  
  
Posted one year ago

ok, ty very much for your feedback 😄

  
  
Posted one year ago

But I still have one thing I'd like to fix: the health check for the file server on port 8081 gives me unhealthy for path "/". Is there a valid path you know I can use there for health checks? A curl gives me

  
  
Posted one year ago

Ok, I think that's been very helpful 🙂 I'll experiment a little, now that I know a Health Check that must work. I'll write here if I find something! Thanks a lot for the awesome support!

  
  
Posted one year ago

from / to /debug.ping

  
  
Posted one year ago

This gives me a 200 🙂

  
  
Posted one year ago

And I could access the web server even if the health check was failing. So that was not a problem in the end.

  
  
Posted one year ago

In fact it's the same we are applying to helm charts for k8s

  
  
Posted one year ago

You are not cheating 😂

  
  
Posted one year ago

can you change the path in ALB healthcheck pls?

  
  
Posted one year ago

usually you can see if you are getting timeouts or wrong http code

  
  
Posted one year ago

atm it's the way to go

  
  
Posted one year ago

I'm going to ask for an update to the docs

  
  
Posted one year ago

Web Server

  
  
Posted one year ago

File Server

  
  
Posted one year ago

Can you pls share all 3 health checks ?

  
  
Posted one year ago

Just to be sure we are in sync 😁

  
  
Posted one year ago

in a few seconds it should become green

  
  
Posted one year ago

JuicyFox94 I think I found the problem. To my absolute shame, the security group of the ALB had no Outbound rules, i.e. no traffic was allowed out of the ALB 🙈 . Now I can access the ClearML Webserver!
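For anyone hitting the same thing, a missing outbound rule can be spotted programmatically. This is a rough sketch, assuming the security group is available as a dict in the shape returned by the EC2 DescribeSecurityGroups call (for example one entry of `SecurityGroups` from boto3's `describe_security_groups`). The function name `allows_egress` is made up for illustration, and the check looks only at protocol and port range, not at the rule's destinations.

```python
def allows_egress(security_group, port):
    """Return True if any outbound rule covers TCP traffic on `port`.

    Only protocol and port range are checked; destination CIDRs and
    referenced security groups are ignored in this sketch.
    """
    for rule in security_group.get("IpPermissionsEgress", []):
        protocol = rule.get("IpProtocol")
        if protocol == "-1":  # "-1" means all protocols and all ports
            return True
        if protocol == "tcp":
            low = rule.get("FromPort", 0)
            high = rule.get("ToPort", 65535)
            if low <= port <= high:
                return True
    return False
```

Running this for ports 8080, 8008, and 8081 against the ALB's security group would have returned `False` for all three in the situation described above, since a group with an empty outbound rule list lets nothing reach the instance.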

  
  
Posted one year ago

doubled copy paste

  
  
Posted one year ago

Thanks a lot for the help debugging!

  
  
Posted one year ago