Answered
Security Question: In My Journey Of Running ClearML The "Hard Way" (Self-Hosted), One Problem I Haven't Solved Is Security. Some Discussion Here...

Security question: in my journey of running ClearML the "hard way" (self-hosted), one problem I haven't solved is security.

Some discussion here...

  
  
Posted one year ago

Answers 9


OOooh, excellent. So the file server isn't necessary at all if you're using some other object storage? That's slick!

Is there a way I could move the JWT authentication (not authorization) logic into an API Gateway or Load Balancer? For example, if ClearML is following OAuth 2.0, then the load balancer or API Gateway could reach out to its "issuer URL" (probably available on the EC2 instance where ClearML is running) like this example here.
image

  
  
Posted one year ago

I'm imagining:

  • The EC2 instance would be in a private subnet, accessible only on the VPN (read: VPC)
  • The API Gateway and Load Balancer would also be on the VPC and therefore have access to the private subnet BUT the API Gateway or Load Balancer themselves would be exposed to the public internet.
    That way, to do the JWT authentication, the load balancer or API Gateway could reach out to the EC2 instance on the private network to authenticate any incoming ClearML SDK requests.
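
To make the idea concrete, here is a minimal sketch of the kind of check a gateway would run on each incoming request. It assumes a plain HS256-signed JWT and a shared secret that the gateway can read; neither is confirmed for ClearML in this thread, and `sign_hs256` / `verify_hs256` are hypothetical helper names, not part of any ClearML or AWS API.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(raw: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # Restore the padding that JWT encoding strips.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(payload: dict, secret: bytes) -> str:
    """Build an HS256 token (only here so the sketch can be exercised)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url_encode(signature)}"


def verify_hs256(token: str, secret: bytes, now=None):
    """Return the payload if signature and expiry check out, else None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        payload = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(sig_b64)
    except ValueError:
        # Wrong number of segments, bad base64, or bad JSON.
        return None
    if not isinstance(header, dict) or not isinstance(payload, dict):
        return None
    if header.get("alg") != "HS256":
        # Refuse "none" or any algorithm this sketch does not handle.
        return None
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(signature, expected):
        return None
    exp = payload.get("exp")
    current = time.time() if now is None else now
    if isinstance(exp, (int, float)) and current >= exp:
        return None
    return payload
```

The catch with this design is that the gateway needs the same signing secret as the server (or a public key, if the tokens were signed asymmetrically), so it only works if that key material can be shared with the gateway.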
  
  
Posted one year ago

Is there a way I could move the JWT authentication (not authorization) logic into an API Gateway or Load Balancer?

Hmm in theory, but not in practice 😞

if ClearML is following OAuth 2.0, t

This is for the SSO part, not for the API. The API only uses JWT for verification; the login process itself goes through external SSO (OAuth 2.0). But the open-source version does not support SSO 😞

Why are you trying to add another ELB with JWT verification on it? What are we trying to solve?

  
  
Posted one year ago

If the load balancer it Gateway can do the computation and leverage caching,

Oh, that's true. But unfortunately it's out of scope for the open-source version (well, at the end of the day someone needs to pay our salaries 🙂)

I’d prefer not to have our EC2 instance directly exposed to the public Internet.

Yep, I tend to agree 🙂

  
  
Posted one year ago

*or Gateway

  
  
Posted one year ago

If the load balancer it Gateway can do the computation and leverage caching, we’re much safer against DDOS attacks. In general, I’d prefer not to have our EC2 instance directly exposed to the public Internet.

  
  
Posted one year ago

Is there a way we can protect a ClearML deployment with a load balancer or API Gateway that is exposed to the whole world, but is protected by authentication so that only authorized clients can get in?

  
  
Posted one year ago

Hi @BattyCrocodile47

But the files API is still open to the world, right?

No, of course not 🙂 (i.e. the API is authenticated with a JWT header, which is why you need to generate the secret/key in the UI)
That said, the login process itself is user/pass stored on the server, but other than that the web/api are secured. The file server on the other hand is plain http storage and does not verify the connection like the API does. So if you are going the self-hosted open internet route, I would disable it altogether and use S3/GCP etc.
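
As a rough sketch of the "use S3/GCP instead of the file server" route: `Task.init` accepts an `output_uri` argument, so artifacts and models can be pointed at object storage. The bucket URI, project name, and the two helper functions below are hypothetical, and the list of accepted schemes is an assumption for illustration. Credentials for the bucket still have to be configured separately.

```python
from urllib.parse import urlparse

# Schemes treated as object storage in this sketch (an assumption, adjust as needed).
OBJECT_STORAGE_SCHEMES = {"s3", "gs", "azure"}


def is_object_storage_uri(uri: str) -> bool:
    """True if the URI targets a bucket rather than an http(s) file server."""
    parsed = urlparse(uri)
    return parsed.scheme in OBJECT_STORAGE_SCHEMES and bool(parsed.netloc)


def init_task_with_object_storage(project: str, name: str, bucket_uri: str):
    """Start a task whose outputs are uploaded to bucket_uri."""
    if not is_object_storage_uri(bucket_uri):
        raise ValueError(f"expected an object storage URI, got {bucket_uri!r}")
    # Imported lazily so the helper above can be used without clearml installed.
    from clearml import Task

    return Task.init(project_name=project, task_name=name, output_uri=bucket_uri)
```

Usage would look like `init_task_with_object_storage("examples", "train", "s3://my-bucket/clearml")`, where `my-bucket` is a placeholder.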

  
  
Posted one year ago

When you run the docker-compose.yml on an EC2 instance, you can configure user login for the ClearML webserver. But the files API is still open to the world, right? (and same with the backend?)

We could solve this by placing the EC2 instance into a VPN.

One disadvantage to that approach is it becomes annoying to reach the model registry from outside the VPN, like if you have a deployment pipeline based in GitHub Actions. Or if you wanted to trigger a ClearML pipeline from a VPC that isn't peered with the VPN's VPC. Fixing those issues adds complexity.

Something neat about MLFlow is that the SDK supports using various types of authentication.

For example, you can have the MLFlow client SDK use JWT tokens or even AWS SigV4 auth. What that means is you could stick a load balancer or API Gateway in front of MLFlow, shielding it from the whole world. If you need to access MLFlow using the SDK in something like GitHub Actions, you just set some environment variables, whether that's a JWT token or set of AWS Access Keys.
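
To illustrate the environment-variable part: the MLflow client reads `MLFLOW_TRACKING_TOKEN` and sends it as a bearer token. The sketch below is not MLflow's code, just a hypothetical helper (`bearer_headers_from_env`) showing the header a gateway in front of the tracking server would end up seeing.

```python
import os


def bearer_headers_from_env(env=None) -> dict:
    """Build the Authorization header from MLFLOW_TRACKING_TOKEN, if set."""
    env = os.environ if env is None else env
    token = env.get("MLFLOW_TRACKING_TOKEN")
    if not token:
        # No token configured: the request would go out unauthenticated.
        return {}
    return {"Authorization": f"Bearer {token}"}
```

In a CI job such as GitHub Actions, the token would come from a repository secret exported as that environment variable, so nothing sensitive lives in the workflow file itself.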

  
  
Posted one year ago