So the problem came back even with this new URL. I discovered clearing your cookies fixes it.
It seems you have a specific workflow in mind, but I'm not sure I follow it. Can you give a specific example?
Absolutely. So, let's say a DS tags a model in ClearML with "release candidate". It'd be great to have that trigger a number of processes, each with their own retry logic:
- A fairness/bias evaluation, potentially as a task in ClearML itself. This would load the model and run some sample datasets through it. The
- Pipeline to prepare for deployment. Trigger a GitHub Actions ...
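To make the "each with their own retry logic" idea concrete, here is a minimal sketch in plain Python. It does not use the ClearML SDK at all: the `dispatch` function, the handler registry, and the job names are all hypothetical, and how the tag event actually reaches this code is left open.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical registry shape: tag -> list of (job name, job, max attempts).
Handlers = Dict[str, List[Tuple[str, Callable[[], None], int]]]


def run_with_retries(job: Callable[[], None], max_attempts: int, delay_s: float = 0.0) -> int:
    """Run job until it succeeds; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            job()
            return attempt
        except Exception:
            if attempt == max_attempts:
                raise  # retries exhausted, surface the last error
            time.sleep(delay_s)
    raise ValueError("max_attempts must be at least 1")


def dispatch(tag: str, handlers: Handlers) -> Dict[str, int]:
    """Run every job registered for a tag, each with its own retry budget.

    Returns attempts used per job, or -1 for a job that never succeeded.
    """
    results: Dict[str, int] = {}
    for name, job, max_attempts in handlers.get(tag, []):
        try:
            results[name] = run_with_retries(job, max_attempts)
        except Exception:
            results[name] = -1  # one failing job does not block the others
    return results
```

A failing bias evaluation would not stop the deployment prep from running, since each job gets its own budget and its own entry in the result.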
I'm imagining:
- The EC2 instance would be in a private subnet, accessible only on the VPN (read: VPC)
- The API Gateway and Load Balancer would also be on the VPC and therefore have access to the private subnet BUT the API Gateway or Load Balancer themselves would be exposed to the public internet.
That way, to do the JWT authentication, the load balancer or API Gateway could reach out to the EC2 instance on the private network to authenticate any incoming ClearML SDK requests.
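For illustration, this is roughly the check a gateway would perform before forwarding a request. It is a sketch under loud assumptions: it uses an HS256 token signed with a shared secret, which is not confirmed to be how ClearML issues or signs its tokens, and a real OAuth 2.0 setup would more likely validate against public keys published by the issuer. The function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_decode(part: str) -> bytes:
    # JWT segments drop base64 padding, so restore it before decoding.
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Issue a token (only here so the verifier can be exercised)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def verify_hs256(token: str, secret: bytes, now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the signature is valid and the token is unexpired."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        if header.get("alg") != "HS256":
            return None
        expected = hmac.new(
            secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256
        ).digest()
        # Constant-time comparison avoids leaking signature bytes via timing.
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        claims = json.loads(_b64url_decode(payload_b64))
        exp = claims.get("exp")
    except (ValueError, TypeError, AttributeError):
        return None  # malformed token
    current = time.time() if now is None else now
    if exp is not None and current >= exp:
        return None  # expired
    return claims
```

Because the check only needs the key material and the token, it can run at the edge and reject bad requests before anything reaches the private subnet.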
I've also used Airflow and Dagster in prod, but not integrated them with an exp tracker.
That's fabulous. This is definitely how my team prefers to structure projects. I hadn't gotten around to trying that out in our POC of ClearML yet, but I'm certain this is how our group will solve this problem.
How it works / what we finished:
- We used the SaaS ClearML, started an EC2 instance, and manually installed and ran the `clearml-agent daemon` on it
- We ran `clearml-init` on our laptops to generate the `clearml.conf` file.
- The extension is in TypeScript, so...
- We started trying to write code with the Python SDK to list sessions, but realized calling that from the extension would be hard, so we opted to have the TypeScript code make calls to the ClearML API server directly, e.g. ...
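As a rough picture of what calling the API server directly involves, here is a sketch in Python (the extension itself is TypeScript; Python is used here only for illustration). It assumes the API server exchanges the access key and secret key for a token at `auth.login` via HTTP basic auth, and that later calls carry that token as a bearer header. The server URL, the endpoint passed in, and the payload are placeholders, and the functions only build the requests without sending them.

```python
import base64
import json
import urllib.request


def build_login_request(api_server: str, access_key: str, secret_key: str) -> urllib.request.Request:
    """Build the token request (assumed: basic auth against auth.login)."""
    creds = base64.b64encode(f"{access_key}:{secret_key}".encode()).decode("ascii")
    return urllib.request.Request(
        f"{api_server.rstrip('/')}/auth.login",
        method="POST",
        headers={"Authorization": f"Basic {creds}"},
    )


def build_api_request(api_server: str, token: str, endpoint: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated JSON call to an endpoint such as 'tasks.get_all'."""
    return urllib.request.Request(
        f"{api_server.rstrip('/')}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending either request is a single `urllib.request.urlopen(req)` call; the response is assumed to be JSON with the useful part under a `data` key.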
If the load balancer or API Gateway can do the computation and leverage caching, we're much safer against DDoS attacks. In general, I'd prefer not to have our EC2 instance directly exposed to the public Internet.
Yay! Man, I want to do ClearML with "hard mode" (non-enterprise, self-hosted) first, before trying to sell BENlabs (my work) on it. I could see us paying for enterprise to get the Hyper Datasets and Vault features if our scientists/developers fall in love with it--they probably will if we can get them to adopt it since right now we have a homemade system that isn't nearly as nice as ClearML.
@<1523701087100473344:profile|SuccessfulKoala55> how exactly do you configure ClearML to use the cr...
Is there a way we can protect a ClearML deployment with a load balancer or API Gateway that is exposed to the whole world, but is protected by authentication so that only authorized clients can get in?
Thank you! For now, it's kind of nice that it just picks up your credentials from your conf file. No extra setup required beyond the onboarding ClearML has you do.
And look! It's working, assuming you start the clearml session up yourself:
Totally worked!
I've also tried running a `clearml-agent daemon` directly on my mac (not in docker) serving the `sessions` queue for the ClearML server that is running in docker. When I do that, it consistently fails with a different error. Something to do with mounting a volume.
Hey, thanks for responding!
Does there happen to be ClearML auto-logging... for MLflow? That would make it super easy for us to migrate our existing training/batch inference jobs to ClearML.
Will do!
Thanks for this!! I may try it, and if I do and it works, I'll look into writing a plugin for ZenML and Metaflow that auto-initializes the parent task and registers the steps as child tasks. Super helpful, thank you!
Haha, that was a total gotcha for me. Yeah, a lot just wasn't even getting run due to the `#!/bin/bash` part.
Anyway, wow! I finally got the precious console logs you thought to find. Here they are:
2023-05-06 00:19:21
User aborted: stopping task (3)
2023-05-06 00:19:21
Successfully installed PyYAML-6.0 attrs-22.2.0 certifi-2022.12.7 charset-normalizer-3.1.0 clearml-agent-1.5.2 distlib-0.3.6 filelock-3.12.0 furl-2.1.3 idna-3.4 jsonschema-4.17.3 orderedmultidict-1.0.1 pathlib2-2.3.7....
When you run the `docker-compose.yml` on an EC2 instance, you can configure user login for the ClearML webserver. But the files API is still open to the world, right? (and same with the backend?)
We could solve this by placing the EC2 instance into a VPN.
One disadvantage to that approach is it becomes annoying to reach the model registry from outside the VPN, like if you have a deployment pipeline based in GitHub Actions. Or if you wanted to trigger a ClearML pipeline from a VPC that isn...
Here's the repo: I've recorded a few update videos documenting how we learned about authoring VS Code extensions and how we got it to its current state. Linked to those in order in the README.
ChatGPT has made working with TypeScript and the VS Code extension framework really nice!
Oh hooray! So docker-compose manages the restarting of crashed containers? I didn't know that, and that is great.
The agent commands are nothing special.
clearml-agent daemon --queue sessions --cpu-only --create-queue true --docker
Oh, right... the Docker image running on the instance takes care of the library versions. You guys are great!
It's an Amazon Linux AMI with the AWS CLI pre-installed on it. It uses the AWS CLI to fetch the key from AWS SSM Parameter Store. It's granted read access to that SSM Parameter via the instance role.
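A minimal sketch of that fetch, wrapped in Python around the AWS CLI. The parameter name is a placeholder, and whether the real parameter is a SecureString is not stated above; `--with-decryption` is included on the assumption that it is (the flag is ignored for plain String parameters).

```python
import subprocess
from typing import List


def ssm_get_parameter_cmd(name: str) -> List[str]:
    """Build the AWS CLI call that prints only the parameter's value."""
    return [
        "aws", "ssm", "get-parameter",
        "--name", name,
        "--with-decryption",           # assumption: the key is a SecureString
        "--query", "Parameter.Value",  # extract just the value from the response
        "--output", "text",
    ]


def fetch_ssm_parameter(name: str) -> str:
    """Run the CLI using the instance role's credentials; raise on failure."""
    result = subprocess.run(
        ssm_get_parameter_cmd(name), check=True, capture_output=True, text=True
    )
    return result.stdout.strip()
```

No credentials appear in the script itself; the CLI picks them up from the instance role, so read access to the parameter is the only thing the role needs.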
Oh my goodness. Thank you! I'd seen that before, but for some reason it didn't register I could run that with VS Code...
But this config should almost never need to change!
Host clearml-session
    HostName localhost
    User root
    Port 8022
Trying as a python subprocess...
And for the session
clearml-session --queue sessions --docker python:3.9
Hmm... these people are recommending restarting docker completely. I may have tried that already, but I'll do it again when I get some time to be sure.
While I'm wishing for things: it'd be awesome if it had a queue already set up. But if there's not a way to do that in the docker compose file, I could potentially write a script that uses the creds to create one using API calls.
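Such a script could look something like this sketch. It assumes the server's `queues.get_all` and `queues.create` endpoints, that `get_all` returns a `queues` list, and that `create` returns the new queue's `id`. The `call` argument is a hypothetical helper that sends an authenticated request and returns the `data` part of the response, so nothing here talks to a server on its own.

```python
from typing import Callable


def ensure_queue(call: Callable[[str, dict], dict], name: str) -> str:
    """Return the id of the queue called `name`, creating it if missing.

    `call(endpoint, payload)` is a hypothetical, caller-supplied function.
    """
    existing = call("queues.get_all", {"name": name}).get("queues", [])
    for queue in existing:
        # The server-side name filter may match loosely, so compare exactly.
        if queue.get("name") == name:
            return queue["id"]
    return call("queues.create", {"name": name})["id"]
```

Checking before creating makes the script safe to run on every startup, since a second run finds the queue and returns its id instead of failing on a duplicate name.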
OOooh, excellent. So the file server isn't necessary at all if you're using some other object storage? That's slick!
Is there a way I could move the JWT authentication (not authorization) logic into an API Gateway or Load Balancer? For example, if ClearML is following OAuth 2.0, then the load balancer or API Gateway could reach out to its "issuer URL" (probably available on the EC2 instance where ClearML is running) like this example here.