BattyCrocodile47

Moderator

35 Questions, 147 Answers

Active since 02 March 2023

Last activity 2 months ago

Reputation

Badges 1

129 × Eureka!

0 Votes

4 Answers

974 Views

Hey

Hey AgitatedDove14! Don't know if you're up but we're working on the VS Code extension at the hackathon rn!

clearml

one year ago

0 Votes

19 Answers

1K Views

Hey

Hey AgitatedDove14, I saw this SO answer you gave about ClearML's docker-compose.yaml. You described getting a secret key pa...

clearml

one year ago

0 Votes

21 Answers

1K Views

Crazy Idea:

Crazy idea: what if ClearML had a VS Code extension? It could help you start and join ClearML sessions! It could use your local ~/clearml.conf file for read ...

clearml

one year ago

0 Votes

10 Answers

1K Views

I’m

I’m working on an automated deployment of ClearML with IaC. I’ve got a script to start an EC2 instance that runs the docker compose file. Separately, I’ve go...

mlops

one year ago

0 Votes

34 Answers

115K Views

My Autoscaled Instance Fails When Running "Git Clone" On A Private Repo. I

My autoscaled instance fails when running "git clone" on a private repo. I do have the SSH key placed at /root/.ssh/id_rsa on the machine, and when I SSH int...

mlops

one year ago

0 Hello, Is There A Dark Theme For ClearML UI?

I use the Dark Reader chrome extension 😆

one year ago

0 Can You Help Me Make The Case For ClearML Pipelines/Tasks Vs Metaflow? Context Within...

Oh this is thought provoking. Yeah, the idea of using ClearML for R&D is super appealing (to me speaking as an MLOps engineer 😆 ). And having the power of Metaflow's scheduler (on Step Functions with Event Bridge since we'd do the AWS-native deployment) also makes sense to me.

I'll keep asking questions about how we could do event-based jobs with alerting built in on ClearML in a different thread later on.

I pasted your points (anonymously) onto the Metaflow slack to le...

one year ago

0 Hey

I ultimately resorted to creating a selenium script combined with docker-compose. Not a beautiful solution but I can confirm that it works 😕

one year ago

0 Another AWS Autoscaler Question. The

Sorry, clarifying:

The agent-services entry in the docker-compose file seems to add a single worker to the services queue

one year ago

0 Security Question: In My Journey Of Running ClearML The "Hard Way" (Self-Hosted), One Problem I Haven't Solved Is Security. Some Discussion Here...

*or Gateway

2 years ago

I see. Is it possible for two agents to be utilizing the same GPU? (like if the machine has a terrific GPU, but only one of them?)

2 years ago

0 Crazy Idea:

I did a post on LinkedIn with several slides on how I plan to build it here

one year ago

0 Crazy Idea:

This is the event: None

one year ago

Yes, it's pretty lame that a clearml-agent can only process one task at a time if it's not listening to a services queue 🤔

2 years ago

0 After Presenting ClearML To My Team, I Got The Question "We're Already On AWS, Why Not Use SageMaker?" TBH, I've Never Gone Through The ML Workflow With SageMaker. The Only Advantage I Could Think Of Is That We Can Use Our On-Prem Machines For Training,

@<1523701205467926528:profile|AgitatedDove14> you beautiful person, this is terrific! I do believe SageMaker has some nice monitoring/data drift capabilities that seem interesting, but these points you have here will be a fantastic starting point for my team's analysis of the products. I think this will help balance some of the over-enthusiasm towards using the native AWS solution.

2 years ago

0 Another AWS Autoscaler Question. The

At the time that I run python aws_autoscaler.py --remote, that clearml-services worker is the only worker on the services queue. So it will be the worker that picks up the autoscaler task.

The task seems to fail on startup because CLEARML_API_HOST is not set, even though it is set for the Docker container that the agent is running in.

Here's the full autoscaler log where the failure happens if that's helpful.
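
One way to narrow down a failure like this is to check, from inside the task's own process, which connection variables are actually visible there. The sketch below is only a diagnostic aid under assumptions: the helper name `missing_clearml_env` is hypothetical, and the variable list covers commonly used ClearML settings, not necessarily everything the autoscaler task reads.

```python
import os

# Environment variables ClearML commonly reads for server connection and credentials.
# Which of these the autoscaler task actually needs is an assumption here.
CLEARML_ENV_VARS = (
    "CLEARML_API_HOST",
    "CLEARML_WEB_HOST",
    "CLEARML_FILES_HOST",
    "CLEARML_API_ACCESS_KEY",
    "CLEARML_API_SECRET_KEY",
)


def missing_clearml_env(env=None):
    """Return the names from CLEARML_ENV_VARS that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in CLEARML_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_clearml_env()
    print("missing:", ", ".join(missing) if missing else "none")
```

Running it both in the agent's container and at the top of the task script would show whether the variable is lost somewhere between the two.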

one year ago

0 More Of Pushing ClearML To Its Data Engineering Limits

possibly cheaper on the cloud (Lambda vs EC2 instance)

Whoa, are you saying there's an autoscaler that doesn't use EC2 instances? I may be misunderstanding, but that would be very cool.

Maybe I should have said: my plan is to use AWS Step Functions where a single task in the DAG is an entire ClearML pipeline. The non-ClearML steps would orchestrate putting messages into a queue, doing retry logic, and triggering said pipeline.

I think at some point, there has to be some amount of...

2 years ago

0 More Of Pushing ClearML To Its Data Engineering Limits

If this works, we might be able to fully replace Metaflow with ClearML!

(Referring to the feature where Metaflow creates Step Functions state machines for you, and then you can use those to trigger event-driven batch jobs in the same way described here)

2 years ago

0 Whelp. Here's Our Hackathon Demo Submission For A ClearML VS Code Extension

This is a low-key open-source project if anyone wanted to contribute. Since the project is early, there are lots of high-impact things, e.g. UI polish, that would be relatively low effort 😄

one year ago

0 Hey

Oh! System tags! That would definitely have been a better way to do it. We ended up querying for tasks in the "DevOps" project with the name "Interactive Session"
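
The project-and-name query described above could look roughly like the sketch below. The helper names are hypothetical; `Task.get_tasks` with `project_name` and `task_name` comes from the ClearML SDK, and as far as I know `task_name` is matched as a pattern, so it may also return tasks whose names merely contain "Interactive Session".

```python
def interactive_session_filter(project="DevOps", name="Interactive Session"):
    """Build the keyword arguments passed to clearml.Task.get_tasks."""
    return {"project_name": project, "task_name": name}


def find_interactive_sessions():
    """Query the ClearML server; needs clearml installed and a configured clearml.conf."""
    from clearml import Task  # imported lazily so the helper above works without clearml

    return Task.get_tasks(**interactive_session_filter())
```

Filtering on tags instead would mean passing a `tags` list to the same call, though which tags the session tasks carry is not stated here.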

one year ago

0 Hi Friends, We Got On A Sales Call With ClearML Yesterday And A Discussion About Webhooks Came Up.

I could imagine other useful automations for reacting to failed tasks that have certain tags, including alerting.

I realize we could move a lot of this logic into ClearML itself: make handler functions that run within the services queue. That would work for logic that is implemented in Python. But I believe it would be harder for our team to detect and respond to failures in the event handler functions if they were placed there because it seems unclear how we could use our existing systems a...

one year ago

0 Hey

Oh I wasn’t aware of that. I don’t think it’d work for this use case though. We’re trying to test the behavior you can see here in this extension https://share.descript.com/view/g0SLQTN6kAk so basically the examples I said in that earlier message

one year ago

0 I'm Getting Some Weird ClearML Behavior. I've Deployed It To An EC2 Instance. When I Access

None

2 years ago

0 Security Question: In My Journey Of Running ClearML The "Hard Way" (Self-Hosted), One Problem I Haven't Solved Is Security. Some Discussion Here...

OOooh, excellent. So the file server isn't necessary at all if you're using some other object storage? That's slick!

Is there a way I could move the JWT authentication (not authorization) logic into an API Gateway or Load Balancer? For example, if ClearML is following OAuth 2.0, then the load balancer or API Gateway could reach out to its "issuer URL" (probably available on the EC2 instance where ClearML is running) like this example here.

2 years ago

0 I'm Getting Some Weird ClearML Behavior. I've Deployed It To An EC2 Instance. When I Access

So the problem came back even with this new URL. I discovered clearing your cookies fixes it.

2 years ago

0 Working On The VS Code Extension. Pretty Stumped On This One...

Interesting. It’s actually just running locally on my laptop. It seemed only to be an issue when pointing the ClearML session CLI at my local version of ClearML. Still thinking about this one.

one year ago

0 Hi Friends, We Got On A Sales Call With ClearML Yesterday And A Discussion About Webhooks Came Up.

It seems you have a specific workflow in mind, but I'm not sure I follow it. Can you give a specific example?

Absolutely. So, let's say a DS tags a model in ClearML with "release candidate". It'd be great to have that trigger a number of processes, each with their own retry logic:

A fairness/bias evaluation, potentially as a task in ClearML itself. This would load the model and run some sample datasets through it. The
Pipeline to prepare for deployment. Trigger a GitHub Actions ...

one year ago

0 Hi Friends, We Got On A Sales Call With ClearML Yesterday And A Discussion About Webhooks Came Up.

Thanks for the response @<1523701205467926528:profile|AgitatedDove14>!

What would you consider an event?

I was thinking of the TriggerScheduler's definition of an event. Pretty much, anything the TriggerScheduler allows you to react to, it'd be great to be able to publish those events to a queue external to ClearML, e.g. a tag added to a model (or removed), a state in a task changing, etc. We'd want as much metadata about that event as possible. So if the event is due to a task...

one year ago

0 Hi, I'm Eric. I'm An MLOps Engineer At A Company With 9 DE's, 6 DS's, And 2 MLOps Engineers. I Just Learned About ClearML A Few Hours Ago And I'm Getting Excited About It!! I'm Wondering If We Could Replace Our Current MLOps Platform With ClearML. Right N

@<1557175205510516736:profile|ShallowSwan53> at this point, I think this question deserves its own thread. I'm curious about it too!

one year ago

0 Security Question: In My Journey Of Running ClearML The "Hard Way" (Self-Hosted), One Problem I Haven't Solved Is Security. Some Discussion Here...

I'm imagining:

The EC2 instance would be in a private subnet, accessible only on the VPN (read: VPC)
The API Gateway and Load Balancer would also be on the VPC and therefore have access to the private subnet BUT the API Gateway or Load Balancer themselves would be exposed to the public internet.
That way, to do the JWT authentication, the load balancer or API Gateway could reach out to the EC2 instance on the private network to authenticate any incoming ClearML SDK requests.

2 years ago

My understanding may be bad. Say I have a single EC2 instance. Is that instance only able to handle one task at a time?

Or can I start multiple instances of the clearml-agent process on it and then have one task per agent?

And if that's the case, can we have multiple agents on the EC2 instance listening to the same queue, e.g. default? Or would this only work if they were listening to different queues?

2 years ago

0 Hey! Starting An MLOps Director Position In 2 Weeks. I'm Thinking About Architecture. Has Anyone Ever Tried To Use ClearML As An Experiment Tracker, But Used A Different Orchestrator Like Metaflow, Airflow, Prefect, Etc.? I'm Struggling To Find Guides Or

Hey @<1523701482157772800:profile|AnxiousSeal95>! I think ClearML's orchestrator is a great fit for ad-hoc experimentation, but not for (event-triggered) batch inference jobs that need to be relied on in production.

I'd only feel comfortable supporting pipelines that serve end users on a tool that is known for that, e.g. Metaflow, Dagster, or Airflow--mainly because those tools emphasize good monitoring and integration with the wider data ecosystem.

5 months ago

I've also used Airflow and Dagster in prod, but not integrated them with an exp tracker.

5 months ago

0 More Of Pushing ClearML To Its Data Engineering Limits

To do this, I think I need to know:

Can you trigger a pre-existing Pipeline via the ClearML REST API? I'd want to have a Lambda function trigger the Pipeline for a batch without needing to have all the Pipeline code in the lambda function. Something like curl -u '<clearml credentials>' None ,...
[probably a big ask] If the pipeline succeeds/fails, can ClearML emit an event that I can react to? Like mayb...

2 years ago

0 More Of Pushing ClearML To Its Data Engineering Limits

I took a stab at writing an automated trigger to handle this. The goal is: anytime a pipeline succeeds or fails, let AWS know so that the input records can be placed onto a retry queue (or not)
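
The retry decision itself, separate from any ClearML or AWS wiring, might be sketched like this. Everything here is hypothetical: the function name, the status strings, and the idea that the input records travel with the callback are assumptions for illustration, not what the trigger actually passes.

```python
# Status strings are assumptions about what a finished pipeline reports.
SUCCESS_STATUSES = {"completed"}
FAILURE_STATUSES = {"failed", "stopped"}


def route_records(status, records):
    """Split a batch's input records into 'done' and 'retry' by pipeline outcome."""
    if status in SUCCESS_STATUSES:
        return {"done": list(records), "retry": []}
    if status in FAILURE_STATUSES:
        return {"done": [], "retry": list(records)}
    # Anything else (e.g. still running) is treated as not yet decidable.
    raise ValueError(f"pipeline is not finished yet: {status!r}")
```

A trigger callback would call something like this and then hand the `retry` list to whatever puts messages on the AWS queue.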

I'm trying to get a trigger to work in general, and then I'll add the more complex AWS logic. But I seem to be missing a step somewhere:

I wrote a file called set_triggers.py

from clearml.automation.trigger import TriggerScheduler

TRIGGER_SCHEDULER = TriggerScheduler()

from pprint import...

2 years ago