BattyCrocodile47

Moderator

35 Questions, 147 Answers

Active since 02 March 2023

Last activity 2 months ago

Reputation

Badges 1

129 × Eureka!


0 Votes 7 Answers 983 Views

Working on the VS Code extension. Pretty stumped on this one...

clearml

one year ago

0 Votes 0 Answers 1K Views

I gave a demo of ClearML to our data engineering team (is also the ML infra team) and it went over really well! 🎉 We have two data science teams that we are...

clearml

2 years ago

0 Votes 1 Answers 1K Views

Is there a command line interface that lets you query and download models from the ClearML model registry the way you can with MLFlow? Example: # search for ...

clearml

one year ago

0 Votes 11 Answers 2K Views

More of pushing ClearML to its data engineering limits 😅. Could you use ClearML in an event-driven system? That would be so sick! I'm wondering if we could...

clearml

2 years ago

0 Votes 34 Answers 117K Views

My autoscaled instance fails when running "git clone" on a private repo. I do have the SSH key placed at /root/.ssh/id_rsa on the machine, and when I SSH int...

mlops

one year ago


0 Hey Guys, Is There A Way To Dynamically Know The Path Of A Cloned GitHub Project (And Task) Inside A Docker Mode Worker? I Want To Set The PYTHONPATH Inside My Docker Worker, So I Can Access Local Modules, And My Only Problem Is That The Path Depends On T

I don't know about this, but could you turn your whole project into a pip-installable package using a setup.py and/or pyproject.toml ?

I've never tried this, but maybe then you could do pip install -e . locally before executing the task. Then execute. And then maybe the pip freeze that ClearML does would contain the symlink to your directory.

(so that from my_package import ... statements would work)
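
One way to sanity-check the idea above is to ask Python where the package resolves from after `pip install -e .` has run. This is only a minimal sketch: `my_package` is the placeholder name from the import above, and `locate_package` is a hypothetical helper, not something ClearML provides.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def locate_package(name: str) -> Optional[Path]:
    """Return the directory a regular top-level package is imported from.

    Returns None when the package cannot be found on the current sys.path.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    # For a regular package, origin points at <package dir>/__init__.py.
    return Path(spec.origin).parent


if __name__ == "__main__":
    # "my_package" is a placeholder; prints None if it is not installed.
    print(locate_package("my_package"))
```

If the editable install worked, the printed directory should sit inside the cloned repository rather than in site-packages.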

2 years ago

0 Hi Friends, We Got On A Sales Call With ClearML Yesterday And A Discussion About Webhooks Came Up.

I could imagine other useful automations for reacting to failed tasks that have certain tags, including alerting.

I realize we could move a lot of this logic into ClearML itself: make handler functions that run within the services queue. That would work for logic that is implemented in Python. But I believe it would be harder for our team to detect and respond to failures in the event handler functions if they were placed there because it seems unclear how we could use our existing systems a...

one year ago

0 Hi Friends, We Got On A Sales Call With ClearML Yesterday And A Discussion About Webhooks Came Up.

Thanks for the response, @AgitatedDove14!

What would you consider an event?

I was thinking of the TriggerScheduler's definition of an event. Pretty much anything the TriggerScheduler allows you to react to: it'd be great to be able to publish those events to a queue external to ClearML, e.g. a tag added to a model (or removed), a state in a task changing, etc. We'd want as much metadata about that event as possible. So if the event is due to a task...

one year ago

0 Hey

Exactly

one year ago

0 Hey

For now, I've written a headless selenium script to generate credentials for the fresh ClearML instance in CI.

one year ago

0 Hey

But the extension will need credentials to connect to it.

one year ago

0 Hey

I could potentially write a selenium script to make a set of keys, but I'd prefer to avoid that 😅

one year ago

0 Hey

I ultimately resorted to creating a selenium script combined with docker-compose. Not a beautiful solution but I can confirm that it works 😕

one year ago

0 More Of Pushing ClearML To Its Data Engineering Limits

If this works, we might be able to fully replace Metaflow with ClearML!

(Referring to the feature where Metaflow creates Step Functions state machines for you, and then you can use those to trigger event-driven batch jobs in the same way described here)

2 years ago

0 My Autoscaled Instance Fails When Running "Git Clone" On A Private Repo. I

On it

one year ago

0 More Of Pushing ClearML To Its Data Engineering Limits

Man, I owe you lunch sometime, @AgitatedDove14. Thanks for being so detailed in your answers.

Okay! So the pipeline ID is really just a task ID. So cool!

Not sure I fully understand what you mean here...

Sorry, I'll try again. Here's an illustrated example with AWS Step Functions (pretend this is a ClearML pipeline). If the pipeline fails, I'd want to have a chance to do some logic to react to that. Maybe in a step called "on_pipeline_failed" or someth...

2 years ago

0 Sorry For Always Posting Such Cryptic Problems. I Managed To Create A Docker-Compose File That Runs ClearML

The agent commands are nothing special.

clearml-agent daemon --queue sessions --cpu-only --create-queue true --docker

one year ago

0 My Autoscaled Instance Fails When Running "Git Clone" On A Private Repo. I

Haha, that was a total gotcha for me. Yeah, a lot just wasn't even getting run due to the #!/bin/bash part.

Anyway, wow! I finally got the precious console logs you thought to find, here they are:

2023-05-06 00:19:21
User aborted: stopping task (3)
2023-05-06 00:19:21
Successfully installed PyYAML-6.0 attrs-22.2.0 certifi-2022.12.7 charset-normalizer-3.1.0 clearml-agent-1.5.2 distlib-0.3.6 filelock-3.12.0 furl-2.1.3 idna-3.4 jsonschema-4.17.3 orderedmultidict-1.0.1 pathlib2-2.3.7....

one year ago

0 I’m

One idea: is it possible to store usable credentials in advance and place them in a volume that the ClearML containers can access and then use?

one year ago

0 Security Question: In My Journey Of Running ClearML The "Hard Way" (Self-Hosted), One Problem I Haven't Solved Is Security. Some Discussion Here...

When you run the docker-compose.yml on an EC2 instance, you can configure user login for the ClearML webserver. But the files API is still open to the world, right? (and same with the backend?)

We could solve this by placing the EC2 instance into a VPN.

One disadvantage to that approach is it becomes annoying to reach the model registry from outside the VPN, like if you have a deployment pipeline based in GitHub Actions. Or if you wanted to trigger a ClearML pipeline from a VPC that isn...

2 years ago

Yes, it's pretty lame that a clearml-agent can only process one task at a time if it's not listening to a services queue 🤔

2 years ago

0 How Would Y'all Approach Backing Up The Elastic-Search/Redis/Etc. Data In Self-Hosted ClearML? Any Drawbacks/Risks Of Doing A Simple Process That Periodically Zips Up The

@CostlyOstrich36 Oh that’s smart. Is that to make sure no transactions happen during the backup? Would there be a risk of ongoing or pending tasks somehow getting corrupted if you shut the server down?

one year ago

0 How Would Y'all Approach Backing Up The Elastic-Search/Redis/Etc. Data In Self-Hosted ClearML? Any Drawbacks/Risks Of Doing A Simple Process That Periodically Zips Up The

Oh, that is cool. I captured all this. Maybe I'll make a user-data.sh script and docker-compose.yml file that brings all these things together. Probably won't have time for a few weeks.

one year ago

0 More Of Pushing ClearML To Its Data Engineering Limits

To do this, I think I need to know:

Can you trigger a pre-existing Pipeline via the ClearML REST API? I'd want to have a Lambda function trigger the Pipeline for a batch without needing to have all the Pipeline code in the Lambda function. Something like curl -u '<clearml credentials>' ...
[probably a big ask] If the pipeline succeeds/fails, can ClearML emit an event that I can react to? Like mayb...
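
For the first question, one possible route through the Python SDK (rather than raw REST) is to clone the existing pipeline's controller task and enqueue the copy. This is a rough sketch under assumptions: it relies on the pipeline ID being an ordinary task ID, it assumes the controller should run on a queue named `services`, and the `pipeline_task_id` event field and the `trigger_pipeline` helper are made up for illustration. `Task.clone` and `Task.enqueue` are real SDK calls; the `Task` class is passed in so the helper can be exercised without a ClearML server.

```python
from typing import Any


def trigger_pipeline(task_cls: Any, pipeline_task_id: str,
                     queue_name: str = "services") -> str:
    """Clone an existing pipeline controller task and enqueue the clone.

    task_cls is expected to be clearml.Task. Returns the new task's ID.
    """
    new_run = task_cls.clone(source_task=pipeline_task_id,
                             name="triggered by event")
    task_cls.enqueue(task=new_run, queue_name=queue_name)
    return new_run.id


def lambda_handler(event: dict, context: Any) -> dict:
    """Hypothetical AWS Lambda entry point; no pipeline code lives here."""
    # Imported lazily: needs the clearml package and API credentials
    # available in the Lambda environment.
    from clearml import Task

    run_id = trigger_pipeline(Task, event["pipeline_task_id"])
    return {"run_id": run_id}
```

The Lambda only needs the ID of a pipeline that has already run once; whether this fits a given setup would still need checking against the ClearML docs.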

2 years ago

0 How Would Y'all Approach Backing Up The Elastic-Search/Redis/Etc. Data In Self-Hosted ClearML? Any Drawbacks/Risks Of Doing A Simple Process That Periodically Zips Up The

Earlier in the thread they mentioned that the agents are all resilient. So no ongoing tasks should be lost. I imagine even in a large organization, you could afford 5-10 minutes of downtime at 2AM or something.

That said, you'd only have 1 backup per day, which could be a big deal depending on the experiments you're running. You might want more than that.

one year ago

0 My Team Uses Metaflow By Outerbounds. Great DAG Tool. Super Robust. We Run Our Production Workloads On It And Use It For Experimentation, Too. I'm Considering Adding ClearML To Our Stack As An Exp Tracker / Model Registry Rather Than Going With The More

Thanks for this!! I may try it, and if I do and it works, I’ll look into writing a plugin for ZenML and Metaflow that auto-initializes the parent task and registers the steps as child tasks. Super helpful, thank you!

2 months ago

0 How Would Y'all Approach Backing Up The Elastic-Search/Redis/Etc. Data In Self-Hosted ClearML? Any Drawbacks/Risks Of Doing A Simple Process That Periodically Zips Up The

You know, you could probably add some immortal containers to the docker-compose.yml that use images with mongodump and the ES equivalent installed.

The container(s) could have a bash script with a while loop in it that sleeps for 30 minutes and then does a backup. If you installed the AWS CLI inside, it could even take care of uploading to S3.

I like this idea, because docker-compose.yml could make sure that if the backup container ever dies, it would be restarted.
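
The loop described above could look roughly like this, written in Python instead of bash. It is a sketch, not a tested backup procedure: it covers only the MongoDB half (no Elasticsearch step), the `mongo` host name and the paths are assumptions, and `build_backup_commands` and `backup_loop` are hypothetical helpers. It assumes `mongodump` and the AWS CLI are installed in the container.

```python
import subprocess
import time
from typing import Callable, List, Optional, Sequence


def build_backup_commands(archive_path: str, s3_uri: str,
                          mongo_host: str = "mongo") -> List[List[str]]:
    """Commands for one backup: dump MongoDB to an archive, then upload it."""
    return [
        ["mongodump", f"--host={mongo_host}", "--gzip",
         f"--archive={archive_path}"],
        ["aws", "s3", "cp", archive_path, s3_uri],
    ]


def _run_checked(cmd: Sequence[str]) -> None:
    # Raise if a command fails, so the container exits and gets restarted.
    subprocess.run(cmd, check=True)


def backup_loop(commands: Sequence[Sequence[str]],
                interval_seconds: float = 30 * 60,
                run: Callable[[Sequence[str]], object] = _run_checked,
                sleep: Callable[[float], object] = time.sleep,
                max_iterations: Optional[int] = None) -> int:
    """Sleep, then run every command; repeat forever unless capped."""
    done = 0
    while max_iterations is None or done < max_iterations:
        sleep(interval_seconds)
        for cmd in commands:
            run(cmd)
        done += 1
    return done
```

The `run` and `sleep` parameters exist only so the loop can be dry-run without touching a database; in the container you would call `backup_loop(build_backup_commands(...))` with the defaults.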

one year ago

0 My Autoscaled Instance Fails When Running "Git Clone" On A Private Repo. I

So I get output with this one, but the console only shows me the output from my machine. For example, the SSH key is present, and whoami results in ericriddoch

one year ago

0 Hello, Is There Any Hope To Use clearml-serving Without The ClearML Server? The Tutorial And Docs Make It Seem Like It's Required But I Wanted To Check To Be Sure. I Really Like All The Features That ClearML Provides But It Seems Like Everything Is Deep

I’d really prefer it was modular enough to use serving with any model registry

Oh that's interesting. To serve a model from MLflow, would you have to copy it over to ClearML first?

one year ago

0 Hi Team! Is There A Way To Make ClearML’s AWS Autoscaler And Queues Resource-Aware Please? I.E. If We Can Say, As We Enqueue Our Job, How Much RAM Or GPU-RAM Or Even GPUs It Needs, Have The Scheduler/Autoscaler Dispatch The Job To Instances That Are Of Th

As an infrastructure engineer, I feel that this is a fairly significant shortcoming of ClearML.

Having the ability to pack jobs/tasks onto the same "resource" (underlying server/EC2 instance) would

simplify the experience for data scientists
open up a streaming use case, wherein batch (offline) inference could be done directly inside of a ClearML pipeline in reaction to an event/trigger (like new data landing in your data lake). As it is, you can make this work, but if you start to get ...

2 years ago

But from your other answer, I think I'm understanding that you can have multiple agents on a single instance listening to the same queue.

So we could maybe initialize 4 instances of the agent on a single EC2 instance which would allow us to handle a higher volume of small batches concurrently without tying up the entire instance.
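
As a rough illustration of that idea, the snippet below starts several agent daemons on one machine, all pulling from the same queue. It is only a sketch: the queue name `batch` is made up, `agent_commands` and `start_agents` are hypothetical helpers, and the `--queue` and `--cpu-only` flags are the ones from the agent command quoted elsewhere on this page. Whether each agent also needs its own worker ID is not covered here and should be checked against the clearml-agent docs.

```python
import subprocess
from typing import List


def agent_commands(queue: str, count: int) -> List[List[str]]:
    """Build one `clearml-agent daemon` command per agent, same queue."""
    return [
        ["clearml-agent", "daemon", "--queue", queue, "--cpu-only"]
        for _ in range(count)
    ]


def start_agents(queue: str, count: int = 4) -> List[subprocess.Popen]:
    """Launch the agents as child processes (needs clearml-agent installed)."""
    return [subprocess.Popen(cmd) for cmd in agent_commands(queue, count)]
```

Calling `start_agents("batch")` on the EC2 instance would give four agents, so up to four small tasks could run at once.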

2 years ago

That's fabulous. This is definitely how my team prefers to structure projects. I hadn't gotten around to trying that out in our POC of ClearML yet, but I'm certain this is how our group will solve this problem

2 years ago

0 Is the docker-compose deployment (https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_linux_mac) suitable for production? If not, would it be better to somehow use Docker Swarm instead? The advantage of Docker Swarm is that it can heal contain

Oh hooray! So docker-compose manages the restarting of crashed containers? I didn't know that, and that is great 😄

2 years ago

0 More Of Pushing ClearML To Its Data Engineering Limits

possibly cheaper on the cloud (Lambda vs EC2 instance)

Whoa, are you saying there's an autoscaler that doesn't use EC2 instances? I may be misunderstanding, but that would be very cool.

Maybe I should have said: my plan is to use AWS Step Functions where a single task in the DAG is an entire ClearML pipeline. The non-ClearML steps would orchestrate putting messages into a queue, doing retry logic, and triggering said pipeline.

I think at some point, there has to be some amount of...

2 years ago

0 Security Question: In My Journey Of Running ClearML The "Hard Way" (Self-Hosted), One Problem I Haven't Solved Is Security. Some Discussion Here...

I'm imagining:

The EC2 instance would be in a private subnet, accessible only on the VPN (read: VPC)
The API Gateway and Load Balancer would also be on the VPC and therefore have access to the private subnet BUT the API Gateway or Load Balancer themselves would be exposed to the public internet.
That way, to do the JWT authentication, the load balancer or API Gateway could reach out to the EC2 instance on the private network to authenticate any incoming ClearML SDK requests.

2 years ago
