BattyCrocodile47

36 Questions, 147 Answers

Active since 02 March 2023

Last activity 8 months ago

Reputation

Badges 1

129 × Eureka!

Answers 147

0 Is the [docker-compose deployment](https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_linux_mac) suitable for production? If not, would it be better to somehow use docker swarm instead? The advantage of docker swarm is that it can heal contain

Oh hooray! So docker-compose manages the restarting of crashed containers? I didn't know that, and that is great 😄

2 years ago

0 My Autoscaled Instance Fails When Running "Git Clone" On A Private Repo. I

I'm not seeing an extra_docker_shell_script in my clearml.conf generated by clearml-agent init like in this guide

2 years ago

0 Working On The VS Code Extension. Pretty Stumped On This One...

Interesting. It’s actually just running locally on my laptop. It seemed only to be an issue when pointing the ClearML session CLI at my local version of ClearML. Still thinking about this one.

one year ago

0 Crazy Idea:

Yeah, I believe all VS Code Extensions are in TypeScript. My main point was that this is an example of a VS Code extension that executes a Python CLI.

one year ago

0 Hey! Starting An MLOps Director Position In 2 Weeks. I'm Thinking About Architecture. Has Anyone Ever Tried To Use ClearML As An Experiment Tracker, But Used A Different Orchestrator Like Metaflow, Airflow, Prefect, Etc.? I'm Struggling To Find Guides Or

I've also used Airflow and Dagster in prod, but not integrated them with an exp tracker.

11 months ago

0 Hi Team! Is There A Way To Make ClearML’s AWS Autoscaler And Queues Resource-Aware Please? I.e. If We Can Say, As We Enqueue Our Job, How Much RAM Or GPU-RAM Or Even GPUs It Needs, Have The Scheduler/Autoscaler Dispatch The Job To Instances That Are Of Th

I'll try to describe the scenario I was thinking would cause ClearML to break down:

Assume:

We've got a queue called streaming
We've got an S3 bucket with images landing inside
When the images land, they go into a queue
When there are 100 images in the queue, we trigger a ClearML pipeline to ingest, transform, run inference on the batch, and then write the results somewhere
Let's say we get 1,000,000 images in the Bucket per hour. That might be 1,000,000 / 100 = 10,000 batches. ...
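
The batch arithmetic above can be sketched in a few lines. This is only a back-of-the-envelope illustration: the function name and the per-second figure are assumptions added here, not part of the original post.

```python
def batches_per_hour(images_per_hour: int, batch_size: int) -> int:
    """Number of full batches produced per hour (integer division)."""
    return images_per_hour // batch_size


batches = batches_per_hour(1_000_000, 100)
print(batches)         # 10000 pipeline triggers per hour
print(batches / 3600)  # roughly 2.8 pipeline triggers per second
```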

2 years ago

But from your other answer, I think I'm understanding that you can have multiple agents on a single instance listening to the same queue.

So we could maybe initialize 4 instances of the agent on a single EC2 instance which would allow us to handle a higher volume of small batches concurrently without tying up the entire instance.
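
A minimal sketch of that idea: build one `clearml-agent daemon` command per agent, all pointing at the same queue. The helper name `agent_commands` is hypothetical, the queue name `streaming` comes from the scenario above, and running the agents with `--detached` is an assumption about how one might background them. The commands are only printed here, not executed.

```python
import shlex
from typing import List


def agent_commands(queue: str, n_agents: int) -> List[List[str]]:
    """Build one `clearml-agent daemon` command per agent, all on the same queue."""
    return [
        ["clearml-agent", "daemon", "--queue", queue, "--detached"]
        for _ in range(n_agents)
    ]


# Print the four commands that would be run on the single EC2 instance.
for cmd in agent_commands("streaming", 4):
    print(shlex.join(cmd))
```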

2 years ago

As an infrastructure engineer, I feel that this is a fairly significant shortcoming of ClearML.

Having the ability to pack jobs/tasks onto the same "resource" (underlying server/EC2 instance) would

simplify the experience for data scientists
open up a streaming use case, wherein batch (offline) inference could be done directly inside of a ClearML pipeline in reaction to an event/trigger (like new data landing in your data lake). As it is, you can make this work, but if you start to get ...

2 years ago

0 Hi Friends, We Got On A Sales Call With ClearML Yesterday And A Discussion About Webhooks Came Up.

I could imagine other useful automations for reacting to failed tasks that have certain tags, including alerting.

I realize we could move a lot of this logic into ClearML itself: make handler functions that run within the services queue. That would work for logic that is implemented in Python. But I believe it would be harder for our team to detect and respond to failures in the event handler functions if they were placed there because it seems unclear how we could use our existing systems a...
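
Without webhooks, one way to react to failed tasks with certain tags from an external system is to poll the server. The following is a rough sketch under assumptions, not a recommended design: the helper names, the project name, and the tag are hypothetical, and it relies on `Task.get_tasks` accepting a `task_filter` with a `status` list.

```python
from typing import Iterable, List


def should_alert(status: str, tags: Iterable[str], watched_tags: Iterable[str]) -> bool:
    """Return True for a failed task that carries at least one watched tag."""
    return status == "failed" and bool(set(tags) & set(watched_tags))


def find_failed_tasks(project_name: str, watched_tags: List[str]) -> list:
    """Query the ClearML server for failed tasks in a project and keep the tagged ones."""
    # Imported lazily so the pure helper above works without the clearml package.
    from clearml import Task

    failed = Task.get_tasks(
        project_name=project_name,
        task_filter={"status": ["failed"]},
    )
    return [t for t in failed if should_alert(t.get_status(), t.get_tags(), watched_tags)]
```

An external scheduler could call something like `find_failed_tasks("my-project", ["alert-on-failure"])` on a timer and forward the results to whatever alerting system the team already monitors.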

2 years ago

0 Hey

For now, I've written a headless selenium script to generate credentials for the fresh ClearML instance in CI.

one year ago

0 How Would Ya'll Approach Backing Up The Elastic-Search/Redis/Etc. Data In Self-Hosted ClearML? Any Drawbacks/Risks Of Doing A Simple Process That Periodically Zips Up The

We should put a $100 bounty on a bash script that backs up and restores mongodb, redis, and ES, etc. to S3 in the most resilient way 😄

2 years ago

0 Crazy Idea:

In a future iteration, it'd be cool if you could configure presets. Like maybe you have an on-startup.sh script you really like using to set up your instance, and VS Code extensions you want to pass to the --install-extensions ... flag

one year ago

0 Sorry For Always Posting Such Cryptic Problems. I Managed To Create A Docker-Compose File That Runs ClearML

The agent commands are nothing special.

clearml-agent daemon --queue sessions --cpu-only --create-queue true --docker

one year ago

0 Whelp. Here's Our Hackathon Demo Submission For A ClearML VS Code Extension

Thank you! For now, it's kind of nice that it just picks up your credentials from your conf file. No extra setup required beyond the onboarding ClearML has you do 😄

And look! It's working, assuming you start the clearml session up yourself:

one year ago

0 Hi,

Disclaimer: I'm not familiar enough with the ClearML codebase to vouch for the quality of this PR, although it is short, which is typically good. The feature we're interested in is the ability to specify the subnet_id.

2 years ago

0 My Autoscaled Instance Fails When Running "Git Clone" On A Private Repo. I

The key seems to be placed in the expected location

2 years ago

0 My Autoscaled Instance Fails When Running "Git Clone" On A Private Repo. I

Let's see. The screenshots above are me running on the host, not attaching to a running container. So I believe I do want the keys to be mounted into the running containers.

2 years ago

0 I’m

The question I'm exploring remains: is it possible to acquire that initial set of ClearML API keys programmatically so that the manual steps of 1-4 above can be avoided for an initial deployment?

2 years ago

0 Hi, I Think I Found A Problem With A Clean ClearML Install. I Create A New Python Env:

I literally just ran into this minutes ago and was about to file a bug report. A colleague ran into the same problem. It looks like urllib3 upgraded to v2 last week.
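
A quick way to confirm whether an environment picked up urllib3 v2 is to read the installed version. This is a small sketch using only the standard library; the helper names are made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Extract the major version number from a string such as '2.0.2'."""
    return int(version_string.split(".")[0])


def urllib3_is_v2_or_later() -> bool:
    """Return True if urllib3 is installed and its major version is at least 2."""
    try:
        return major_version(version("urllib3")) >= 2
    except PackageNotFoundError:
        # urllib3 is not installed in this environment at all.
        return False


print(urllib3_is_v2_or_later())
```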

2 years ago

0 My Team Uses Metaflow By Outerbounds. Great DAG Tool. Super Robust. We Run Our Production Workloads On It And Use It For Experimentation, Too. I'm Considering Adding ClearML To Our Stack As An Exp Tracker / Model Registry Rather Than Going With The More

Thanks for this!! I may try it and if I do and it works I’ll look into writing a plugin for ZenML and Metaflow that auto initializes the parent task and registers the steps as child tasks. Super helpful thank you!

8 months ago

Hey @<1523701482157772800:profile|AnxiousSeal95>! I think ClearML's orchestrator is a great fit for ad-hoc experimentation, but not for (event-triggered) batch inference jobs that need to be relied on in production.

I'd only feel comfortable supporting pipelines that serve end users on a tool that is known for that, e.g. Metaflow, Dagster, or Airflow--mainly because those tools emphasize good monitoring and integration with the wider data ecosystem.

11 months ago

0 Sorry For Always Posting Such Cryptic Problems. I Managed To Create A Docker-Compose File That Runs ClearML

Hmm... these people are recommending restarting docker completely. I may have tried that already, but I'll do it again when I get some time to be sure.

one year ago

0 Security Question: In My Journey Of Running ClearML The "Hard Way" (Self-Hosted), One Problem I Haven't Solved Is Security. Some Discussion Here...

OOooh, excellent. So the file server isn't necessary at all if you're using some other object storage? That's slick!

Is there a way I could move the JWT authentication (not authorization) logic into an API Gateway or Load Balancer? For example, if ClearML is following OAuth 2.0, then the load balancer or API Gateway could reach out to its "issuer URL" (probably available on the EC2 instance where ClearML is running) like this example here.
![image](https://clearml-web-assets.s3.amazonaws.c...

2 years ago

0 Can Anyone Recommend A Good Workflow For

Oh my goodness. Thank you! I'd seen that before, but for some reason it didn't register I could run that with VS Code...

But this config should almost never need to change!

Host clearml-session
    HostName localhost
    User root
    Port 8022

2 years ago

0 My Autoscaled Instance Fails When Running "Git Clone" On A Private Repo. I

It doesn't seem to want to show me stdout

2 years ago

0 My Autoscaled Instance Fails When Running "Git Clone" On A Private Repo. I

On it

2 years ago

0 More Of Pushing ClearML To Its Data Engineering Limits

Man, I owe you lunch sometime @<1523701205467926528:profile|AgitatedDove14>. Thanks for being so detailed in your answers.

Okay! So the pipeline ID is really just a task ID. So cool!

Not sure I fully understand what you mean here...

Sorry, I'll try again. Here's an illustrated example with AWS Step Functions (pretend this is a ClearML pipeline). If the pipeline fails, I'd want to have a chance to do some logic to react to that. Maybe in a step called "on_pipeline_failed" or someth...
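
Since the pipeline ID is just a task ID, one rough way to approximate an "on_pipeline_failed" step from outside the pipeline is to look the pipeline up as a task and branch on its status. This is a sketch under assumptions, not a built-in ClearML feature: `react_to_status` and `check_pipeline` are hypothetical helpers, and the handler is whatever callable you supply.

```python
from typing import Callable


def react_to_status(pipeline_id: str, status: str,
                    on_pipeline_failed: Callable[[str], None]) -> bool:
    """Call the failure handler when the pipeline's task status is 'failed'."""
    if status == "failed":
        on_pipeline_failed(pipeline_id)
        return True
    return False


def check_pipeline(pipeline_id: str,
                   on_pipeline_failed: Callable[[str], None]) -> bool:
    """Look the pipeline up as a regular task and react to its status."""
    from clearml import Task  # lazy import: only needed when talking to a server

    pipeline_task = Task.get_task(task_id=pipeline_id)
    return react_to_status(pipeline_id, pipeline_task.get_status(), on_pipeline_failed)
```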

2 years ago

0 My Autoscaled Instance Fails When Running "Git Clone" On A Private Repo. I

Or the log of the init script?

2 years ago

0 After Presenting ClearML To My Team, I Got The Question "We're Already On AWS, Why Not Use SageMaker?" TBH, I've Never Gone Through The ML Workflow With SageMaker. The Only Advantage I Could Think Of Is That We Can Use Our On-Prem Machines For Training,

@<1523701205467926528:profile|AgitatedDove14> you beautiful person, this is terrific! I do believe SageMaker has some nice monitoring/data drift capabilities that seem interesting, but these points you have here will be a fantastic starting point for my team's analysis of the products. I think this will help balance some of the over-enthusiasm towards using the native AWS solution.

2 years ago

I see. Is it possible for two agents to be utilizing the same GPU? (like if the machine has a terrific GPU, but only one of them?)

2 years ago
