
Oh, this is thought-provoking. Yeah, the idea of using ClearML for R&D is super appealing (to me, speaking as an MLOps engineer 😆). And having the power of Metaflow's scheduler (on Step Functions with EventBridge, since we'd do the AWS-native deployment) also makes sense to me.
I'll keep asking questions in a separate thread later on about how we could do event-based jobs with built-in alerting on ClearML.
I pasted your points (anonymously) onto the Metaflow Slack to le...
This is totally what I was looking for! Yeah, by "good story for offline batch" I meant, "good feature support for ..."
I bookmarked this comment. I think I'll be doing a POC trying to show this functionality within the next month.
Actually that's wrong: really this is the current volume mount
'-v', '/tmp/clearml_agent.ssh.cbvchse1:/.ssh',
Could changing that value to /root/.ssh work? Do you know what user ClearML is using inside the Docker image?
Let's see. The screenshots above are me running on the host, not attaching to a running container. So I believe I do want the keys to be mounted into the running containers.
Oh hooray! So docker-compose manages the restarting of crashed containers? I didn't know that, and that is great 😄
Oh, that is cool. I captured all this. Maybe I'll make a user-data.sh script and a docker-compose.yml file that bring all these things together. Probably won't have time for a few weeks.
That is great! This is all the motivation I needed to decide to do a POC at some point.
Oh my goodness. Thank you! I'd seen that before, but for some reason it didn't register I could run that with VS Code...
But this config should almost never need to change!
Host clearml-session
HostName localhost
User root
Port 8022
Actually, dumb question: how do I set the setup script for a task?
That could work! Is that an option? Something that lets me spin up the ClearML server and get a services worker to connect to it without manual steps.
I'm not seeing an extra_docker_shell_script in my clearml.conf generated by clearml-agent init, like in this guide. I also don't see it as an argument in Task.init or Task.execute_remotely.
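For reference, here's roughly the kind of call I was hoping to find. This is only a sketch: I haven't confirmed that set_base_docker accepts a docker_setup_bash_script argument in my SDK version, and the project/task/queue names and Docker image are placeholders.

from clearml import Task

# Sketch only: a per-task "setup script" via Task.set_base_docker.
# I haven't verified the exact signature in my SDK version; the
# docker_setup_bash_script argument is the thing I'm hoping exists.
task = Task.init(project_name="sandbox", task_name="setup-script-test")
task.set_base_docker(
    docker_image="python:3.10",
    # shell lines the agent would run inside the container before my code starts
    docker_setup_bash_script=[
        "apt-get update",
        "apt-get install -y openssh-client",
    ],
)
task.execute_remotely(queue_name="default", exit_process=True)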
So I get output with this one, but the console only shows me the output from my machine. For example, the SSH key is present, and whoami results in ericriddoch. I have the same behavior whether I put task.execute_remotely(...) before or after the call to run_shell_script().
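For context, the ordering I'm testing looks roughly like this. run_shell_script is my own helper; it's simplified here to a plain subprocess call, and the project/task/queue names are placeholders.

import subprocess

from clearml import Task


def run_shell_script() -> None:
    # Simplified stand-in for my actual helper: print the current user so I
    # can tell whether this ran on my laptop or inside the agent's container.
    print(subprocess.run(["whoami"], capture_output=True, text=True).stdout)


task = Task.init(project_name="sandbox", task_name="where-does-this-run")

# Variant 1: enqueue first, then call the helper. With exit_process=True the
# local process should stop here and everything below should run on the agent.
task.execute_remotely(queue_name="default", exit_process=True)
run_shell_script()

# Variant 2 (also tried): calling run_shell_script() *before*
# task.execute_remotely(...); either way the console output I see is from my
# machine (whoami -> ericriddoch).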
I'll try to describe the scenario I was thinking would cause ClearML to break down:
Assume:
- We've got a queue called streaming
- We've got an S3 bucket with images landing inside
- When the images land, they go into a queue
- When there are 100 images in the queue, we trigger a ClearML pipeline to ingest, transform, run inference on the batch, and then write the results somewhere
- Let's say we get 1,000,000 images in the Bucket per hour. That might be 1,000,000 / 100 = 10,000 batches. ...
But from your other answer, I think I'm understanding that you can have multiple agents on a single instance listening to the same queue.
So we could maybe initialize 4 instances of the agent on a single EC2 instance which would allow us to handle a higher volume of small batches concurrently without tying up the entire instance.
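To make the streaming scenario above concrete, this is roughly the trigger code I have in mind. Just a sketch: the template pipeline task ("batch-inference-pipeline" in project "inference"), the parameter name, and the queue wiring are all assumptions on my part.

from clearml import Task

BATCH_SIZE = 100


def maybe_trigger_pipeline(pending_image_keys: list) -> None:
    """Clone a pre-registered pipeline task and enqueue it once a full batch
    of images has accumulated (e.g. called from an S3-event consumer)."""
    if len(pending_image_keys) < BATCH_SIZE:
        return

    # Look up the template pipeline task (registered once, ahead of time).
    template = Task.get_task(
        project_name="inference", task_name="batch-inference-pipeline"
    )

    # Clone it so each batch gets its own task, attach the batch as a
    # parameter, and push it onto the queue our agents listen on.
    batch_task = Task.clone(source_task=template, name="batch-inference")
    batch_task.set_parameter(
        "General/image_keys", ",".join(pending_image_keys[:BATCH_SIZE])
    )
    Task.enqueue(batch_task, queue_name="streaming")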
Is there a way we can put a ClearML deployment behind a load balancer or API Gateway that is exposed to the whole world but protected by authentication, so that only authorized clients can get in?
As an infrastructure engineer, I feel that this is a fairly significant shortcoming of ClearML.
Having the ability to pack jobs/tasks onto the same "resource" (underlying server/EC2 instance) would
- simplify the experience for data scientists
- open up a streaming use case, wherein batch (offline) inference could be done directly inside of a ClearML pipeline in reaction to an event/trigger (like new data landing in your data lake). As it is, you can make this work, but if you start to get ...
I'm imagining:
- The EC2 instance would be in a private subnet, accessible only on the VPN (read: VPC)
- The API Gateway and Load Balancer would also be in the VPC and therefore have access to the private subnet, BUT the API Gateway or Load Balancer itself would be exposed to the public internet.
That way, to do the JWT authentication, the load balancer or API Gateway could reach out to the EC2 instance on the private network to authenticate any incoming ClearML SDK requests.
Sorry, clarifying:
The agent-services entry in the docker-compose file seems to add a single worker to the services queue.
Thanks for replying Martin! (as always)
Do you think ClearML is a strong option for running event-based training and batch inference jobs in production? That’d include monitoring and alerting. I’m afraid that Metaflow will look far more compelling to our teams for that reason.
Since it deploys onto Step Functions, the scheduling is managed for you, and I believe alerts for failing jobs can be set up without adding custom code to every pipeline.
If that’s the case, then we’d probably only...
So, we've been able to run sudo su and then git clone with our private repos a few times now.
Thanks for this!! I may try it, and if it works, I'll look into writing a plugin for ZenML and Metaflow that auto-initializes the parent task and registers the steps as child tasks. Super helpful, thank you!
Dang! @<1590514584836378624:profile|AmiableSeaturtle81> awesome answer, thank you! You seem like an awesome person to know. Definitely connect if you'd like to talk ops stuff sometime.
That's with the key at /root/.ssh/id_rsa
The dark theme you have
It's this Chrome extension! I forget it's even on sometimes. It gives you a keyboard shortcut to toggle dark mode on any website. I love it.
Success! Wow, so this means I can use ClearML training/inference pipelines as part of AWS Step Functions!
My plan is to have an AWS Step Functions state machine (DAG) that treats running a ClearML job as one step (task) in t...
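Roughly what I'm picturing for the "run a ClearML job" step: two small handlers the state machine could call. This is only a sketch; the template task/project/queue names and the event fields (like batch_id) are placeholders I made up.

from clearml import Task


def launch_clearml_job(event, context):
    """State-machine step 1 (e.g. a Lambda): clone a pre-registered template
    task, enqueue it, and hand the new task id back to Step Functions."""
    template = Task.get_task(project_name="training", task_name="train-template")
    job = Task.clone(source_task=template, name=f"train-{event['batch_id']}")
    Task.enqueue(job, queue_name="default")
    return {"clearml_task_id": job.id}


def check_clearml_job(event, context):
    """State-machine step 2, called in a wait/poll loop: report whether the
    ClearML task has finished so the state machine can branch on it."""
    task = Task.get_task(task_id=event["clearml_task_id"])
    status = str(task.get_status())  # e.g. "queued", "in_progress", "completed", "failed"
    return {
        "clearml_task_id": task.id,
        "status": status,
        "done": status in ("completed", "failed"),
    }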
I see. Is it possible for two agents to share the same GPU? (like if the machine has a terrific GPU, but only the one?)
Let's see. The task log? I think this is it.
Oh, there's parallelization as well. You could have step 1 gather the data, and then fan out to N parallel steps that all do different things with the data, for example hyperparameter tuning.
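Something like this sketch is what I mean by the fan-out, assuming the base tasks ("gather_data" and "process_chunk") were registered ahead of time; the project and queue names are made up.

from clearml import PipelineController

# Sketch: one data-gathering step, then N parallel steps that all list it as
# their parent and therefore run concurrently once it finishes.
pipe = PipelineController(name="fan-out-example", project="sandbox", version="0.0.1")

pipe.add_step(
    name="gather_data",
    base_task_project="sandbox",
    base_task_name="gather_data",
)

for i in range(4):
    pipe.add_step(
        name=f"process_chunk_{i}",
        parents=["gather_data"],
        base_task_project="sandbox",
        base_task_name="process_chunk",
        parameter_override={"General/chunk_index": i},
    )

# The controller itself typically runs on the services queue.
pipe.start(queue="services")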