So here's a snippet from my `aws_autoscaler.yaml` file.
For these functions, Metaflow offers:
- Triggering: integration with AWS EventBridge. It's really easy to use boto3 and AWS access keys to emit events for Metaflow DAGs, and it's nice not to have to worry about networking for this (see the boto3 sketch after this message).
- Scheduling: the fact that Metaflow uses Step Functions is reassuring.
- Observability: this lovely flame graph where you can view the logs and duration of each step in the DAG, and it's easy to view all the DAG runs, including the ones that have failed. Ideally, we w...
cc: @<1565509803839590400:profile|MoodyBear54>
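For reference (not our actual code), emitting such an event from Python is roughly two boto3 calls. The bus name, source, detail-type, and payload below are placeholders I made up for illustration:

```python
import json

import boto3

# Hypothetical sketch: emit a custom event that an EventBridge rule could route
# to a Metaflow deployment (bus/source/detail-type names are made up here).
events = boto3.client("events", region_name="us-west-2")

response = events.put_events(
    Entries=[
        {
            "EventBusName": "default",
            "Source": "my-team.pipelines",           # placeholder source
            "DetailType": "pipeline.run.requested",  # placeholder detail-type
            "Detail": json.dumps({"dataset_id": "2023-05-06"}),
        }
    ]
)

# put_events reports per-entry results; FailedEntryCount > 0 means an event was dropped.
assert response["FailedEntryCount"] == 0, response["Entries"]
```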
Yes, it's pretty lame that a `clearml-agent` can only process one task at a time if it's not listening to a `services` queue.
I took a stab at writing an automated trigger to handle this. The goal is: anytime a pipeline succeeds or fails, let AWS know so that the input records can be placed onto a retry queue (or not)
I'm trying to get a trigger to work in general, and then I'll add the more complex AWS logic. But I seem to be missing a step somewhere:
I wrote a file called `set_triggers.py`:
from clearml.automation.trigger import TriggerScheduler
TRIGGER_SCHEDULER = TriggerScheduler()
from pprint import...
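For context, here's roughly the shape I'm aiming for. Treat it as a sketch: the `add_task_trigger` keyword arguments and the fact that `schedule_function` receives the triggering task's ID are from memory of the ClearML docs, and the SQS queue URL and project name are placeholders.

```python
# Sketch of set_triggers.py (argument names from memory; queue URL is a placeholder).
import json

import boto3
from clearml.automation import TriggerScheduler


def notify_aws(task_id: str):
    """Called by the scheduler when a watched task finishes; tell AWS about it."""
    sqs = boto3.client("sqs", region_name="us-west-2")
    sqs.send_message(
        QueueUrl="https://sqs.us-west-2.amazonaws.com/123456789012/pipeline-retry",  # placeholder
        MessageBody=json.dumps({"clearml_task_id": task_id}),
    )


scheduler = TriggerScheduler()
scheduler.add_task_trigger(
    name="notify-aws-on-finish",
    trigger_project="my-pipelines",              # watch tasks in this project (placeholder)
    trigger_on_status=["completed", "failed"],   # fire on success or failure
    schedule_function=notify_aws,                # called with the triggering task's ID
)

# Run the polling loop here (or start_remotely() to run it on an agent)
scheduler.start()
```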
I took a look
- I think the Outerbounds extension (the one in my screenshot) is currently closed source. That makes sense to me. A bit sad, because it's highly similar to what we'd want to build.
- Another example could be the AWS Toolkit extension. But sadly, it's hardly a "minimal example". I was thinking it's relevant because it uses your local `~/.aws/` folder, which is similar to what we'd want to do.
Thank you! I think it does. It's just now dawning on me that because a pipeline is composed of multiple tasks, different tasks in the pipeline could run on different machines. Or more specifically, they could run on different queues, and as you said in your other response, we could have a queue for smaller CPU-based instances and another queue for larger GPU-based instances.
I like the idea of having a queue dedicated to CPU-based instances that has multiple agents running on it simultaneously....
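To make that concrete, here's a rough sketch of routing different pipeline steps to different queues. The queue names are made up, and I'm going from memory on the PipelineController arguments, so double-check against the docs:

```python
from clearml import PipelineController


def preprocess():
    # placeholder for the real CPU-bound preprocessing step
    print("preprocessing")


def train():
    # placeholder for the real GPU-bound training step
    print("training")


# Hypothetical queue names: a shared pool of small CPU instances and a GPU pool.
pipe = PipelineController(name="example-pipeline", project="my-pipelines", version="0.0.1")
pipe.set_default_execution_queue("cpu_queue")

pipe.add_function_step(name="preprocess", function=preprocess, execution_queue="cpu_queue")
pipe.add_function_step(
    name="train",
    function=train,
    parents=["preprocess"],
    execution_queue="gpu_queue",
)

# The controller itself is lightweight, so it usually sits on the services queue.
pipe.start(queue="services")
```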
Trying as a python subprocess...
Or the log of the init script?
How it works / what we finished:
- We used the SaaS ClearML, started an EC2 instance, and manually installed and ran the `clearml-agent daemon` on it.
- We ran `clearml-init` on our laptops to generate the `clearml.conf` file.
- The extension is in TypeScript, so...
- We started trying to write code with the Python SDK to list sessions, but realized calling that from the extension would be hard, so we opted to have the TypeScript code make calls to the ClearML API server directly, e.g. ...
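Just to illustrate the kind of calls I mean (shown here in Python with requests rather than our actual TypeScript, and with endpoint names from memory of the ClearML REST API docs, so treat them as assumptions; the host and filters are placeholders):

```python
# Illustrative sketch of the REST calls the extension would make.
import requests

API_SERVER = "https://api.clear.ml"  # or a self-hosted API server, e.g. http://<host>:8008
ACCESS_KEY = "..."                   # credentials from clearml.conf
SECRET_KEY = "..."

# 1. Exchange the key pair for a short-lived token.
login = requests.post(f"{API_SERVER}/auth.login", auth=(ACCESS_KEY, SECRET_KEY))
token = login.json()["data"]["token"]

# 2. List tasks (clearml-session sessions are just tasks, so filters narrow it down).
tasks = requests.post(
    f"{API_SERVER}/tasks.get_all",
    headers={"Authorization": f"Bearer {token}"},
    json={"status": ["in_progress"], "only_fields": ["id", "name", "status"]},
)
for task in tasks.json()["data"]["tasks"]:
    print(task["id"], task["name"], task["status"])
```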
In a future iteration, it'd be cool if you could configure presets. Like maybe you have an `on-startup.sh` script you really like using to set up your instance, and VS Code extensions you want to pass to the `--install-extensions ...` flag.
Haha, that was a total gotcha for me. Yeah, a lot just wasn't even getting run due to the `#!/bin/bash` part.
Anyway, wow! I finally got the precious console logs you thought to look for. Here they are:
2023-05-06 00:19:21
User aborted: stopping task (3)
2023-05-06 00:19:21
Successfully installed PyYAML-6.0 attrs-22.2.0 certifi-2022.12.7 charset-normalizer-3.1.0 clearml-agent-1.5.2 distlib-0.3.6 filelock-3.12.0 furl-2.1.3 idna-3.4 jsonschema-4.17.3 orderedmultidict-1.0.1 pathlib2-2.3.7....
The key seems to be placed in the expected location
configurations:
  extra_clearml_conf: ""
  extra_trains_conf: ""
  extra_vm_bash_script: |
    aws ssm get-parameter --region us-west-2 --name /clearml/github_ssh_private_key --with-decryption --query Parameter.Value --output text > ~/.ssh/id_rsa && chmod 600 ~/.ssh/id_rsa
    source /clearml_agent_venv/bin/activate
hyper_params:
  iam_arn: arn:aws:iam::<my account id>:instance-profile/clearml-2-AutoscaledInstanceProfileAutoScaledEC2InstanceProfile56A5348F-90fmf6H5OUBx
As opposed to using cron or something.
You know, you could probably add some immortal containers to the `docker-compose.yml` that use images with `mongodump` and the ES equivalent installed.
The container(s) could have a bash script with a while loop in it that sleeps for 30 minutes and then does a backup. If you installed the AWS CLI inside, it could even take care of uploading to S3.
I like this idea, because the `docker-compose.yml` could make sure that if the backup container ever dies, it would be restarted.
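Roughly what I mean, written as a small Python loop rather than the bash while loop I described. The compose service name, bucket, paths, and interval are all placeholders, and it assumes `mongodump` plus AWS credentials are available inside the container:

```python
# Sketch of the backup loop idea (placeholders throughout; an Elasticsearch
# snapshot step would slot in next to the MongoDB dump).
import subprocess
import time
from datetime import datetime, timezone

import boto3

BUCKET = "my-clearml-backups"   # placeholder bucket name
INTERVAL_SECONDS = 30 * 60      # back up every 30 minutes

s3 = boto3.client("s3")

while True:
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive = f"/tmp/mongo-{stamp}.gz"

    # Dump MongoDB to a single gzipped archive (assumes the compose service is named "mongo").
    subprocess.run(
        ["mongodump", "--host", "mongo", "--archive=" + archive, "--gzip"],
        check=True,
    )

    # Upload the dump to S3.
    s3.upload_file(archive, BUCKET, f"mongo/{stamp}.gz")

    time.sleep(INTERVAL_SECONDS)
```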
Ah, but it's probably worth noting that the `docker-compose.yml` does register the EC2 instance that the server is running on as an agent listening on the `services` queue, so ongoing tasks in that queue that happen to be placed on the server would get terminated when `docker-compose down` is run.
@<1523701070390366208:profile|CostlyOstrich36> Oh, that's smart. Is that to make sure no transactions happen during the backup? Would there be a risk of ongoing or pending tasks somehow getting corrupted if you shut the server down?
Wow, that is seriously impressive.
Hey! Sorry, I don't think I ever solved this for Elasticsearch.
Earlier in the thread they mentioned that the agents are all resilient. So no ongoing tasks should be lost. I imagine even in a large organization, you could afford 5-10 minutes of downtime at 2AM or something.
That said, you'd only have one backup per day, which could be a big deal depending on the experiments you're running. You might want more than that.
Hey @<1523701482157772800:profile|AnxiousSeal95> ! I think ClearML's orchestrator is a great fit for ad-hoc experimentation, but not for (event-triggered) batch inference jobs that need to be relied on in production.
I'd only feel comfortable supporting pipelines that serve end users on a tool that is known for that, e.g. Metaflow, Dagster, or Airflow--mainly because those tools emphasize good monitoring and integration with the wider data ecosystem.
You mean as experiment management / model registry / data? I think this is the bread and butter of ClearML.
I was wondering if anyone had had experience using ClearML together with one of these others.
I think most of them are alternatives to Metaflow.
Totally.
Like, if you google "dagster and clearml" or "prefect and clearml" or "airflow and clearml" -- I don't find any blogs written by people talking about how they use both of them together.
That's strange to me, becau...
I've also used Airflow and Dagster in prod, but not integrated them with an exp tracker.
Oh, right... the Docker image running on the instance takes care of the library versions. You guys are great!
@<1523701205467926528:profile|AgitatedDove14> you beautiful person, this is terrific! I do believe SageMaker has some nice monitoring/data drift capabilities that seem interesting, but these points you have here will be a fantastic starting point for my team's analysis of the products. I think this will help balance some of the over-enthusiasm towards using the native AWS solution.
I'll search around some more when I get time. I have no idea, but it feels like ClearML has already done the hard part, which is creating `clearml-session` in the first place.
This could be a really low-hanging OSS contribution that could make a real impact.