I'm not talking about node failure but rather pod failure, which is out-of-memory in 99% of the cases.
Thanks @<1806497735218565120:profile|BrightJellyfish46>
@<1523701070390366208:profile|CostlyOstrich36> they don't, as the pod is killed as soon as the process inside exceeds the memory limit
Here’s how I do it using clearml.conf config for my agent:
sdk {
  aws {
    s3 {
      ...
    }
  }
  development {
    default_output_uri: ""
  }
}
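In case it helps, the elided s3 part is just the usual credentials section; for a MinIO endpoint it looks roughly like this (host/key/secret below are placeholders, not my real values):
sdk {
  aws {
    s3 {
      credentials: [
        {
          host: "minio.example.com:9000"  # placeholder MinIO endpoint
          key: "ACCESS_KEY"               # placeholder
          secret: "SECRET_KEY"            # placeholder
          multipart: false
          secure: false                   # set to true if MinIO is behind TLS
        }
      ]
    }
  }
}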
@<1523701087100473344:profile|SuccessfulKoala55> my colleague submitted a pipeline whose component referenced a queue that doesn't exist. The "default" queue that handles the controller task just kept printing error messages saying this component can't be scheduled because the queue is missing. We just want a way to fail early if a queue doesn't exist, instead of the pipeline running indefinitely without ever actually failing.
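In the meantime, a minimal workaround sketch we could add at the top of our launcher script, assuming the APIClient queues endpoint accepts a name filter (the queue name here is just a placeholder):
from clearml.backend_api.session.client import APIClient

# fail fast if the target queue doesn't exist before launching the pipeline
client = APIClient()
if not client.queues.get_all(name="component-queue"):  # placeholder queue name
    raise RuntimeError("Queue 'component-queue' does not exist - aborting launch")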
no worries @<1523701205467926528:profile|AgitatedDove14>
Hello @<1523703097560403968:profile|CumbersomeCormorant74> , I found your name on the company website, you're the VP of Engineering if I'm not mistaken? I wanted to directly ask you, since I'm having trouble reaching engineers on GitHub. What is your policy & process for OSS contributions? My team is a heavy user, and we occasionally find things to improve, but the experience for contributions hasn't been great so far. Thanks for making ClearML open-source!
Hey @<1523701070390366208:profile|CostlyOstrich36> , could you provide any suggestions here, please?
the components start hanging indefinitely right after printing Starting Task Execution
@<1576381444509405184:profile|ManiacalLizard2> but the task controller has access to that information. Before deleting the pod, it could retrieve the exit code and status message that every pod provides and log them under the "Info" section in ClearML.
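Roughly what I have in mind, sketched with the official kubernetes Python client (pod/namespace names are placeholders, and I'm assuming the container status is still populated at that point):
from kubernetes import client, config

config.load_incluster_config()  # or config.load_kube_config() when running outside the cluster
v1 = client.CoreV1Api()
pod = v1.read_namespaced_pod(name="clearml-id-abc123", namespace="clearml")  # placeholders
terminated = pod.status.container_statuses[0].state.terminated
print(terminated.exit_code, terminated.reason)  # e.g. 137, "OOMKilled"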
Hey! That sounds reassuring, thanks for the response. BTW, I didn’t mean to criticize your engineers or anything, I can see they work very hard. Kudos to them.
Huh, I see. Thanks for your answers. How difficult would it be to implement some way of automatically inferring repository information for components, or a flag like repo_inherit (or similar) when defining a component (which would inherit the repository information from the controller)? My workflow is based around executing code that lives in the same repository, so it's cumbersome having to specify repository information all over the place and to bump the commit hash every time I add new code.
@<1523701205467926528:profile|AgitatedDove14> for me it hasn't worked when I specified agentk8sglue.queue: "queue1,queue2" in the Helm chart options, which should be possible according to the documentation. What also hasn't worked is the flag for creating a queue if it doesn't exist ( agentk8sglue.createQueueIfNotExists ). Both failed parsing at runtime, so I'd say those are 2 bugs.
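For reference, this is roughly the values snippet I tried (queue names are placeholders):
agentk8sglue:
  queue: "queue1,queue2"         # the multi-queue syntax from the docs -- failed to parse for me
  createQueueIfNotExists: true   # also failed to parse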
Deployment is using k8s ( docker.io/allegroai/clearml:2.0.0-613 )
Yes, that seems like an option as well. I also found this (in case someone looks for it in the future):
from clearml.automation.controller import PipelineDecorator

p = PipelineDecorator.get_current_pipeline()  # the currently running pipeline controller
p.get_running_nodes()  # nodes that are still running
Logging the pod's exit code and status message before deleting the pod would be very useful. The data scientists would see that an OOM happened and wouldn't need to bother other teams to find out what went wrong.
The way I understand it:
- if you’re executing tasks locally (e.g. on your laptop), then you need this setting because the clearml package needs to know where to upload artifacts (artifacts aren’t proxied through the clearml-server; they are uploaded directly to the storage of your choice; see also the snippet after this list)
- if you’re executing code using the ClearML agent, then you can configure the agent the way I wrote earlier, and it will use your MinIO instance for uploading artifacts for all of the tasks it executes
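That snippet: when running locally, you can also set the destination per task instead of via clearml.conf, something along these lines (project/task names and the URI are placeholders):
from clearml import Task

task = Task.init(
    project_name="my-project",                          # placeholder
    task_name="local-run",                              # placeholder
    output_uri="s3://minio.example.com:9000/clearml",   # placeholder MinIO bucket URI
)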
Any ideas @<1523701087100473344:profile|SuccessfulKoala55> ?
I don’t use datasets so I don’t know, sorry, maybe @<1523701087100473344:profile|SuccessfulKoala55> can help
when I add repo="." to the definition of all my component decorators it works (but not the pipeline decorator), but it doesn’t work without that part… the problem I’m having now is that my components hang when executed in the cluster… I have 2 agents deployed (default and services queues)
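Concretely, this is the shape that works for me right now (function body and queue name are just illustrative):
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(repo=".", execution_queue="default")  # repo="." is the part that makes it work
def preprocess(path: str):
    ...  # component body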
Yes @<1523701070390366208:profile|CostlyOstrich36>
Awesome @<1729671499981262848:profile|CooperativeKitten94> , will definitely add that. It would also be very helpful if there were a way to delay deleting "completed/failed" pods. This is useful when something fails unexpectedly and the ClearML logs are not enough to debug the issue. Does that make sense to you? I could contribute to your codebase if you're interested.
I think so, but I haven’t investigated what exactly the problem is; I’ll report it, though.
This hasn’t worked for me either, so I use multiple queues instead. Another reason I use multiple queues is that I need to specify different resource requirements for the pods launched by each queue (CPU-only vs. GPU).
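In my setup that ends up being a separate agent deployment per queue, each with its own base pod template, something along these lines (a sketch of the GPU values file; I don't remember the exact keys off the top of my head, so double-check against your chart version's values.yaml):
agentk8sglue:
  queue: "gpu-queue"              # placeholder queue name
  basePodTemplate:
    resources:
      limits:
        nvidia.com/gpu: 1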
@<1523701205467926528:profile|AgitatedDove14> FYI, I managed to fix the issue. I replaced from clearml import PipelineDecorator with from clearml.automation.controller import PipelineDecorator and it suddenly works. What a weird issue.