WobblyFrog79
Moderator
9 Questions, 31 Answers
  Active since 08 February 2025
  Last activity one day ago

Reputation

0

Badges 1

21 × Eureka!
0 Votes 6 Answers 877 Views
A question regarding using clearml-agent with k8s clusters. We use ClearML pipelines to train our models. The pods sometimes fail due to intermittent failure...
2 months ago
0 Votes 2 Answers 237 Views
Is there a way to fail early if a task in a k8s pipeline references a queue that doesn't actually exist? We've had this happen by accident (typo), the pipeli...
one month ago
0 Votes 6 Answers 1K Views
How to configure ClearML agent to keep pods around after they finish/fail? I want to debug a pod that crashes, but it gets deleted quickly
6 months ago
0 Votes 4 Answers 611 Views
7 months ago
0 Votes 9 Answers 758 Views
Hello, a question about pipelines. I have a repository with one pipeline using decorators, defined in pipeline.py . It uses multiple components that import c...
8 months ago
0 Votes 1 Answers 566 Views
Hello, I'm having issues with cloning a private repository that uses submodules with private repositories. I'm using CLEARML_AGENT_GIT_PASS and CLEARML_AGENT...
6 months ago
0 Votes 1 Answers 39 Views
Hello, I submitted a PR in July which still hasn't been merged, and I'm having trouble reaching the person who reviewed it initially. Can somebody from the C...
4 days ago
0 Votes 0 Answers 609 Views
Is there a way to change the name of MongoDB databases used by ClearML? We want to have two self-hosted instances of ClearML that are going to use the same M...
5 months ago
0 Votes 3 Answers 636 Views
How can I access task IDs of tasks running within a PipelineDecorator.pipeline ? I know PipelineController has get_running_nodes method, but how to achieve t...
7 months ago
0 Hello! I Am Setting Up A Clearml-Server With Self-Hosted Minio. I Would Like To Keep The Clearml.Conf As Default As Possible (Such That Users Do Not Need To Configure Much And Do Not Need Access To Minio Keys). I Am Trying To Use The Server-Config File

The way I understand it:

  • if you’re executing tasks locally (e.g. on your laptop), then you need this setting because the clearml package needs to know where to upload artifacts (artifacts aren’t proxied through the clearml-server; they are uploaded directly to the storage of your choice)
  • if you’re executing code using the ClearML agent, then you can configure the agent the way I wrote earlier, and it will use your MinIO instance for uploading artifacts for all of the tasks it executes
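
To illustrate the local case, here is a minimal sketch. It assumes a MinIO endpoint at `minio.example.internal:9000` and a bucket called `clearml-artifacts`; both names are made up for the example. ClearML addresses MinIO with `s3://host:port/bucket/...` URIs, and the hypothetical helper below only builds such a string. Credentials still have to be configured separately and are not shown.

```python
def minio_output_uri(host: str, port: int, bucket: str, prefix: str = "") -> str:
    """Build an s3://host:port/bucket[/prefix] URI for a MinIO endpoint."""
    if not host or not bucket:
        raise ValueError("host and bucket are required")
    uri = f"s3://{host}:{port}/{bucket}"
    prefix = prefix.strip("/")
    return f"{uri}/{prefix}" if prefix else uri


# Hypothetical endpoint and bucket names, for illustration only.
uri = minio_output_uri("minio.example.internal", 9000, "clearml-artifacts", "team-a")
print(uri)
```

When running locally, a string like this could be passed as `output_uri` to `Task.init`, so the clearml package knows where to upload artifacts.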
7 months ago
0 How To Configure Clearml Agent To Keep Pods Around After They Finish/Fail? I Want To Debug A Pod That Crashes, But It Gets Deleted Quickly

Hey @<1523701070390366208:profile|CostlyOstrich36> , could you provide any suggestions here, please?

6 months ago
0 How To Configure Clearml Agent To Keep Pods Around After They Finish/Fail? I Want To Debug A Pod That Crashes, But It Gets Deleted Quickly

Awesome @<1729671499981262848:profile|CooperativeKitten94> , will definitely add that. It would also be very helpful if there was a way to delay deleting "completed/failed" pods. This is useful when something fails unexpectedly and ClearML logs are not enough to debug the issue. Does that make sense to you? I could contribute to your codebase if you're interested.

6 months ago
0 A Question Regarding Using

@<1576381444509405184:profile|ManiacalLizard2> but the task controller has access to that information. Before deleting the pod, it could retrieve the exit code and status message that all pods provide, and log it under "Info" section in ClearML.

2 months ago
0 A Question Regarding Using

@<1523701070390366208:profile|CostlyOstrich36> they don't as the pod is killed as soon as the process inside oversteps the memory limit

2 months ago
0 A Question Regarding Using

Logging the pod exit code and status message would be very useful, before deleting the pod. The data scientists would see that an OOM happened and they wouldn't bother other teams to see what happened.
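
A rough sketch of what such logging could extract. This is not how the ClearML agent works today. It assumes the pod is available as a dict in the shape of the Kubernetes API's JSON (for example the output of `kubectl get pod -o json`), where each entry in `status.containerStatuses` may carry a `terminated` state with `exitCode`, `reason` and `message`. The function name and the example pod are hypothetical.

```python
from typing import Optional


def summarize_pod_termination(pod: dict) -> Optional[str]:
    """Return a one-line summary for the first terminated container, or None."""
    statuses = pod.get("status", {}).get("containerStatuses", [])
    for cs in statuses:
        # A container that was restarted reports its crash under lastState.
        for key in ("state", "lastState"):
            term = cs.get(key, {}).get("terminated")
            if term:
                reason = term.get("reason", "Unknown")
                code = term.get("exitCode")
                msg = term.get("message", "")
                line = f"container={cs.get('name')} reason={reason} exit_code={code}"
                return f"{line} message={msg}" if msg else line
    return None


# Hand-written example of what an OOM-killed pod's status can look like.
example_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "trainer",
                "state": {"terminated": {"exitCode": 137, "reason": "OOMKilled"}},
            }
        ]
    }
}
print(summarize_pod_termination(example_pod))
```

A line like this, written to the task's "Info" section before the pod is deleted, would be enough to tell an OOM kill apart from other failures.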

2 months ago
0 A Question Regarding Using

I'm not talking about node failure, rather pod failure, which is out-of-memory in 99% of the cases.

2 months ago
0 Hi Everyone, I'M Experiencing An Issue With Clearml Running On K8S. After Upgrading The Clearml Server Helm Chart From Version 7.11.5, I'M Seeing The Following Errors: In The Agent:

@<1523701205467926528:profile|AgitatedDove14> for me it hasn’t worked when I specified agentk8sglue.queue: "queue1,queue2" in the Helm chart options, which should be possible according to the documentation. The flag for creating a queue if it doesn’t exist ( agentk8sglue.createQueueIfNotExists ) hasn’t worked either. Both failed parsing at runtime, so those are 2 bugs I’d say.

7 months ago
0 Is There A Way To Fail Early If A Task In A K8S Pipeline References A Queue That Doesn'T Actually Exist? We'Ve Had This Happen By Accident (Typo), The Pipeline Just Kept Running Indefinitely.

@<1523701087100473344:profile|SuccessfulKoala55> my colleague submitted a pipeline whose component was referencing a non-existent queue. The queue doesn't actually exist, that's the issue. The "default" queue that handles the controller task just started to output error messages saying that this component can't be scheduled due to missing queue. We just want a way to fail early if a queue doesn't exist, instead of a pipeline running indefinitely without actually failing.
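
Until something like this exists in ClearML itself, a pre-flight check on the client side is one possible workaround. The sketch below is only an illustration: the function names, component names and queue names are hypothetical, and the list of queues that exist on the server is passed in as a plain list (how it is fetched is left out).

```python
from typing import Iterable, Mapping


def find_missing_queues(
    component_queues: Mapping[str, str], existing_queues: Iterable[str]
) -> dict:
    """Return {component: queue} for every component whose queue is unknown."""
    known = set(existing_queues)
    return {
        name: queue
        for name, queue in component_queues.items()
        if queue not in known
    }


def validate_queues(component_queues, existing_queues) -> None:
    """Raise before the pipeline starts if any referenced queue does not exist."""
    missing = find_missing_queues(component_queues, existing_queues)
    if missing:
        details = ", ".join(f"{c} -> {q!r}" for c, q in sorted(missing.items()))
        raise ValueError(f"Unknown queue(s) referenced: {details}")


# Hypothetical component and queue names; "gpu-queu" is the typo case.
components = {"preprocess": "cpu-queue", "train": "gpu-queu"}
queues_on_server = ["default", "services", "cpu-queue", "gpu-queue"]

try:
    validate_queues(components, queues_on_server)
except ValueError as err:
    print(err)
```

Calling a check like this before submitting the pipeline turns a typo in a queue name into an immediate error instead of a controller that keeps running.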

one month ago
0 Hi Everyone, I'M Experiencing An Issue With Clearml Running On K8S. After Upgrading The Clearml Server Helm Chart From Version 7.11.5, I'M Seeing The Following Errors: In The Agent:

This hasn’t worked for me either, I use multiple queues instead. Another reason I also use multiple queues is because I need to specify different resource requirements for pods launched by each queue (CPU-only vs GPU).

7 months ago
0 Hello, I Submitted A Pr In July Which Still Hasn'T Been Merged, And I'M Having Trouble Reaching The Person Who Reviewed It Initially. Can Somebody From The Clearml Team Review The Pr, So We Can Finally Move Forward? Thanks. Link ->

Hello @<1523703097560403968:profile|CumbersomeCormorant74> , I found your name on the company website, you're the VP of Engineering if I'm not mistaken? I wanted to directly ask you, since I'm having trouble reaching engineers on GitHub. What is your policy & process for OSS contributions? My team is a heavy user, and we occasionally find things to improve, but the experience for contributions hasn't been great so far. Thanks for making ClearML open-source!

one day ago
0 Hello, A Question About Pipelines. I Have A Repository With One Pipeline Using Decorators, Defined In

Huh, I see. Thanks for your answers. How difficult would it be to implement some way to automatically infer repository information for components, or to have a flag repo_inherit (or similar) when defining a component (which would inherit repository information from the controller)? My workflow is based around executing code that lives in the same repository, so it’s cumbersome having to specify repository information all over the place, and changing the commit hash as I add new code.

8 months ago
0 Hello, A Question About Pipelines. I Have A Repository With One Pipeline Using Decorators, Defined In

@<1523701205467926528:profile|AgitatedDove14> I managed to fix the issue FYI. I replaced from clearml import PipelineDecorator with from clearml.automation.controller import PipelineDecorator and it suddenly works. What a weird issue.

7 months ago
0 How Can I Access Task Ids Of Tasks Running Within A

Yes, that seems like an option as well. I found this as well (in case someone looks for it in the future):

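from clearml.automation.controller import PipelineDecorator

p = PipelineDecorator.get_current_pipeline()  # controller of the running pipeline
p.get_running_nodes()  # nodes that are currently executing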
7 months ago
0 Hello, A Question About Pipelines. I Have A Repository With One Pipeline Using Decorators, Defined In

when I add repo="." to definition of all my component decorators it works (but not the pipeline decorator), but it doesn’t work without that part… the problem i’m having now is that my components hang when executed in the cluster… i have 2 agents deployed (default and services queues)

8 months ago
0 Hello, A Question About Pipelines. I Have A Repository With One Pipeline Using Decorators, Defined In

the components start hanging indefinitely right after printing Starting Task Execution

8 months ago
0 Hello, A Question About Pipelines. I Have A Repository With One Pipeline Using Decorators, Defined In

I think so, but I haven’t investigated what exactly the problem is. I’ll report it though.

7 months ago
0 How Can I Access Task Ids Of Tasks Running Within A

Thanks @<1806497735218565120:profile|BrightJellyfish46>

7 months ago