AgitatedDove14
Moderator
49 Questions, 8056 Answers
  Active since 10 January 2023
  Last activity 9 months ago

0 Hi, I’M Training On Multi-Node, Clearml Captures Only A Single Machine Utility (Memory/Cpu/Etc.). I Assume It Captures Node 0. Is There A Way To Make It Report All Nodes?

I think prefix would be great. It can also make it easier for reporting scalars in general

Actually those are "supposed" to be collected automatically by pytorch and reported by the master node.

currently we need a barrier to sync all nodes before reporting a scalar which makes it slower.

Also "should" be part of pytorch ddp

It's launched with torchrun

I know there is an effort to integrate with torchrun (the under-the-hood infrastructure), but I'm not sure where it stands...
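To make the prefix idea concrete, here is a minimal sketch of how each node could tag its own scalar titles so they do not collide. It assumes the job is launched with torchrun, which exports `GROUP_RANK` (the node index when one worker group runs per node) and `RANK`. The helper name `node_prefixed_title` and the `node_<n>/` naming are hypothetical, not part of ClearML.

```python
import os


def node_prefixed_title(title, env=None):
    """Return the scalar title prefixed with the node index (hypothetical helper)."""
    env = os.environ if env is None else env
    # torchrun sets GROUP_RANK per worker group; fall back to RANK, then to "0".
    rank = env.get("GROUP_RANK", env.get("RANK", "0"))
    return f"node_{rank}/{title}"


# Example: the title a process on node 2 would use for its memory plot.
print(node_prefixed_title("memory_used_gb", {"GROUP_RANK": "2"}))
```

With a per-node title, each node can report on its own, without a barrier to collect values on the master node first.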

one year ago
0 Hello Everyone. I'Ve Just Started Playing With Clearml. In The 2Nd 'Getting Started' Tutorial, I Launched The Agent From Google Colab. But Whenever A Task Is Picked, It Fails For The Following Error. Any Clues? Thank You!

Hi @<1686547344096497664:profile|ContemplativeArcticwolf43>

In the 2nd 'Getting Started' tutorial,

Could you send a link to the specific notebook?

But whenever a task is picked, it fails for the following

You mean after the Task.init call?

9 months ago
0 Hey Community! I Have A Question Regarding The Optuna Optimizer With Clearml. I'M Using A Config Yaml File That I'M Connecting Via

Well, it should work out of the box as long as you have the full route, i.e. Section/param
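As an illustration of what "full route" means, the sketch below builds the `Section/param` names from a config dictionary. The function name `full_param_names`, the example section name `General`, and the assumption that nested keys are also joined with `/` are mine, so check the names against the hyperparameters shown for the task in the UI.

```python
def full_param_names(config, section):
    """List every leaf parameter as 'Section/param' (hypothetical helper)."""
    names = []

    def walk(node, path):
        for key, value in node.items():
            if isinstance(value, dict):
                # Assumption: nested keys are joined with "/" as well.
                walk(value, path + [key])
            else:
                names.append("/".join(path + [key]))

    walk(config, [section])
    return names


# Example config as it might be loaded from the YAML file.
print(full_param_names({"lr": 0.1, "model": {"depth": 3}}, "General"))
```

The resulting strings are the names the optimizer's parameter ranges would need to refer to.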

one year ago
0 Can Anyone Recommend A Good Workflow For

But this config should almost never need to change!

Exactly the idea 🙂
notice the password (initially random) is also fixed on your local machine, for the exact same reason

one year ago
0 Can Anyone Recommend A Good Workflow For

Exactly! It is very cool to see it in action, and it really works very well, kudos to these guys

one year ago
0 Hello! Getting Credential Errors When Attempting To Pip Install Transformers From Git Repo, On A Gpu Queue.

Hi SmallDeer34
I need some help: what is the difference between the manual one and the automatic one?
From your previous log, this is the bash command executed inside the container. Can you go through it step by step and try to catch who/what is messing it up?
` docker run -it --gpus "device=1" -e CLEARML_WORKER_ID=Gandalf:gpu1 -e CLEARML_DOCKER_IMAGE=nvidia/cuda:11.4.0-devel-ubuntu18.04 -v /home/dwhitena/.git-credentials:/root/.git-credentials -v /home/dwhitena/.gitconfig:/root/.gitconfig -v /tmp/...

3 years ago
0 Hi All! I Have A Question About Pipelines. My Pipeline Consists Of Several Steps:

because step can be constructed with multiple sub-components but not all of them might be added to the UI graph

Just to make sure I fully understand: when we decorate with @sub_node, we want that to also appear in the UI graph (and have its own Task / metrics etc.),
correct?

2 years ago
0 Hi, Is There A Way To Get The Quota Used By Each Task? My "Metrics" Quota Is Filling Up Very Quickly And I Would Like To Understand What'S Causing It.

Hi @<1570220858075516928:profile|SlipperySheep79>
I think this is more complicated than one would expect. But as a rule of thumb, console logs and metrics are the main ones. I hope it helps? Maybe sort by number of iterations in the experiment table?

BTW: probably better to ask in channel

one year ago
0 Second: Is There A Way To Take Internally Tracked Training Runs And Publish Them Publicly, E.G. For A Research Paper? "Appendix A: Training Runs Can Be Found Here, Feel Free To Explore Them And Look At The Loss Curves"? For Example

Hi SmallDeer34
On the SaaS you can right click on an experiment and publish it 🙂
This will make the link available for everyone, would that help?

2 years ago
0 Security Question: In My Journey Of Running Clearml The "Hard Way" (Self-Hosted), One Problem I Haven'T Solved Is Security. Some Discussion Here...

If the load balancer or Gateway can do the computation and leverage caching,

Oh, that's true. But unfortunately it is out of scope for the open-source version (well, in the end someone needs to pay our salaries 🙂)

I’d prefer not to have our EC2 instance directly exposed to the public Internet.

Yep, I tend to agree 🙂

one year ago
0 Hi All, After Solving My Multiprocessing Issue I'Ve Found The Following Issue: I Have A Machine With 2 Gpus. Starting An Agent There Specifying

PompousBeetle71, these are CUDA versions. I'm looking for the NVIDIA driver version, for example 440.xx or 418.xx.
The reason is, we set an OS environment variable for the driver, and I remember that old drivers did not support it. Basically they do not support NVIDIA_VISIBLE_DEVICES=all, so I'm trying to see if that's the case; then we could add a fix.
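In case it helps to get the driver version (as opposed to the CUDA version), here is a small sketch that asks `nvidia-smi` for it. It assumes `nvidia-smi` is on the PATH of the agent machine; the function names are hypothetical.

```python
import subprocess


def parse_driver_versions(output):
    """Return one driver version string per GPU line of nvidia-smi output."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def query_driver_versions():
    """Run nvidia-smi and return the driver version reported for each GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_driver_versions(result.stdout)
```

Calling `query_driver_versions()` on the agent machine should return strings in the 440.xx / 418.xx form mentioned above.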

4 years ago
0 Can Anyone Recommend A Good Workflow For

I'm guessing this is done through code-server?

correct

I'm currently rolling a JupyterHub instance (multiuser, with codeserver inside) on the same machine as clearml-server. That’s where tasks are executed etc. so, all browser dev env.

Yeah, the idea with clearml-session is that each user can self-serve the container that works best for them. With a JupyterHub they start to step on each other's toes very quickly...

one year ago
0 Hey Guys, Sorry For The Rapid Fire Questions In The Past Few Days. I Have Another Issue Though. I Initially Ran A Task, Directly From A Repo. It Succesfully Installed The Requirements From The Requirements File In The Repo And Ran The Task Without Any Iss

It runs into the above error when I clone the task or reset it.

from here:

AssertionError: ERROR: --resume checkpoint does not exist

I assume the "internal" code state changed, and now it is looking for a file that does not exist. How would your code state change? In other words, why would it be looking for the file only when cloning? Could it be that you put the state on the Task, then you clone it (i.e. clone the exact same dict), and now the newly cloned Task "thinks" it is resuming?!
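One way to test that hypothesis is to guard the resume value before the training code sees it. This is only a sketch: the helper name `effective_resume` is hypothetical, and it assumes the resume setting is a plain path string that was copied over from the original Task.

```python
import os


def effective_resume(resume_path, exists=os.path.isfile):
    """Return the checkpoint path only if it really exists, otherwise None."""
    if resume_path and exists(resume_path):
        return resume_path
    # A cloned Task may carry a stale path from the original run: start fresh.
    return None


# Example: a path inherited from the original Task that is not on this machine.
print(effective_resume("/no/such/dir/last_checkpoint.pt"))
```

If the cloned Task runs fine with this guard in place, the stale resume state is the likely cause of the assertion.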

3 years ago
0 Hi, Where Can I Find Documentation Of The Full, Paid Version Of Allegro? (Including The Data Management Section)

Hi PlainSquid19
Did you check the website https://allegro.ai ?
If you need more info, I would just fill in the contact info; I'm sure the sales guys will get back to you soon 🙂

4 years ago
0 Is There Any Way To Clear The Installed Packages Of A Task Programmatically? (I.E. Using The Python Sdk And Not The Ui)

Hi GiddyTurkey39
Are you referring to an already executed Task or the current running one?
(Also, what is the use case here? Is it because the "installed packages" are inaccurate?)

4 years ago
0 I Saw Some Talk Of Clearml + Kedro On Reddit. Is That A Good Approach?

Depends on what you want to do. What do you want to do?

3 years ago
0 Hi All, I'M Trying To Deploy Trains On Rancher (Nice Kubernetes Cluster Orchestration Project) Where I'M Quite New To Rancher And Kubernetes. I Have Been Able To Install Trains Using Helm

WickedGoat98

I will try to collect the installation steps in a document and share it to the community once ready

Thank you! This will be awesome!

We're here if you need anything 🙂

4 years ago
0 Hi

Hi SmugTurtle78
Unfortunately there is no actual filtering for these logs, because they are so important for debugging and visibility. I have to ask, what's the use case for removing some of them?

one year ago
0 Hi Guys, Is There A Way To Timeout (From Clearml) A Task If That Running Too Long?

Hi @<1523701260895653888:profile|QuaintJellyfish58>
You mean some "daemon service" aborting Tasks that do not end after X hours? Or is it based on CPU/GPU utilization?
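To illustrate the "abort after X hours" variant, here is a sketch of just the selection logic such a daemon would need. The function name `overdue_tasks` and the input format (task id mapped to start time) are hypothetical; fetching the running Tasks and actually aborting them through ClearML is not shown.

```python
from datetime import datetime, timedelta


def overdue_tasks(started_at, now, max_hours):
    """Return the ids of tasks that have been running longer than max_hours."""
    limit = timedelta(hours=max_hours)
    return [task_id for task_id, started in started_at.items() if now - started > limit]


# Example: with a 4 hour limit only task "a" is overdue.
now = datetime(2024, 1, 1, 12, 0)
running = {"a": datetime(2024, 1, 1, 6, 0), "b": datetime(2024, 1, 1, 11, 0)}
print(overdue_tasks(running, now, max_hours=4))
```

A utilization-based policy would need a different signal (for example the reported CPU/GPU metrics) in place of the start time.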

one year ago