
CostlyOstrich36 any thought on how we can further debug this? It's making ClearML practically useless for us
but are model files easier to serve?
Hey SweetBadger76 , thanks for answering. I'll check it out! Does that correspond to filling out azure.storage
in the clearml.conf file?
And how do I ensure that the server can access the files from the blob storage?
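For reference, the azure.storage section mentioned above would look roughly like this in clearml.conf - a sketch only, with placeholder account/container names (the real keys come from your Azure storage account):

```
sdk {
    azure.storage {
        containers: [
            {
                account_name: "myaccount"      # placeholder
                account_key: "..."             # storage account access key
                container_name: "mycontainer"  # placeholder
            }
        ]
    }
}
```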
It's running v7.17.18 @<1722061389024989184:profile|ResponsiveKoala38>
Sure. I'll give it a few minor releases and then try again 🙂 Thanks for the responses @<1722061389024989184:profile|ResponsiveKoala38> !
The server will never access the storage - only the clients (SDK/WebApp etc.) will access it
Oh okay. So that's the reason I can access media when the client and server are running on the same machine?
Sure. Really, I'm just using the default client:
` # ClearML SDK configuration file
api {
    web_server: http://server.azure.com:8080
    api_server: http://server.azure.com:8008
    files_server: http://server.azure.com:8081
    credentials {
        "access_key" = "..."
        "secret_key" = "..."
    }
}
sdk {
    # ClearML - default SDK configuration
    storage {
        cache {
            # Defaults to system temp folder / cache
            default_base_dir: "~/.clearml/c...
Hi CurvedHedgehog15 , thanks for replying!
I guess that one could modify the config with variable interpolation (similar to how it's done in YAML, e.g. ${encoder.layers}) - however, it seems to be quite invasive to specify that in our trainer script 🙁
We are running the latest version (WebApp: 1.7.0-232 • Server: 1.7.0-232 • API: 2.21).
When I run `docker logs clearml-elastic` I get lots of logs like this one:
{"type": "server", "timestamp": "2022-10-24T08:51:35,003Z", "level": "INFO", "component": "o.e.i.g.DatabaseNodeService", "cluster.name": "clearml", "node.name": "clearml", "message": "successfully reloaded changed geoip database file [/tmp/elasticsearch-3596639242536548410/geoip-databases/cX7aMqJ4SwCxqM7sYM-S9Q/GeoLite2-City.mmdb]...
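If those geoip INFO lines are just noise, Elasticsearch (7.14+) has a setting to switch the geoip downloader off entirely. A sketch of how that might look as a docker-compose override for the Elasticsearch service - the service name may differ in your docker-compose.yml, so treat this as an assumption to verify:

```
# docker-compose override excerpt - a sketch, not the full file
services:
  elasticsearch:
    environment:
      - ingest.geoip.downloader.enabled=false
```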
Yeah, that makes sense. The only drawback is that you'll get a single point that all lines will go through in the Parallel Coordinates plot when the optimization finishes 🙂
Sorry for the late reply @<1722061389024989184:profile|ResponsiveKoala38> . So this is the diff between my local version (hosted together on a single server with docker-compose). Does anything spring to mind?
Hi CostlyOstrich36
What I'm seeing is expected behavior:
In my toy example, I have a VAE which is defined by a YAML config file and parsed with the PyTorch Lightning CLI. Part of the config defines the latent dimension (n_latents) and the number of input channels of the decoder (in_channels). These two values need to be the same. When I just use the Lightning CLI, I can use variable interpolation with OmegaConf like this:
` class_path: mymodel.VAE
init_args:
{...}
bottleneck:
class_pat...
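To make the interpolation idea concrete outside of OmegaConf, here is a minimal toy resolver for ${dotted.path} references - the config keys mirror the VAE example above, but the resolver itself is only an illustration of the mechanism, not OmegaConf's actual implementation:

```python
import re

def resolve(cfg, root=None):
    """Recursively resolve ${dotted.path} references in a nested dict.

    Toy stand-in for OmegaConf-style interpolation, for illustration only.
    """
    root = cfg if root is None else root
    if isinstance(cfg, dict):
        return {key: resolve(val, root) for key, val in cfg.items()}
    if isinstance(cfg, str):
        match = re.fullmatch(r"\$\{([\w.]+)\}", cfg)
        if match:
            node = root
            for part in match.group(1).split("."):
                node = node[part]  # walk the dotted path from the config root
            return node
    return cfg

config = {
    "n_latents": 8,
    "decoder": {"in_channels": "${n_latents}"},  # must always equal n_latents
}
resolved = resolve(config)
# resolved["decoder"]["in_channels"] is now 8, kept in sync with n_latents
```

With interpolation, the constraint "these two values must be the same" lives in one place in the config instead of being duplicated by hand.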
@<1722061389024989184:profile|ResponsiveKoala38> cool, thanks! I guess it will be straightforward to script then.
What is your gut feeling regarding the size of the index? Is 87G a lot for an Elasticsearch index?
@<1576381444509405184:profile|ManiacalLizard2> what happens when ES hits the limit? Does it go OOM, or does the scalars loading just take a long time in the web-ui? And what about tasks putting scalars in the index?
Hi CostlyOstrich36 , thanks for answering. We are using compute instances through the Machine Learning Studio in Azure. They basically work by spinning up an instance, loading a docker-image and executing a specific script in a folder that you upload along with the docker-image. Nothing is persisted between runs and there is no clear notion of a "user" (when thinking of ~/.clearml.conf at least).
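For ephemeral instances like these, one option is to skip ~/.clearml.conf entirely and supply the connection details through the SDK's CLEARML_* environment variables before the script touches ClearML. A sketch, reusing the server URLs from the config earlier in the thread; the keys are placeholders you'd inject via your job's secret mechanism:

```python
import os

# The ClearML SDK picks these up at startup, so no clearml.conf file
# is needed on the (stateless) compute instance.
os.environ.update({
    "CLEARML_API_HOST": "http://server.azure.com:8008",
    "CLEARML_WEB_HOST": "http://server.azure.com:8080",
    "CLEARML_FILES_HOST": "http://server.azure.com:8081",
    "CLEARML_API_ACCESS_KEY": "...",  # placeholder - inject from secrets
    "CLEARML_API_SECRET_KEY": "...",  # placeholder - inject from secrets
})
```

Setting them in the docker image's entrypoint (or the uploaded launch script) works the same way, since nothing needs to persist between runs.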
SuccessfulKoala55 yeah, sorry, should have mentioned that our storage is also Azure (blob sto...
I think that you are absolutely correct. Thanks for the pointer!
Sorry, I got caught up by other tasks. I might investigate further later, but it's not top of mind right now. Our main issue is to get people to archive their old tasks and models so they can be cleaned up 🙂
Hi @<1523701070390366208:profile|CostlyOstrich36>
Is 87G a lot for an index? Enough that you would consider adding more RAM?
And also, how can I check that we are not storing scalars for deleted tasks? ClearML used to write a lot of errors in the cleanup script, although that seems to have been fixed in recent updates
Hi again CostlyOstrich36 ,
I just wanted to share what ended up working for me. Basically I worked it out both for Hydra (thanks CurvedHedgehog15 ) and for the PyTorch Lightning CLI.
So, for PL-CLI, I used this construct so we don't have to modify our training scripts based on our experiment tracker
` from pytorch_lightning.utilities.cli import LightningCLI
from clearml import Task

class MyCLI(LightningCLI):
    def before_instantiate_classes(self) -> None:
        # init the task
        tas...
It's actually complementary - the SDK will use the clearml.conf configuration by matching that configuration with the destination you provided
Would you recommend doing both then? :-)
No, not at all. I reckon we started seeing errors around mid-last week. We are using default settings for everything except some password stuff on the server.
Well, consider the case where you start the trigger scheduler on commit A, then you do some work that defines a new model and commit it as commit B, train a model, and now you want to export/deploy the model by publishing it and tagging it with some tag that triggers the export, as in your example. The scheduler will then fail, because the model is not implemented at commit A.
Anyways, I think I've solved it, I'll post the workaround when I get around to it 🙂
You can create a task in the t...
SuccessfulKoala55 Thanks for the help. I've setup my client to use my blob storage now, and it works wonderfully.
I've also added a token to my server, so now I can access the audio samples from the server.
Is there a way to add a common token serverside so the other members of the team don't have to create a token?
I also struggle a bit with report_matplotlib_figure(), where plots do not appear in the web UI. I have implemented the following snippet in my PyTorch Lightning logger:
` @...
The Lightning folks won't include new loggers anymore (since mid-2022, see None ) 🙁
On the server or the client? :)
This is an example of the console output of a task aborted via the webUI:
Epoch 1/29 ━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 699/16945 0:04:53 • 1:55:25 2.35it/s v_num: 0.000
2024-09-16 12:52:57,263 - clearml.Task - WARNING - ### TASK STOPPING - USER ABORTED - LAUNCHING CALLBACK (timeout 30.0 sec) ###
[2024-09-16 12:52:57,284][core.callbacks.model_checkpoint][INFO] - Marking task as `in_progress`
[2024-09-16 12:52:57,309][core.callbacks.model_checkpoint][INFO] - Saving last checkpoint...
Hi @<1523701070390366208:profile|CostlyOstrich36> , yeah we figured as much. Is there a setting in the server that limits logging - or disables it completely?
Well, one solution could be to say that models can only be exported from main/master and then have devops start a new trigger on PR completion. That would require some logic for stopping the existing TriggerScheduler, but that shouldn't be too difficult.
However, the most flexible solution would be to have some way of triggering the execution of a script in the parent task environment, something along the lines of clearml-agent build ...
. I just can't wrap my head around triggering that ...