GiganticMole91
Moderator
19 Questions, 51 Answers
  Active since 10 January 2023
  Last activity 2 months ago

Reputation

0

Badges 1

50 × Eureka!
0 Votes 3 Answers 1K Views
I have an issue with how clearml logs checkpoints. We have a training setup with pytorch-lightning + clearml, where we use lightning.pytorch.ModelCheckpoint ...
one year ago
0 Votes 0 Answers 471 Views
It seems to be related to Elasticsearch clearml-elastic | "stacktrace": ["org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed", I...
9 months ago
0 Votes 7 Answers 621 Views
Hi, I'm using Task.register_abort_callback to store the latest model checkpoint, but the ergonomics of the callback feel weird to me. I have to do these work...
8 months ago
0 Votes 18 Answers 2K Views
2 years ago
0 Votes 3 Answers 251 Views
Hi all, I would like to use clearml-serving to serve model binaries (for use in on-device deployment). Can clearml-serving be used to serve that?
2 months ago
0 Votes 2 Answers 280 Views
2 months ago
0 Votes 4 Answers 1K Views
Hi guys, I'm trying to familiarize myself with Hyperparameter Optimization using ClearML. It seems like there is a discrepancy between clearml-param-search C...
2 years ago
0 Votes 8 Answers 660 Views
Rolling back to 1.15.0 seemed to fix the error for now. Is there something one should be aware of between server versions 1.15 and 1.16 related to versions o...
9 months ago
0 Votes 4 Answers 1K Views
Hi all, Is there a way to force an agent to use https although the scheduled task is using ssh for git?
one year ago
0 Votes 2 Answers 2K Views
2 years ago
0 Votes 1 Answer 566 Views
Hi, is there anyone in the ClearML team that would like to review my PR on clearml-agent? I’m worried that it might have slipped under the radar. It adds sup...
7 months ago
0 Votes 0 Answers 660 Views
Hi I just updated our server to the latest version, but it seems to have broken all our running experiments. Scalars is totally down, I just get this error w...
9 months ago
0 Votes 4 Answers 1K Views
Hi guys. I'm struggling to get the Cleanup Service working on our on-prem setup. We are using the built in service ( None ) but see loads of errors like: Cou...
one year ago
0 Votes 12 Answers 1K Views
one year ago
0 Votes 9 Answers 2K Views
Hey, We're seeing a lot of issues with our ClearML self-hosted server these days; it seems like the API times out while talking to elasticsearch: 2022-10-22 ...
2 years ago
0 Votes 6 Answers 2K Views
Hi guys, Is there a way, analogous to using Task.set_credentials(...) , to set credentials for storage programmatically? Like, Task.setup_storage(...) ? I'm ...
2 years ago
0 Votes 2 Answers 1K Views
2 years ago
0 Votes 7 Answers 2K Views
2 years ago
0 Votes 15 Answers 738 Views
8 months ago
0 Hi All, I Would Like To Use Clearml-Serving To Serve Model Binaries (For Use In On-Device Deployment). Can Clearml-Serving Be Used To Serve That?

@<1523701070390366208:profile|CostlyOstrich36> any thoughts? Are the model files themselves easier to serve?

2 months ago
0 Rolling Back To 1.15.0 Seemed To Fix The Error For Now. Is There Something One Should Be Aware Of Between Server Versions 1.15 And 1.16 Related To Versions Of The

Sorry for the late reply @<1722061389024989184:profile|ResponsiveKoala38> . So this is the diff between my local version (hosted together on a single server with docker-compose). Does anything spring to mind?

9 months ago
0 Hi All, Is There A Way To Force An Agent To Use Https Although The Scheduled Task Is Using Ssh For Git?

Hi Martin,
It doesn't seem to work with dev.azure though:

Using user/pass credentials - replacing ssh url 'git@ssh.dev.azure.com:v3/ORG/TEAM/PROJECT' with https url '
'
fatal: repository '
' not found

The expected format for the https protocol is None .
Thoughts @<1523701205467926528:profile|AgitatedDove14> ?
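For reference, here is a minimal sketch of the rewrite I would expect for Azure DevOps. It assumes the HTTPS clone URL layout `https://dev.azure.com/<org>/<project>/_git/<repo>` and reads the three SSH path segments (ORG/TEAM/PROJECT in the log above) as organization, project and repository. The function name `azure_ssh_to_https` is hypothetical and not part of clearml-agent.

```python
import re

# Azure DevOps SSH remotes look like: git@ssh.dev.azure.com:v3/<org>/<project>/<repo>
_AZURE_SSH = re.compile(
    r"^git@ssh\.dev\.azure\.com:v3/(?P<org>[^/]+)/(?P<project>[^/]+)/(?P<repo>[^/]+)/?$"
)


def azure_ssh_to_https(url: str) -> str:
    """Rewrite an Azure DevOps SSH remote to HTTPS (hypothetical helper).

    URLs that do not match the Azure DevOps SSH pattern are returned unchanged.
    """
    match = _AZURE_SSH.match(url)
    if match is None:
        return url
    # Assumed HTTPS layout: https://dev.azure.com/<org>/<project>/_git/<repo>
    return "https://dev.azure.com/{org}/{project}/_git/{repo}".format(**match.groupdict())


print(azure_ssh_to_https("git@ssh.dev.azure.com:v3/ORG/TEAM/PROJECT"))
```

The `_git` segment is the part a generic host/path swap would not produce, which may be why the rewritten URL is not found.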

one year ago
0 Hi All. I'm Setting Up A Model Export Script That Will Export Trained Models For Edge Deployment. I Initially Thought About Setting It Up As A Trigger Scheduler, And To Have It Trigger On Tags On A Published Model, But As Time Goes By The Trigger Schedul

Just wanted to share a workaround for using a TriggerScheduler to execute a script using the latest commit of a given branch, without relying on cloning a Task. Don't know if it has been shown before in here 🙂

from clearml import Model, Task
from clearml.automation import TriggerScheduler

def trigger_model_func(model_id: str):
    model = Model(model_id)

    print(f"Triggered model export for model '{model.name}' ({model_id})")

    # NOTE: To execute from the branch of
    # task...
one year ago
0 Hi Guys, I'm Trying To Familiarize Myself With Hyperparameter Optimization Using Clearml. It Seems Like There Is A Discrepancy Between

Hi CostlyOstrich36
I have created a base task on which I'm optimizing hyperparameters. With clearml-param-search I could use --params-override to set a static parameter, which should not be optimized, e.g. changing the number of epochs for all experiments. It seems to me that this capability is not present in HyperParameterOptimizer . Does that make sense?

From the example on https://clear.ml/docs/latest/docs/apps/clearml_param_search/ :
` clearml-param-search {...} --p...

2 years ago
0 Hi, I Have Some Questions About Hyperparameter Optimization. We Have A Setup Where We Use Pytorchlightning Cli With Clearml For Experiment Tracking And Hyperparameter Optimization. Now, All Our Configurations Are Config-File Based. Sometime We Have Linke

Hi CurvedHedgehog15 , thanks for replying!
I guess that one could modify the config with variable interpolation (similar to how it's done in YAML, e.g. ${encoder.layers} ) - however, it seems to be quite invasive to specify that in our trainer script 😞
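To make the idea concrete, here is a rough sketch of what such interpolation does. It is a plain-Python stand-in rather than OmegaConf itself, and the names `lookup`, `resolve_value` and `resolve` are made up for illustration. It does not detect circular references.

```python
import re
from typing import Any

# Matches references of the form ${some.dotted.key}
_REF = re.compile(r"\$\{([^}]+)\}")


def lookup(root: dict, dotted_key: str) -> Any:
    """Follow a dotted key such as 'encoder.layers' through nested dicts."""
    node: Any = root
    for part in dotted_key.split("."):
        node = node[part]
    return node


def resolve_value(root: dict, node: Any) -> Any:
    """Return a copy of `node` with every ${...} reference replaced."""
    if isinstance(node, dict):
        return {key: resolve_value(root, value) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_value(root, item) for item in node]
    if isinstance(node, str):
        whole = _REF.fullmatch(node)
        if whole:
            # The entire string is one reference: keep the referenced type.
            return resolve_value(root, lookup(root, whole.group(1)))
        # References embedded in a longer string are substituted as text.
        return _REF.sub(
            lambda m: str(resolve_value(root, lookup(root, m.group(1)))), node
        )
    return node


def resolve(config: dict) -> dict:
    """Resolve all references in `config` without modifying it."""
    return resolve_value(config, config)


config = {"encoder": {"layers": 4}, "decoder": {"layers": "${encoder.layers}"}}
print(resolve(config))
```

The catch is that an optimizer overriding only `encoder.layers` has to do so before this resolution step, otherwise the linked value keeps its old setting.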

2 years ago
0 Hi All. I'm Setting Up A Model Export Script That Will Export Trained Models For Edge Deployment. I Initially Thought About Setting It Up As A Trigger Scheduler, And To Have It Trigger On Tags On A Published Model, But As Time Goes By The Trigger Schedul

Well, one solution could be to say that models can only be exported from main/master and then have devops start a new trigger on PR completion. That would require some logic for stopping the existing TriggerScheduler, but that shouldn't be too difficult.

However, the most flexible solution would be to have some way of triggering the execution of a script in the parent task environment, something along the lines of clearml-agent build ... . I just can't wrap my head around triggering that ...

one year ago
0 Hi Guys, I'm Trying To Familiarize Myself With Hyperparameter Optimization Using Clearml. It Seems Like There Is A Discrepancy Between

Yeah, that makes sense. The only drawback is that you'll get a single point that all lines will go through in the Parallel Coordinates plot when the optimization finishes 🙂

2 years ago
0 Hi, I Have Some Questions About Hyperparameter Optimization. We Have A Setup Where We Use Pytorchlightning Cli With Clearml For Experiment Tracking And Hyperparameter Optimization. Now, All Our Configurations Are Config-File Based. Sometime We Have Linke

Hi CostlyOstrich36
What I'm seeing is expected behavior:

In my toy example, I have a VAE which is defined by a YAML config file and parsed with PytorchLightning CLI. Part of the config defines the latent dimension (n_latents) and the number of input channels of the decoder (in_channels). These two values need to be the same. When I just use the Lightning CLI, I can use variable interpolation with OmegaConf like this:
` class_path: mymodel.VAE
init_args:
  {...}
  bottleneck:
    class_pat...

2 years ago
0 Hi Guys. I'm Struggling To Get The Cleanup Service Working On Our On-Prem Setup. We Are Using The Built In Service (

Hi @<1523701087100473344:profile|SuccessfulKoala55> , thanks for responding. I've found out that my first error came from cloning a super old version of the cleanup task in the web UI 😄
I don't know about the other error, to me it looks like the task gets deleted before handling errors, but since an error occurred (some 404 stuff, maybe the files actually aren't there) when deleting some artifacts on the task, clearml tries to reload the task and fails, with the 400/201 or 400/101. ...

one year ago
0 Hi! I Have Noticed That Clearml-Elastic Container Consumes 32.82Gib Memory. This Seems

@<1576381444509405184:profile|ManiacalLizard2> what happens when ES hits the limit? Does it go OOM, or does the scalars loading just take a long time in the web-ui? And what about tasks putting scalars in the index?

2 months ago
0 Hi Guys, I'm In The Process Of Setting Up A Clearml Server For Experiment Tracking. I Have The Server Hosted In A Virtual Linux Machine On Azure And Run Experiments From Some Local Compute. Our Training Environment Is Pytorch Lightning And I Have Written

Sure. Really, I'm just using the default client:
# ClearML SDK configuration file
api {
    web_server: http://server.azure.com:8080
    api_server: http://server.azure.com:8008
    files_server: http://server.azure.com:8081
    credentials {
        "access_key" = "..."
        "secret_key" = "..."
    }

}
sdk {
    # ClearML - default SDK configuration

storage {
    cache {
        # Defaults to system temp folder / cache
        default_base_dir: "~/.clearml/c...
2 years ago
0 Hi Guys, I'm In The Process Of Setting Up A Clearml Server For Experiment Tracking. I Have The Server Hosted In A Virtual Linux Machine On Azure And Run Experiments From Some Local Compute. Our Training Environment Is Pytorch Lightning And I Have Written

Hey SweetBadger76 , thanks for answering. I'll check it out! Does that correspond to filling out azure.storage in the clearml.conf file?

And how do I ensure that the server can access the files from the blob storage?

2 years ago
0 Hi Guys, I'm In The Process Of Setting Up A Clearml Server For Experiment Tracking. I Have The Server Hosted In A Virtual Linux Machine On Azure And Run Experiments From Some Local Compute. Our Training Environment Is Pytorch Lightning And I Have Written

How does it look in the Web UI?

I just had a look, and they are visible under debug samples, but not under plots, as I had expected.
I thought that by using report_matplotlib_figure it would get grouped under plots? 🙂

2 years ago
0 Hi Guys, I'm In The Process Of Setting Up A Clearml Server For Experiment Tracking. I Have The Server Hosted In A Virtual Linux Machine On Azure And Run Experiments From Some Local Compute. Our Training Environment Is Pytorch Lightning And I Have Written

Do you mean to the Web UI?

Yes, that's what I meant, sorry, I'm still coming to terms with ClearML terminology 😅 . Is it possible to store the web app cloud access token server-side so we don't have to input it in the Web UI? 🙂

2 years ago
0 I Have An Issue With How Clearml Logs Checkpoints. We Have A Training Setup With Pytorch-Lightning + Clearml, Where We Use

The Lightning folks won't include new loggers anymore (since mid-2022, see None ) 🙂

one year ago
0 Hi Guys, I'm Setting Up A Bunch Of Machines As Clearml Agents And Have Run Into An Issue With Caching. We Are Using Poetry For Python Dependency Management, So The Agents Are Configured To Use That Too, But They Are Not Caching The Venvs Between Tasks. Th

Specifically, this is what I get in the console log when the agent spins up a task:

Poetry Enabled: Ignoring requested python packages, using repository poetry lock file!
Creating virtualenv latent-features in /data/clearml/venvs-builds/3.9/task_repository/our-repo/.venv
Installing dependencies from lock file
2 years ago
0 Hi, I Have Some Questions About Hyperparameter Optimization. We Have A Setup Where We Use Pytorchlightning Cli With Clearml For Experiment Tracking And Hyperparameter Optimization. Now, All Our Configurations Are Config-File Based. Sometime We Have Linke

Hi again CostlyOstrich36 ,

I just wanted to share what ended up working for me. Basically I worked it out both for Hydra (thanks CurvedHedgehog15 ) and for PytorchLightningCLI.

So, for PL-CLI, I used this construct so we don't have to modify our training scripts based on our experiment tracker

` from pytorch_lightning.utilities.cli import LightningCLI
from clearml import Task

class MyCLI(LightningCLI):
    def before_instantiate_classes(self) -> None:
        # init the task
        tas...

2 years ago
0 Hi All. I'm Setting Up A Model Export Script That Will Export Trained Models For Edge Deployment. I Initially Thought About Setting It Up As A Trigger Scheduler, And To Have It Trigger On Tags On A Published Model, But As Time Goes By The Trigger Schedul

Well, consider the case where you start the trigger scheduler on commit A, then you do some work that defines a new model and commit as commit B, train some model and now you want to export/deploy the model by publishing it and tagging it with some tag that triggers the export, as in your example. The scheduler will then fail, because the model is not implemented at commit A.

Anyways, I think I've solved it, I'll post the workaround when I get around to it 🙂
You can create a task in the t...

one year ago
0 Hi There, Our Self-Hosted Server Is Periodically Very Slow To React In The Web UI. We've Been Debugging For Quite Some Time, And It Would Seem That Elasticsearch Might Be The Culprit. Looking At The Elasticsearch Index, We Have An Index Of Around 80G Of Tr

Hi @<1523701070390366208:profile|CostlyOstrich36>
Is 87G a lot for an index? Enough that you would consider adding more RAM?

And also, how can I check that we are not storing scalars for deleted tasks? ClearML used to write a lot of errors in the cleanup script, although that seems to have been fixed in recent updates

8 months ago
0 Hi There, Our Self-Hosted Server Is Periodically Very Slow To React In The Web UI. We've Been Debugging For Quite Some Time, And It Would Seem That Elasticsearch Might Be The Culprit. Looking At The Elasticsearch Index, We Have An Index Of Around 80G Of Tr

Any tips on how to check if we are storing data on deleted tasks? Maybe @<1722061389024989184:profile|ResponsiveKoala38> knows? Is there a field on each scalar that I can cross check with ClearML?

8 months ago
0 Hi There, Our Self-Hosted Server Is Periodically Very Slow To React In The Web UI. We've Been Debugging For Quite Some Time, And It Would Seem That Elasticsearch Might Be The Culprit. Looking At The Elasticsearch Index, We Have An Index Of Around 80G Of Tr

@<1722061389024989184:profile|ResponsiveKoala38> cool, thanks! I guess it will be straightforward to script then.

What is your gut feeling regarding the size of the index? Is 87G a lot for an Elasticsearch index?
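Something like the sketch below is what I have in mind for the cross-check. It assumes each event document carries the owning task ID in a field called `task` (an assumption, hence the `task_field` parameter) and that the event documents and the list of live task IDs have already been fetched from Elasticsearch and the ClearML API. The function name `count_orphaned_events` is made up.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_orphaned_events(
    events: Iterable[Mapping[str, object]],
    live_task_ids: Iterable[str],
    task_field: str = "task",  # assumed name of the task ID field on each event
) -> Counter:
    """Count events per task ID for tasks that no longer exist in ClearML."""
    live = set(live_task_ids)
    orphans: Counter = Counter()
    for event in events:
        task_id = event.get(task_field)
        # Skip documents without a task ID; count those whose task is gone.
        if task_id is not None and task_id not in live:
            orphans[task_id] += 1
    return orphans


# Toy data standing in for documents pulled from the events index
sample_events = [{"task": "aaa"}, {"task": "bbb"}, {"task": "bbb"}]
print(count_orphaned_events(sample_events, live_task_ids=["aaa"]))
```

For an index this size, an aggregation on the task field inside Elasticsearch would probably be cheaper than streaming every document, with the same set difference applied to the resulting IDs.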

8 months ago
0 Hi, I'm Using

This is an example of the console output of a task aborted via the webUI:

Epoch 1/29 ━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 699/16945 0:04:53 • 1:55:25 2.35it/s v_num: 0.000
2024-09-16 12:52:57,263 - clearml.Task - WARNING - ### TASK STOPPING - USER ABORTED - LAUNCHING CALLBACK (timeout 30.0 sec) ###
[2024-09-16 12:52:57,284][core.callbacks.model_checkpoint][INFO] - Marking task as `in_progress`
[2024-09-16 12:52:57,309][core.callbacks.model_checkpoint][INFO] - Saving last checkpoint...
8 months ago
0 Hi, I'm Using

I just tried and the result is the same. The other method only triggers on exceptions

8 months ago
0 Hi, I'm Using

This is on clearml v1.16.4

8 months ago