GiganticMole91
Moderator
17 Questions, 47 Answers
  Active since 10 January 2023
  Last activity one month ago

Reputation

0

Badges 1

47 × Eureka!
0 Votes
6 Answers
1K Views
Hi guys, Is there a way, analogous to using Task.set_credentials(...) , to set credentials for storage programmatically? Like, Task.setup_storage(...) ? I'm ...
2 years ago
0 Votes
18 Answers
1K Views
2 years ago
0 Votes
4 Answers
682 Views
Hi guys. I'm struggling to get the Cleanup Service working on our on-prem setup. We are using the built in service ( None ) but see loads of errors like: Cou...
10 months ago
0 Votes
12 Answers
676 Views
9 months ago
0 Votes
8 Answers
121 Views
Rolling back to 1.15.0 seemed to fix the error for now. Is there something one should be aware of between server versions 1.15 and 1.16 related to versions o...
2 months ago
0 Votes
15 Answers
97 Views
one month ago
0 Votes
3 Answers
657 Views
I have an issue with how clearml logs checkpoints. We have a training setup with pytorch-lightning + clearml, where we use lightning.pytorch.ModelCheckpoint ...
9 months ago
0 Votes
2 Answers
1K Views
one year ago
0 Votes
2 Answers
1K Views
one year ago
0 Votes
4 Answers
969 Views
Hi guys, I'm trying to familiarize myself with Hyperparameter Optimization using ClearML. It seems like there is a discrepancy between clearml-param-search C...
2 years ago
0 Votes
4 Answers
738 Views
Hi all, Is there a way to force an agent to use https although the scheduled task is using ssh for git?
9 months ago
0 Votes
9 Answers
1K Views
Hey, We're seeing a lot of issues with our ClearML self-hosted server these days; it seems like the API times out while talking to elasticsearch: 2022-10-22 ...
2 years ago
0 Votes
7 Answers
88 Views
Hi, I'm using Task.register_abort_callback to store the latest model checkpoint, but the ergonomics of the callback feel weird to me. I have to do these work...
one month ago
0 Votes
7 Answers
1K Views
2 years ago
0 Votes
0 Answers
147 Views
Hi I just updated our server to the latest version, but it seems to have broken all our running experiments. Scalars is totally down, I just get this error w...
2 months ago
0 Votes
0 Answers
102 Views
It seems to be related to Elasticsearch: clearml-elastic | "stacktrace": ["org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed", I...
2 months ago
0 Votes
1 Answers
12 Views
Hi, is there anyone in the ClearML team that would like to review my PR on clearml-agent? I’m worried that it might have slipped under the radar. It adds sup...
one day ago
0 Rolling Back To 1.15.0 Seemed To Fix The Error For Now. Is There Something One Should Be Aware Of Between Server Versions 1.15 And 1.16 Related To Versions Of The
diff --git a/docker-compose.yml b/docker-compose.diff.yml
index c6b49e1..07f7f43 100644
--- a/docker-compose.yml
+++ b/docker-compose.diff.yml
@@ -5,7 +5,7 @@ services:
     command:
     - apiserver
     container_name: clearml-apiserver
-    image: allegroai/clearml:1.15.0
+    image: allegroai/clearml:latest
     restart: unless-stopped
     volumes:
     - /opt/clearml/logs:/var/log/clearml
@@ -19,17 +19,18 @@ services:
     environment:
       CLEARML_ELASTIC_SERVICE_HOST: elastics...
2 months ago
0 Hi, I'm Using

@CostlyOstrich36 just opened an issue on this: None

one month ago
0 Rolling Back To 1.15.0 Seemed To Fix The Error For Now. Is There Something One Should Be Aware Of Between Server Versions 1.15 And 1.16 Related To Versions Of The

Sorry for the late reply @ResponsiveKoala38. So this is the diff between my local version (hosted together on a single server with docker-compose). Does anything spring to mind?

2 months ago
0 Hi There, Our Self-Hosted Server Is Periodically Very Slow To React In The Web Ui. We've Been Debugging For Quite Some Time, And It Would Seem That Elasticsearch Might Be The Culprit. Looking At The Elasticsearch Index, We Have An Index Of Around 80G Of Tr

@ResponsiveKoala38 cool, thanks! I guess it will be straightforward to script then.

What is your gut feeling regarding the size of the index? Is 87G a lot for an Elasticsearch index?

one month ago
0 Hi There, Our Self-Hosted Server Is Periodically Very Slow To React In The Web Ui. We've Been Debugging For Quite Some Time, And It Would Seem That Elasticsearch Might Be The Culprit. Looking At The Elasticsearch Index, We Have An Index Of Around 80G Of Tr

Any tips on how to check if we are storing data on deleted tasks? Maybe @ResponsiveKoala38 knows? Is there a field on each scalar that I can cross check with ClearML?

one month ago
0 Hi, I'm Using

This is on clearml v1.16.4

one month ago
0 I Have An Issue With How Clearml Logs Checkpoints. We Have A Training Setup With Pytorch-Lightning + Clearml, Where We Use

The lightning folks won't include new loggers anymore (since mid-2022, see None ) 🙂

9 months ago
0 Hey, We're Seeing A Lot Of Issues With Our Clearml Self-Hosted Server These Days; It Seems Like The Api Times Out While Talking To Elasticsearch:

No, not at all. I reckon we started seeing errors around the middle of last week. We are using default settings for everything except some password-stuff on the server.

2 years ago
0 Hey, We're Seeing A Lot Of Issues With Our Clearml Self-Hosted Server These Days; It Seems Like The Api Times Out While Talking To Elasticsearch:

CostlyOstrich36 any thoughts on how we can further debug this? It's making ClearML practically useless for us.

2 years ago
0 Hi Guys, I'm In The Process Of Setting Up A Clearml Server For Experiment Tracking. I Have The Server Hosted In A Virtual Linux Machine On Azure And Run Experiments From Some Local Compute. Our Training Environment Is Pytorch Lightning And I Have Written

Sure. Really, I'm just using the default client:
# ClearML SDK configuration file
api {
    web_server: http://server.azure.com:8080
    api_server: http://server.azure.com:8008
    files_server: http://server.azure.com:8081
    credentials {
        "access_key" = "..."
        "secret_key" = "..."
    }
}
sdk {
# ClearML - default SDK configuration

storage {
    cache {
        # Defaults to system temp folder / cache
        default_base_dir: "~/.clearml/c...
2 years ago
0 Hi, I'm Using

I just tried and the result is the same. The other method only triggers on exceptions

one month ago
0 Hi Guys, Is There A Way, Analogous To Using

Perfect! Thanks SuccessfulKoala55 , that would be an acceptable workaround until setup_upload also supports Azure 🙂 🙌

2 years ago
0 Hi Guys, I'm Trying To Familiarize Myself With Hyperparameter Optimization Using Clearml. It Seems Like There Is A Discrepancy Between

Yeah, that makes sense. The only drawback is that you'll get a single point that all lines will go through in the Parallel Coordinates plot when the optimization finishes 🙂

2 years ago
0 Hi, I Have Some Questions About Hyperparameter Optimization. We Have A Setup Where We Use Pytorchlightning Cli With Clearml For Experiment Tracking And Hyperparameter Optimization. Now, All Our Configurations Are Config-File Based. Sometime We Have Linke

Hi again CostlyOstrich36 ,

I just wanted to share what ended up working for me. Basically I worked it out both for Hydra (thanks CurvedHedgehog15 ) and for PytorchLightningCLI.

So, for PL-CLI, I used this construct so we don't have to modify our training scripts based on our experiment tracker

from pytorch_lightning.utilities.cli import LightningCLI
from clearml import Task

class MyCLI(LightningCLI):
    def before_instantiate_classes(self) -> None:
        # init the task
        tas...

2 years ago
0 Hi Guys, I'm In The Process Of Setting Up A Clearml Server For Experiment Tracking. I Have The Server Hosted In A Virtual Linux Machine On Azure And Run Experiments From Some Local Compute. Our Training Environment Is Pytorch Lightning And I Have Written

The server will never access the storage - only the clients (SDK/WebApp etc.) will access it

Oh okay. So that's the reason I can access media when the client and server are running on the same machine?

2 years ago
0 Hi Guys, I'm In The Process Of Setting Up A Clearml Server For Experiment Tracking. I Have The Server Hosted In A Virtual Linux Machine On Azure And Run Experiments From Some Local Compute. Our Training Environment Is Pytorch Lightning And I Have Written

It's actually complementary - the SDK will use the clearml.conf configuration by matching that configuration with the destination you provided

Would you recommend doing both then? :-)

2 years ago
0 Hi There, Our Self-Hosted Server Is Periodically Very Slow To React In The Web Ui. We've Been Debugging For Quite Some Time, And It Would Seem That Elasticsearch Might Be The Culprit. Looking At The Elasticsearch Index, We Have An Index Of Around 80G Of Tr

Hi @CostlyOstrich36
Is 87G a lot for an index? Enough that you would consider adding more RAM?

And also, how can I check that we are not storing scalars for deleted tasks? ClearML used to write a lot of errors in the cleanup script, although that seems to have been fixed in recent updates

one month ago
0 Hi, I Have Some Questions About Hyperparameter Optimization. We Have A Setup Where We Use Pytorchlightning Cli With Clearml For Experiment Tracking And Hyperparameter Optimization. Now, All Our Configurations Are Config-File Based. Sometime We Have Linke

Hi CurvedHedgehog15 , thanks for replying!
I guess that one could modify the config with variable interpolation (similar to how it's done in YAML, e.g. ${encoder.layers} ) - however, it seems to be quite invasive to specify that in our trainer script 😞
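
To make the idea of variable interpolation concrete, here is a minimal, stdlib-only Python sketch that resolves `${section.key}` references inside a nested dict. It is not the OmegaConf or Lightning CLI implementation, just an illustration of the mechanism. The helper names `lookup` and `resolve` and the `decoder` keys are hypothetical, and the sketch assumes there are no circular references (there is no cycle detection).

```python
import re
from typing import Any

# Matches references of the form ${dotted.path}
_REF = re.compile(r"\$\{([^}]+)\}")


def lookup(config: dict, dotted_key: str) -> Any:
    """Walk a nested dict along a dotted path such as 'encoder.layers'."""
    node: Any = config
    for part in dotted_key.split("."):
        node = node[part]
    return node


def _resolve(config: dict, node: Any) -> Any:
    if isinstance(node, dict):
        return {key: _resolve(config, value) for key, value in node.items()}
    if isinstance(node, list):
        return [_resolve(config, item) for item in node]
    if isinstance(node, str):
        whole = _REF.fullmatch(node)
        if whole:
            # A value that is only a reference keeps the referenced type (e.g. int).
            return _resolve(config, lookup(config, whole.group(1)))
        # References embedded in a longer string are substituted as text.
        return _REF.sub(
            lambda m: str(_resolve(config, lookup(config, m.group(1)))), node
        )
    return node


def resolve(config: dict) -> dict:
    """Return a copy of `config` with all ${...} references replaced."""
    return _resolve(config, config)


# Hypothetical config: the decoder should always follow the encoder.
config = {
    "encoder": {"layers": 4},
    "decoder": {"layers": "${encoder.layers}", "name": "dec-${encoder.layers}"},
}
resolved = resolve(config)
```

With this kind of resolution only `encoder.layers` needs to be overridden, and the dependent value follows automatically. That is the property I'd like to keep without spelling it out in the trainer script.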

2 years ago
0 Hi, I Have Some Questions About Hyperparameter Optimization. We Have A Setup Where We Use Pytorchlightning Cli With Clearml For Experiment Tracking And Hyperparameter Optimization. Now, All Our Configurations Are Config-File Based. Sometime We Have Linke

Hi CostlyOstrich36
What I'm seeing is expected behavior:

In my toy example, I have a VAE which is defined by a YAML config file and parsed with PytorchLightning CLI. Part of the config defines the latent dimension (n_latents) and the number of input channels of the decoder (in_channels). These two values need to be the same. When I just use the Lightning CLI, I can use variable interpolation with OmegaConf like this:
` class_path: mymodel.VAE
init_args:
{...}
bottleneck:
class_pat...

2 years ago
0 Hi, I'm Using

Hi @CostlyOstrich36 , the task is being aborted via the web UI - I have another method that catches local interrupts (exceptions like keyboard interrupts and crashes). The behavior is the same whether the task runs via an agent or from the local CLI.

one month ago
0 Hey, We're Seeing A Lot Of Issues With Our Clearml Self-Hosted Server These Days; It Seems Like The Api Times Out While Talking To Elasticsearch:

We are running the latest version (WebApp: 1.7.0-232 • Server: 1.7.0-232 • API: 2.21).
When I run docker logs clearml-elastic I get lots of logs like this one:
{"type": "server", "timestamp": "2022-10-24T08:51:35,003Z", "level": "INFO", "component": "o.e.i.g.DatabaseNodeService", "cluster.name": "clearml", "node
.name": "clearml", "message": "successfully reloaded changed geoip database file [/tmp/elasticsearch-3596639242536548410/geoip-databases/cX7aMqJ4SwCxqM7s
YM-S9Q/GeoLite2-City.mmdb]...
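
Since most of that output is INFO noise, a small filter can help surface the entries worth reading. Below is a minimal Python sketch, assuming each log entry arrives as one JSON object per line with a `level` field, as in the line above. The function name `non_info_entries` and the sample lines are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def non_info_entries(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed log entries whose level is not INFO.

    Lines that are empty or not valid JSON objects are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. plain-text stack trace lines
        if isinstance(entry, dict) and entry.get("level") != "INFO":
            yield entry


# Hypothetical sample input; real entries carry more fields.
sample = [
    '{"type": "server", "level": "INFO", "component": "o.e.i.g.DatabaseNodeService", "message": "reloaded"}',
    '{"type": "server", "level": "WARN", "component": "example.Component", "message": "hypothetical warning"}',
    "not a json line",
]
interesting = list(non_info_entries(sample))
```

To run it on the real output, the same function could be fed `sys.stdin` with the `docker logs clearml-elastic` output piped in.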

2 years ago
0 Rolling Back To 1.15.0 Seemed To Fix The Error For Now. Is There Something One Should Be Aware Of Between Server Versions 1.15 And 1.16 Related To Versions Of The

Sure. I'll give it a few minor releases and then try again 🙂 Thanks for the responses @ResponsiveKoala38!

one month ago
0 Hi, I'm Using

This is an example of the console output of a task aborted via the webUI:

Epoch 1/29 ━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 699/16945 0:04:53 • 1:55:25 2.35it/s v_num: 0.000
2024-09-16 12:52:57,263 - clearml.Task - WARNING - ### TASK STOPPING - USER ABORTED - LAUNCHING CALLBACK (timeout 30.0 sec) ###
[2024-09-16 12:52:57,284][core.callbacks.model_checkpoint][INFO] - Marking task as `in_progress`
[2024-09-16 12:52:57,309][core.callbacks.model_checkpoint][INFO] - Saving last checkpoint...
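
The "(timeout 30.0 sec)" in the log suggests the abort callback gets a fixed time budget. As a rough illustration of what such a budget means for a checkpoint-saving callback, here is a stdlib-only Python sketch. It is not how ClearML implements this. `run_with_timeout` and `save_last_checkpoint` are hypothetical names, and the sleep stands in for the real save.

```python
import threading
import time
from typing import Callable


def run_with_timeout(callback: Callable[[], None], timeout_sec: float) -> bool:
    """Run `callback` in a daemon thread.

    Returns True if the callback finished within `timeout_sec`, False if it
    was still running when the budget expired.
    """
    worker = threading.Thread(target=callback, daemon=True)
    worker.start()
    worker.join(timeout_sec)
    return not worker.is_alive()


def save_last_checkpoint() -> None:
    # Hypothetical stand-in for the real checkpoint-saving logic.
    time.sleep(0.01)


finished = run_with_timeout(save_last_checkpoint, timeout_sec=30.0)
```

If the save takes longer than the budget, the caller moves on while the save is still in progress, so a large checkpoint may end up incomplete.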
one month ago