DisturbedWalrus17

4 Questions, 14 Answers

Active since 10 January 2023

Last activity 2 years ago

Badges 1

8 × Eureka!

0 Votes 3 Answers 2K Views

Hi, I've been running the latest ClearML server on AWS for a week now, and I regularly run into "Fetch experiments failed". I can't really find any more information on what went wrong. Any help diagnosing the problem further would be appreciated.

aws

4 years ago

0 Votes 7 Answers 2K Views

Hello, I'm really enjoying the ClearML experience and we're using it very successfully at work. I have a small question, though: I'm trying to set the "file_history_size" SDK parameter from Python code instead of the conf file. Is that possible? Thanks!

clearml

4 years ago

0 Votes 5 Answers 1K Views

Hi, is it possible to delete a parameter from a Task after it is connected? So far I've tried the connect(...) and set_parameters_as_dict() methods, but they appear to merge and never remove parameters. Is there another way to reset/remove parameters?

clearml

4 years ago

0 Votes 2 Answers 2K Views

Hi, is there a way to add more high-level structure to the hyperparameter display in the Config tab? I only have "Args" and "General" by default, but can I add something else to the list?

clearml

4 years ago

0 Hello, I'm really enjoying the ClearML experience and we're using it very successfully at work. I have a small question, though: I'm trying to set the "file_history_size" SDK parameter from Python code instead of the conf file. Is that possible? Thanks!

I expected it wouldn't be that easy. Thank you for explaining. It would be useful to have a unified Python interface for the config parameters.

4 years ago

0 We are facing performance issues with our self-hosted ClearML server. Looking at the CPU utilization / memory / networking, we couldn't identify a bottleneck. We are currently using ~100 workers for some HPO, and the main performance issues we observe are...

Would it help further diagnostics if I upload the clearml-* logs (e.g. apiserver or mongo)? SuccessfulKoala55 AgitatedDove14

4 years ago

Hi AgitatedDove14 and SuccessfulKoala55, I just had a look at the machine stats. Max CPU usage is ~30% (across all 4 cores); the average is more like 10% over a day or so. Since we started spawning multiple processes for the API server, it looks like we utilise the CPU more, but the UI and API calls are still lagging a lot.

4 years ago

0 Hi, is it possible to delete a parameter from a Task after it is connected? So far I've tried the connect(...) and set_parameters_as_dict() methods, but they appear to merge and never remove parameters. Is there another way to reset/remove parameters?

Perfect. That seems to work. So set_parameters_as_dict does the merge while set_parameters basically overrides, correct?

4 years ago
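The distinction being checked here can be modelled without a server. The sketch below uses two hypothetical helpers, `merge_parameters` and `replace_parameters`, to mirror the merge behaviour observed with `set_parameters_as_dict()` and the override behaviour attributed to `set_parameters()` in the question above. It illustrates those semantics as described in the thread, not ClearML's implementation, and the "Section/name" key format is an assumption.

```python
# Hypothetical stand-ins modelling the two behaviours with plain dicts;
# they do not call ClearML. Keys use an assumed "Section/name" format.

def merge_parameters(current: dict, new: dict) -> dict:
    """Merge semantics: existing keys survive unless `new` overwrites them."""
    merged = dict(current)
    merged.update(new)
    return merged


def replace_parameters(current: dict, new: dict) -> dict:
    """Override semantics: `current` is ignored on purpose and fully replaced."""
    return dict(new)


current = {"General/lr": 0.1, "General/batch_size": 32, "General/old_flag": True}
update = {"General/lr": 0.01, "General/batch_size": 32}

print(merge_parameters(current, update))    # old_flag is still there
print(replace_parameters(current, update))  # old_flag is gone
```

Under this model, removing a stale key such as `General/old_flag` means writing back the complete set of parameters you want to keep.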

0 Hi, I've been running the latest ClearML server on AWS for a week now, and I regularly run into "Fetch experiments failed". I can't really find any more information on what went wrong. Any help diagnosing the problem further would be appreciated.

Yay, it did. Thank you!

4 years ago

Hi,
we're liking all the small improvements to the UI a lot (like the scrollbar and the new controls/shortcuts for the scalar plots). And it was pretty good when it was still Trains ;)

4 years ago

To add: following the suggestion from another thread, I looked at the developer tools -> Network output, and this is (I think) the failing request:
endpoint: {name: "tasks.get_all_ex", requested_version: "2.12", actual_version: "1.0"}
error_data: {}
error_stack: null
id: "8be33097ac824ef2bc40dded2bfc5fe8"
result_code: 500
result_msg: "Internal server error: err=Cannot resolve field "null", extra_info=None"
result_subcode: 1
trx: "8be33097ac824ef2bc40dded2bfc5fe8"

And also this:
endpoint: ...

4 years ago
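The devtools view above lists the response metadata field by field. A small sketch that turns such a response body into a one-line summary, assuming these fields sit under a top-level `meta` key in the raw JSON (an assumption about the wire format that the post does not show); `summarize_api_error` is a hypothetical helper. Keeping the `trx` id next to the message may make it easier to find the matching entry in the apiserver logs, if the server logs that id.

```python
import json
from typing import Optional


def summarize_api_error(body: str) -> Optional[str]:
    """Return a one-line summary if the response reports a failure, else None.

    Assumption: result_code, result_msg, endpoint and trx live under "meta".
    """
    meta = json.loads(body).get("meta", {})
    code = meta.get("result_code", 200)
    if code < 400:
        return None
    endpoint = meta.get("endpoint", {}).get("name", "?")
    return f"{endpoint} -> {code}: {meta.get('result_msg', '')} (trx {meta.get('trx', '?')})"


# Rebuilt from the values quoted in the post; json.dumps handles the inner quotes.
sample = json.dumps({"meta": {
    "endpoint": {"name": "tasks.get_all_ex", "requested_version": "2.12", "actual_version": "1.0"},
    "result_code": 500,
    "result_subcode": 1,
    "result_msg": 'Internal server error: err=Cannot resolve field "null", extra_info=None',
    "trx": "8be33097ac824ef2bc40dded2bfc5fe8",
}})
print(summarize_api_error(sample))
```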

Thank you AgitatedDove14, I'm trying it now, but I think it works. Ideally, it would be convenient for us if all the .conf parameters could also be set programmatically when initialising the Task from Python.

4 years ago
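Setting a .conf parameter from code amounts to overriding one nested key of the configuration tree. A minimal sketch of that idea on a plain dict, assuming the key lives at `sdk.metrics.file_history_size` as in the layout of `clearml.conf`; `set_by_path` is a hypothetical helper, not part of the ClearML SDK, and how to feed the result into ClearML's own config loading is left open.

```python
import copy
from typing import Any


def set_by_path(config: dict, dotted_key: str, value: Any) -> dict:
    """Return a copy of `config` with the value at `dotted_key` replaced.

    Missing intermediate dictionaries are created, so a key such as
    "sdk.metrics.file_history_size" also works on an empty config.
    """
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    node = updated
    *parents, leaf = dotted_key.split(".")
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value
    return updated


# Illustrative defaults only; real values would come from clearml.conf.
defaults = {"sdk": {"metrics": {"file_history_size": 100}}}
overridden = set_by_path(defaults, "sdk.metrics.file_history_size", 1000)
print(overridden["sdk"]["metrics"]["file_history_size"])  # 1000
print(defaults["sdk"]["metrics"]["file_history_size"])    # 100
```

Returning a copy rather than mutating in place keeps the loaded defaults available for comparison.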

Hi Alon,
yes, exactly. Or override some parameter, e.g. a nested dictionary.

4 years ago

What's the size of the mongo DB?

/opt/clearml/data/mongo/* has about 930M (if that's the right way of checking the size)

4 years ago
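One way to check the on-disk size of the Mongo data directory from Python is to walk it and sum file sizes. A minimal sketch; the path is the one quoted above, and `directory_size_bytes` is a hypothetical helper. This sums apparent file sizes, which can differ from what `du` reports (allocated blocks) and from MongoDB's own `db.stats()` figures such as `dataSize` and `storageSize`.

```python
import os


def directory_size_bytes(root: str) -> int:
    """Sum the sizes of regular files below `root`, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # Files can disappear while MongoDB is running; skip them.
                continue
    return total


if __name__ == "__main__":
    size = directory_size_bytes("/opt/clearml/data/mongo")
    print(f"{size / 1024**2:.0f} MiB")
```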

If you'd like, you can DM them

Thanks. I've sent them to you via DM.

4 years ago

OK, I've now tried with 8 workers instead of 4, and it's the same. I should note that the apiserver container's CPU usage is pretty low (~5-10%), and memory-wise it also looks within spec to me. Below is typical docker stats output while the server is behaving sluggishly:
CONTAINER ID   NAME                CPU %   MEM USAGE / LIMIT   MEM %   NET I/O   BLOCK I/O   PIDS
5e9160ba93d7   clearml-webserver   0.00%   5.996MiB / ...

4 years ago
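A single docker stats snapshot is hard to compare over time. A sketch that samples CPU and memory percentages per container through the docker CLI's `--no-stream` and `--format` options and parses them into numbers, so snapshots taken while a page is slow can be compared; `parse_stats` and `sample_docker_stats` are hypothetical helpers. As far as we know, on Linux the CPU % column is relative to one core, so a 4-core host can show up to roughly 400%.

```python
import subprocess
from typing import Dict, Tuple


def parse_stats(output: str) -> Dict[str, Tuple[float, float]]:
    """Map container name -> (CPU %, memory %) from tab-separated rows."""
    stats: Dict[str, Tuple[float, float]] = {}
    for line in output.strip().splitlines():
        name, cpu, mem = line.split("\t")
        stats[name] = (float(cpu.rstrip("%")), float(mem.rstrip("%")))
    return stats


def sample_docker_stats() -> Dict[str, Tuple[float, float]]:
    """Take one snapshot via the docker CLI (requires docker on PATH)."""
    out = subprocess.run(
        ["docker", "stats", "--no-stream", "--format",
         "{{.Name}}\t{{.CPUPerc}}\t{{.MemPerc}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_stats(out)
```

Calling `sample_docker_stats()` every few seconds while reproducing a slow page would show whether any single container (apiserver, mongo, elastic) spikes during the delay.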

What we observe is just general UI unresponsiveness. For example, opening a project or experiment page might take half a minute.

I should add: it seems to get worse when more workers are registered and more experiments are queued.

4 years ago
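To check whether latency really grows with the number of registered workers and queued experiments, it helps to record timings at different load levels. A minimal sketch, assuming any zero-argument callable that performs the slow request; `time_calls` and `summarize` are hypothetical helpers, and the URL in the commented usage is a placeholder (8008 is the usual apiserver port, but adjust it to your deployment).

```python
import statistics
import time
from typing import Callable, List


def time_calls(fn: Callable[[], object], repeats: int = 5) -> List[float]:
    """Call `fn` `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> str:
    """Format median and worst-case latency for quick comparison."""
    return (f"median {statistics.median(durations):.2f}s, "
            f"max {max(durations):.2f}s over {len(durations)} calls")


# Hypothetical usage against a placeholder URL:
# import urllib.request
# url = "http://localhost:8008/"  # placeholder; point at a call that feels slow
# print(summarize(time_calls(lambda: urllib.request.urlopen(url).read())))
```

Logging the summary together with the current worker and queue counts gives a rough picture of whether the two move together.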

0 Hi, is there a way to add more high-level structure to the hyperparameter display in the Config tab? I only have "Args" and "General" by default, but can I add something else to the list?

That's exactly it, thank you.

4 years ago
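On the hyperparameter-sections question: as far as we know, ClearML's flat parameter names carry their section as a "Section/name" prefix (e.g. "Args/epochs"), so a new group in the Config tab corresponds to a new prefix. In recent SDK versions `Task.connect()` accepts a `name` argument for the section, e.g. `task.connect(params, name="Data")`; check this against the SDK version in use. The sketch below only models the grouping with plain dicts; `group_by_section` is hypothetical.

```python
from collections import defaultdict
from typing import Dict


def group_by_section(flat: Dict[str, object]) -> Dict[str, Dict[str, object]]:
    """Split "Section/name" keys into {section: {name: value}}."""
    sections: Dict[str, Dict[str, object]] = defaultdict(dict)
    for key, value in flat.items():
        section, sep, name = key.partition("/")
        if not sep:
            # Assumption: unsectioned keys fall back to "General".
            section, name = "General", key
        sections[section][name] = value
    return dict(sections)


params = {"Args/epochs": 10, "General/lr": 0.01, "Data/augment": True}
print(group_by_section(params))
```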