Ok - I've now tried with 8 workers instead of 4 and it's the same. I should note that the apiserver container's CPU usage is pretty low (~5-10%). Memory-wise it also looks in-spec to me. Below is a typical `docker stats` output from when the server is behaving sluggishly:
```
CONTAINER ID   NAME                CPU %   MEM USAGE / LIMIT   MEM %   NET I/O   BLOCK I/O   PIDS
5e9160ba93d7   clearml-webserver   0.00%   5.996MiB / ...
```
Hi,
we like all the small improvements to the UI a lot (like the scrollbar and the new controls/shortcuts for the scalar plots). And it was pretty good when it was still Trains ;)
I expected it wouldn't be that easy. Thank you for explaining. It would be useful to have a unified Python interface for the config parameters.
Perfect, that seems to work. So `set_parameters_as_dict` does a merge while `set_parameters` basically overrides, correct?
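(For later readers: a minimal sketch of that difference, assuming the standard ClearML SDK; the project/task names are placeholders.)

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="param-semantics")

# Start with two parameters in the "General" section
task.set_parameters({"General/lr": 0.01, "General/batch_size": 32})

# Merge: updates "lr" and keeps "batch_size" untouched
task.set_parameters_as_dict({"General": {"lr": 0.001}})

# Override: replaces the whole parameter set, so "batch_size" is dropped
task.set_parameters({"General/lr": 0.0001})

print(task.get_parameters_as_dict())
```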
To add: following the suggestion from another thread, I looked at the developer tools -> Network tab and this is the failing request (I think):

```
endpoint: {name: "tasks.get_all_ex", requested_version: "2.12", actual_version: "1.0"}
error_data: {}
error_stack: null
id: "8be33097ac824ef2bc40dded2bfc5fe8"
result_code: 500
result_msg: "Internal server error: err=Cannot resolve field "null", extra_info=None"
result_subcode: 1
trx: "8be33097ac824ef2bc40dded2bfc5fe8"
```
And also this:

```
endpoint: ...
```
Hi Alon,
Yes, exactly. Or override some parameter, e.g. a nested dictionary (see the sketch below)
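(A hedged example of the nested-dictionary case; the config layout is hypothetical, and as far as I know nested keys get flattened with `/` separators, e.g. `optimizer/scheduler/gamma`.)

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="nested-override")

# Hypothetical nested config, registered once as parameters
config = {
    "optimizer": {
        "name": "Adam",
        "scheduler": {"gamma": 0.95, "step_size": 10},
    }
}
task.set_parameters_as_dict(config)

# Later, override just one nested value; everything else is kept,
# because set_parameters_as_dict merges into the existing set
task.set_parameters_as_dict({"optimizer": {"scheduler": {"gamma": 0.9}}})
```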
Would it help further diagnostics if I uploaded the clearml-* (e.g. apiserver or mongo) logs? SuccessfulKoala55 AgitatedDove14
Thank you AgitatedDove14, I'm trying it now and I think it works. Effectively, it would be convenient for us if all the .conf parameters could also be set programmatically when initialising the Task from Python.
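(For reference: a subset of the .conf settings, the server-connection part, can already be supplied in code or via environment variables such as CLEARML_API_HOST. A sketch, assuming the standard SDK; the URLs and credentials are placeholders.)

```python
from clearml import Task

# Server-connection settings normally read from ~/clearml.conf can be
# provided before Task.init; other .conf sections still need the file
# or environment variables.
Task.set_credentials(
    api_host="https://api.example.com",    # placeholder server URLs
    web_host="https://app.example.com",
    files_host="https://files.example.com",
    key="ACCESS_KEY",                      # placeholder credentials
    secret="SECRET_KEY",
)
task = Task.init(project_name="examples", task_name="programmatic-conf")
```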
What's the size of the mongo DB?
/opt/clearml/data/mongo/* takes up about 930M (if that's the right way of checking the size)
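(Rough Python equivalent of that check, essentially what `du -s` reports; note the on-disk size of the mongo data directory is not necessarily the logical DB size you'd get from `db.stats()`.)

```python
import os

def dir_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files under path (similar to `du -s`)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total

print(f"{dir_size_bytes('/opt/clearml/data/mongo') / 1e6:.0f} MB")
```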
What we observe is general UI unresponsiveness. For example, opening a project or experiment page can take half a minute.
I should add: it seems to get worse when more workers are registered and more experiments are queued
Hi AgitatedDove14 and SuccessfulKoala55, I just had a look at the machine stats. Max CPU usage is ~30% (across all 4 cores); the average is more like 10% over a day or so. By spawning multiple processes for the API server we now utilise the CPU more, but the UI and API calls are still lagging a lot.
If you'd like, you can DM them
Thanks. I've sent them to you via DM.