OK, I've now tried with 8 workers instead of 4, and it's the same. I should note that the apiserver container's CPU usage is pretty low (~5-10%), and memory-wise it also looks within spec to me. Below is a typical docker stats output from a moment when the server was behaving sluggishly:
```
CONTAINER ID   NAME                CPU %   MEM USAGE / LIMIT   MEM %   NET I/O   BLOCK I/O   PIDS
5e9160ba93d7   clearml-webserver   0.00%   5.996MiB / ...
```
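To collect these numbers repeatedly (for example, while the UI is slow and again while it is responsive), the output can be captured in a machine-readable form. This is a minimal sketch, assuming the Docker CLI is on the PATH; it relies on the `--no-stream` flag and the `{{.Name}}`, `{{.CPUPerc}}` and `{{.MemPerc}}` template placeholders of `docker stats`, and the helper names are hypothetical.

```python
from __future__ import annotations

import subprocess

# Go-template placeholders accepted by `docker stats --format`.
# Without the "table" directive, no header line is printed.
FORMAT = "{{.Name}}\t{{.CPUPerc}}\t{{.MemPerc}}"


def parse_stats(output: str) -> dict[str, tuple[float, float]]:
    """Parse tab-separated `docker stats` lines into {name: (cpu %, mem %)}."""
    stats = {}
    for line in output.strip().splitlines():
        name, cpu, mem = line.split("\t")
        # Values look like "7.50%"; strip the percent sign before converting.
        stats[name] = (float(cpu.rstrip("%")), float(mem.rstrip("%")))
    return stats


def snapshot() -> dict[str, tuple[float, float]]:
    """Take one non-streaming sample of all running containers."""
    result = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", FORMAT],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_stats(result.stdout)
```

Stopped containers may report placeholder values such as `--`, which this sketch does not handle.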
Hi Alon,
Yes, exactly. Or override a single parameter, e.g. one inside a nested dictionary.
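To illustrate what overriding one value inside a nested dictionary means, here is a plain-Python sketch. It does not call ClearML; the "/"-separated path convention and the helper names `flatten` and `override` are hypothetical and only model the idea.

```python
from __future__ import annotations

from typing import Any


def flatten(d: dict[str, Any], prefix: str = "", sep: str = "/") -> dict[str, Any]:
    """Flatten a nested dict into {"outer/inner": value} pairs."""
    flat: dict[str, Any] = {}
    for key, value in d.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-dictionaries, extending the path.
            flat.update(flatten(value, path, sep))
        else:
            flat[path] = value
    return flat


def override(d: dict[str, Any], path: str, value: Any, sep: str = "/") -> None:
    """Set one nested value in place, creating intermediate dicts as needed."""
    *parents, leaf = path.split(sep)
    node = d
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
```

Only the addressed leaf changes; sibling keys such as other optimizer settings are left untouched.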
Hi,
we're really liking all the small improvements to the UI (like the scrollbar and the new controls/shortcuts for the scalar plots). And it was already pretty good back when it was still Trains ;)
What we observe is general UI unresponsiveness. For example, opening a project or experiment page can take half a minute.
I should add: it seems to get worse as more workers are registered and more experiments are queued.
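One way to put numbers on the sluggishness, and to check whether it really grows with the number of registered workers and queued experiments, is to time the same request repeatedly under each condition. This is a generic sketch; what `call` should do (for example, an HTTP request to the web UI or API server) is an assumption left to the reader, and `measure_latency` is a hypothetical helper.

```python
from __future__ import annotations

import statistics
import time
from typing import Callable


def measure_latency(call: Callable[[], object], runs: int = 20) -> dict[str, float]:
    """Time `call` repeatedly; return median and 95th-percentile latency in seconds."""
    if runs < 2:
        raise ValueError("need at least two runs to compute a percentile")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    # "inclusive" keeps the estimate within the range of observed samples.
    p95 = statistics.quantiles(samples, n=20, method="inclusive")[-1]
    return {"median": statistics.median(samples), "p95": p95}
```

Recording these two numbers with, say, few versus many queued experiments makes the "gets worse" observation easier to compare.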
Hi AgitatedDove14 and SuccessfulKoala55, I just had a look at the machine stats. Max CPU usage is ~30% (across all 4 cores); the average is more like 10% over a day or so. By spawning multiple processes for the API server, we now seem to utilise the CPU more, but the UI and API calls are still lagging a lot.
Would it help with further diagnostics if I upload the clearml-* logs (e.g. apiserver or mongo)? SuccessfulKoala55 AgitatedDove14
If you'd like, you can DM them
Thanks. I've sent them to you via DM.
To add: following a suggestion from another thread, I looked at Developer Tools -> Network, and this is the failing request (I think):
```
endpoint: {name: "tasks.get_all_ex", requested_version: "2.12", actual_version: "1.0"}
error_data: {}
error_stack: null
id: "8be33097ac824ef2bc40dded2bfc5fe8"
result_code: 500
result_msg: "Internal server error: err=Cannot resolve field "null", extra_info=None"
result_subcode: 1
trx: "8be33097ac824ef2bc40dded2bfc5fe8"
```
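When collecting several failing responses like this one from the network tab, a small helper can condense each into one line for a bug report. A minimal sketch, assuming each response's metadata has been copied into a Python dict with the keys shown in the failing request (`endpoint`, `result_code`, `result_msg`, `id`); `summarize_failure` is a hypothetical name.

```python
from __future__ import annotations

from typing import Any


def summarize_failure(meta: dict[str, Any]) -> str | None:
    """Return a one-line summary if the metadata reports an error, else None."""
    code = meta.get("result_code", 200)
    if code < 400:
        return None
    endpoint = meta.get("endpoint", {})
    return (
        f"{endpoint.get('name', '?')} "
        f"(requested v{endpoint.get('requested_version', '?')}): "
        f"{code} {meta.get('result_msg', '')} [id={meta.get('id', '?')}]"
    )
```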
And also this:
```
endpoint: ...
```
Perfect. That seems to work. So set_parameters_as_dict does the merge while set_parameters basically overrides, correct?
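The two behaviours being asked about can be pictured with plain dictionaries. This sketch only models the semantics as described in the question (replace everything versus merge into the existing values); it does not call ClearML, whether `set_parameters` and `set_parameters_as_dict` behave exactly like this should be checked against the ClearML documentation, and the function names are hypothetical.

```python
from __future__ import annotations

from copy import deepcopy
from typing import Any


def override_all(current: dict[str, Any], new: dict[str, Any]) -> dict[str, Any]:
    """Replace semantics: `current` is ignored, so keys missing from `new` are dropped."""
    return deepcopy(new)


def deep_merge(current: dict[str, Any], new: dict[str, Any]) -> dict[str, Any]:
    """Merge semantics: nested dicts are merged key by key, other values are replaced."""
    merged = deepcopy(current)
    for key, value in new.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged
```

Both functions return new dictionaries and leave their inputs unmodified, which makes the comparison easy to test.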
I expected it wouldn't be that easy. Thank you for explaining. It would be useful to have a unified Python interface for the config parameters.
What's the size of the mongo DB?
/opt/clearml/data/mongo/* has about 930M (if that's the right way of checking the size)
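Summing file sizes under the data directory is a reasonable approximation of the on-disk footprint; it is similar to `du -sh`, although `du` counts allocated blocks rather than apparent file size. MongoDB can also report its own figures via `db.stats()` in the mongo shell. A minimal sketch in Python, using the path mentioned above; `dir_size_bytes` is a hypothetical helper.

```python
import os


def dir_size_bytes(root: str) -> int:
    """Total apparent size of regular files under `root` (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # A live database may remove or rotate files during the walk.
                continue
    return total


if __name__ == "__main__":
    size = dir_size_bytes("/opt/clearml/data/mongo")
    print(f"{size / 2**20:.0f} MiB")
```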
Thank you AgitatedDove14, I'm trying it now and I think it works. It would be really convenient for us if all the .conf parameters could also be set programmatically when initialising the Task from Python.