Hi, we're running the ClearML webserver (the version shown in the UI is 1.1.1-135 • 1.1.1 • 2.14) on a dedicated Ubuntu 20.04 machine with 15 GB of RAM and 8 cores. Lately we have been experiencing slowdowns, and sometimes stretches of 0.5-1 hour in which the web UI does not respond at all. While monitoring the logs and the machine's vitals, I see nothing in the web UI logs, but CPU and RAM usage are at ~100% and the culprit seems to be the mongo Docker container. It spawns many processes that increase mongo's RAM usage, like so:
mongod --setParameter internalQueryExecMaxBlockingSortBytes=196100200 --bind_ip_all
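To put that flag in perspective: as far as I understand it, `internalQueryExecMaxBlockingSortBytes` caps the memory a single in-memory sort may use, so it is a per-operation limit rather than a cap on mongo's total RAM. Below is a small Python sketch that pulls the `--setParameter` values out of the command line above and converts the limit to MiB. The helper name `parse_set_parameters` is my own and not part of ClearML or MongoDB, and the script only parses the string; it does not touch the running database.

```python
import shlex


def parse_set_parameters(cmdline: str) -> dict:
    """Collect name=value pairs that follow --setParameter in a mongod command line.

    Hypothetical helper for illustration only.
    """
    tokens = shlex.split(cmdline)
    params = {}
    for i, tok in enumerate(tokens):
        # Each --setParameter flag is followed by one "name=value" token.
        if tok == "--setParameter" and i + 1 < len(tokens):
            name, _, value = tokens[i + 1].partition("=")
            params[name] = value
    return params


# Command line copied from the process list above.
CMD = "mongod --setParameter internalQueryExecMaxBlockingSortBytes=196100200 --bind_ip_all"

params = parse_set_parameters(CMD)
sort_limit_bytes = int(params["internalQueryExecMaxBlockingSortBytes"])
sort_limit_mib = sort_limit_bytes / (1024 * 1024)

print(f"blocking sort limit: {sort_limit_mib:.1f} MiB")  # prints "blocking sort limit: 187.0 MiB"
```

So each blocking sort may take roughly 187 MiB, which would add up quickly if several such queries run at once on a 15 GB machine.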
I tried reading about troubleshooting this kind of issue in mongo, but before sabotaging all of my teammates' work I would be glad to hear whether anyone has had a similar experience or knows what needs to be done.