I have a problem that might not be directly ClearML-related, but maybe someone here has an idea:
I run a clearml-server on a machine with 128 GB RAM, 32 cores and 2 GPUs.
On the same machine I run 2 clearml-agents, each with access to 1 GPU, 12 cores and 48G
CostlyOstrich36 Actually, no container exits. So if it's because of OOM, as SuccessfulKoala55 suggests, then maybe a process inside the container gets killed and the container hangs? Is that possible?
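One way to check whether the kernel OOM killer terminated a process inside a still-running container is to search the kernel log for its messages. A killed child process does not necessarily stop the container; the container only exits when its main process (PID 1) dies. The sketch below assumes you can read the kernel log (for example via `dmesg`, which may need root) and only recognizes the common "Out of memory: Killed process" / "Memory cgroup out of memory: Killed process" formats; other kernel versions may word it differently.

```python
import re

# Matches host-wide ("Out of memory: ...") and container/cgroup-level
# ("Memory cgroup out of memory: ...") OOM-killer lines. Newer kernels print
# "Killed process", older ones "Kill process"; both forms are accepted.
OOM_PATTERN = re.compile(
    r"out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)", re.IGNORECASE
)


def find_oom_kills(log_lines):
    """Return a list of (pid, process_name) for every OOM-killer entry found."""
    kills = []
    for line in log_lines:
        match = OOM_PATTERN.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills
```

Usage: pass it the lines of `dmesg` output (e.g. `subprocess.run(["dmesg"], capture_output=True, text=True).stdout.splitlines()`). A hit naming `java` or a Python worker while the container is still "Up" would support the "process killed, container hangs" theory.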
SuccessfulKoala55 I did not see Elasticsearch using much RAM (at least right after starting). Doesn't this line in the docker-compose file control its RAM usage?
ES_JAVA_OPTS: -Xms2g -Xmx2g -Dlog4j2.formatMsgNoLookups=true
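For context on that line: `-Xms` and `-Xmx` set the initial and maximum JVM *heap* size. Memory outside the heap (direct buffers, metaspace, thread stacks) is not covered by these flags, so the Elasticsearch process can use more than 2 GB in total, and its usage can grow over time rather than right after startup. The small helper below (a hypothetical illustration, not part of ClearML or Elasticsearch) just decodes the heap flags from an options string to make explicit what this line caps; when a flag is repeated, the JVM uses the last occurrence, which the sketch mirrors.

```python
import re

# Size suffixes accepted by the JVM for -Xms/-Xmx (no suffix means bytes).
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def jvm_heap_limits(java_opts):
    """Return {'initial': bytes, 'max': bytes} for the -Xms/-Xmx flags present."""
    limits = {}
    for flag, key in (("Xms", "initial"), ("Xmx", "max")):
        matches = re.findall(rf"-{flag}(\d+)([kKmMgGtT]?)\b", java_opts)
        if matches:
            number, unit = matches[-1]  # the last occurrence wins, as in the JVM
            limits[key] = int(number) * _UNITS[unit.lower()]
    return limits
```

For the line above, both limits come out as 2 GiB of heap, which says nothing about the container's total memory footprint.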
