we didn't change a thing from the defaults that are in your GitHub repo 😄 so it's 500M?
I think you're right, the default Elastic values do not seem to work for us
No errors in logs, but that's because I restarted the deployment :(
How large are your ES indices? Maybe this is ES being inefficient?
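If it helps, one way to check index sizes is through ES's _cat API. This is just a sketch: it assumes the container name used elsewhere in this thread, and that curl is available inside the Elasticsearch image (it is in the official images):
```
# List indices sorted by on-disk size, largest first
sudo docker exec clearml-elastic curl -s 'localhost:9200/_cat/indices?v&s=store.size:desc'
```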
it's in the default env vars for Elasticsearch in the docker compose file
I guess I'll let you know the next time this happens haha
how much memory do you have assigned to ES?
So currently it's -Xms2g -Xmx2g
which means 2GB
I would suggest (assuming the machine has enough RAM) setting it to at least -Xms4g -Xmx4g
and maybe more. You'll need at least twice that amount free for ES alone (so make sure your machine has at least 16GB RAM)
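For reference, the change would look roughly like this in docker-compose.yml, assuming the heap is controlled through the standard ES_JAVA_OPTS environment variable on the elasticsearch service (as in the default compose file); this is a sketch, not the full service definition:
```
services:
  elasticsearch:
    environment:
      # Raise the JVM heap from the current 2g; leave at least as much
      # RAM free again for the OS filesystem cache
      ES_JAVA_OPTS: -Xms4g -Xmx4g
```
After editing, docker compose up -d will recreate the elasticsearch container with the new heap size.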
Okay, thank you for the suggestions, we'll try it out
Can you send what you have there now?
Anything you can see in the browser's JS console or in the Developer Tools Network section?
Errors pop up occasionally in the Web UI. All we see is a dialog with the text "Error"
I haven't looked, I'll let you know next time it happens
For now, docker compose down && docker compose up -d helps
Hi RotundHedgehog76,
Where exactly do you see errors?
Hello, a similar thing happened today. In the developer console there was this line:
https://server/api/v2.19/tasks.reset_many 504 (Gateway time-out)
So this seems to be purely a load issue. Can you remind me what deployment type you are using? docker-compose, right?
Nothing at all. There are only 2 log entries from this day, and both were at 2am
Can you try to get the ES log using docker logs clearml-elastic?
Any error in the apiserver log? (sudo docker logs clearml-apiserver)
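If those logs are long, something like this narrows them down (just a sketch; the container names are the ones from the commands above):
```
# docker logs writes to both stdout and stderr, so merge them before grepping
sudo docker logs --since 24h clearml-apiserver 2>&1 | grep -iE 'error|timeout'
sudo docker logs --since 24h clearml-elastic 2>&1 | grep -iE 'error|circuit|out of memory'
```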
This was actually a reset (of a single experiment), not a delete
Yes, that's right. We deployed it on a GCP instance
And you deleted a single experiment? Or many?