How large are your ES indices? Maybe this is ES being inefficient?
I guess I'll let you know the next time this happens haha
How much memory do you have assigned to ES?
I think you're right, the default Elastic values do not seem to work for us
Okay, thank you for the suggestions, we'll try it out
Hi RotundHedgehog76,
Where exactly do you see errors?
Nothing at all. There are only 2 logs from this day, and all were at 2am
Yes, that's right. We deployed it on a GCP instance
This was actually a reset (of a single experiment), not a delete
Errors pop up occasionally in the Web UI. All we see is a dialog with the text "Error"
I haven't looked, I'll let you know next time it happens
For now, docker compose down && docker compose up -d helps
Hello, a similar thing happened today. In the developer console there was this line:
https://server/api/v2.19/tasks.reset_many 504 (Gateway time-out)
Can you try to get the ES log using docker logs clearml-elastic?
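Since a docker compose down removes the containers (and their logs with them), it can help to dump the Elasticsearch log to a file before restarting. A minimal sketch, assuming the container name clearml-elastic from the stock compose file:

```shell
# Save the recent ES container log (stdout + stderr) with timestamps
# before any restart wipes it; adjust --tail / --since as needed.
docker logs --timestamps --tail 500 clearml-elastic > elastic.log 2>&1
```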
I would suggest (assuming the machine has enough RAM) setting it to at least -Xms4g -Xmx4g, and maybe more. You'll need at least twice that free for ES alone (so make sure your machine has at least 16GB RAM)
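The heap setting can be changed without editing the stock compose file by using an override. A sketch, assuming the service is named elasticsearch and the heap is passed via the ES_JAVA_OPTS environment variable (names may differ in your compose version):

```yaml
# docker-compose.override.yml (sketch; check the service/env names in your file)
services:
  elasticsearch:
    environment:
      # -Xms and -Xmx should match; keep the heap at or below ~50% of host RAM
      ES_JAVA_OPTS: "-Xms4g -Xmx4g"
```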
We didn't change a thing from the defaults that are in your GitHub repo 😄 so it's 500M?
So this seems to be a purely load issue - can you remind me what deployment type you are using? docker-compose, right?
So currently it's -Xms2g -Xmx2g, which means 2GB
Anything you can see in the browser's JS console or in the Developer Tools Network section?
Any error in the apiserver log? (sudo docker logs clearml-apiserver)
It's in the default env vars for Elasticsearch in the docker compose file
Can you send what you have there now?
No errors in logs, but that's because I restarted the deployment :(
And you deleted a single experiment? Or many?