And you deleted a single experiment? Or many?
Okay, thank you for the suggestions, we'll try it out
Hi RotundHedgehog76,
Where exactly do you see errors?
Any error in the apiserver log? (sudo docker logs clearml-apiserver)
This was actually a reset (of a single experiment), not a delete
Yes, that's right. We deployed it on a GCP instance
Hello, a similar thing happened today. In the developer's console there was this line
https://server/api/v2.19/tasks.reset_many 504 (Gateway time-out)
For now, docker compose down && docker compose up -d
helps
How large are your ES indices? Maybe this is ES being inefficient?
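One rough way to answer the index-size question is to ask Elasticsearch directly through its _cat/indices API. The sketch below assumes ES is reachable at http://localhost:9200 from wherever you run it (with the default compose file you may need to run it inside the container network or publish the port); the function names are made up for illustration.

```python
import json
from urllib.request import urlopen


def fetch_index_stats(base_url="http://localhost:9200"):
    """Return the rows of Elasticsearch's _cat/indices API as a list of dicts.

    base_url is an assumption; adjust it to wherever your ES instance listens.
    bytes=b asks ES to report sizes as plain byte counts.
    """
    with urlopen(f"{base_url}/_cat/indices?format=json&bytes=b", timeout=10) as resp:
        return json.load(resp)


def largest_indices(rows, top=5):
    """Return (index_name, size_in_GiB) pairs, largest first."""
    # store.size can be missing or null (e.g. for closed indices), so treat that as 0.
    sized = [(row["index"], int(row.get("store.size") or 0)) for row in rows]
    sized.sort(key=lambda item: item[1], reverse=True)
    return [(name, round(size / 1024**3, 2)) for name, size in sized[:top]]


# Example usage (needs a reachable ES instance):
#   for name, gib in largest_indices(fetch_index_stats()):
#       print(f"{name}: {gib} GiB")
```

If a handful of indices add up to well over the heap size, that would be consistent with ES struggling under load, though it is not proof on its own.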
No errors in logs, but that's because I restarted the deployment :(
Nothing at all. There are only 2 log entries from that day, and both were at 2am
how much memory do you have assigned to ES?
I guess I'll let you know the next time this happens haha
So this seems to be purely a load issue - can you remind me what deployment type you are using? docker-compose, right?
So currently it's -Xms2g -Xmx2g
which means 2GB
I haven't looked, I'll let you know next time it happens
Anything you can see in the browser's JS console or in the Developer Tools Network section?
I would suggest (assuming the machine has enough RAM) setting it to at least -Xms4g -Xmx4g
and maybe more. You'll need at least twice that amount free for ES alone (so make sure your machine has at least 16GB RAM)
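To make the arithmetic concrete, here is a small sketch that reads the -Xmx value out of a JVM options string like the one in the compose file and applies the "twice the heap free for ES" rule of thumb from this thread. The helper names are hypothetical, and the 2x factor is just the guideline quoted above, not a hard limit.

```python
import re

# JVM size suffixes: k, m, g (case-insensitive); no suffix means bytes.
_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def max_heap_bytes(java_opts):
    """Return the -Xmx value in bytes, or None if the flag is not present."""
    match = re.search(r"-Xmx(\d+)([kmg]?)", java_opts, flags=re.IGNORECASE)
    if match is None:
        return None
    number, unit = match.groups()
    return int(number) * _UNITS.get(unit.lower(), 1)


def ram_needed_for_es_gib(java_opts):
    """Apply the rule of thumb from the thread: ES wants about 2x its heap free."""
    heap = max_heap_bytes(java_opts)
    if heap is None:
        raise ValueError("no -Xmx flag found in: " + java_opts)
    return 2 * heap / 1024**3


print(ram_needed_for_es_gib("-Xms2g -Xmx2g"))  # current setting
print(ram_needed_for_es_gib("-Xms4g -Xmx4g"))  # suggested setting
```

With the suggested 4g heap this comes to about 8 GiB for ES alone, which is why a 16GB machine leaves comfortable room for the rest of the server.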
we didn't change a thing from the defaults that are in your GitHub 😄 so it's 500M?
Can you try to get the ES log using docker logs clearml-elastic?
Errors pop up occasionally in the Web UI. All we see is a dialog with the text "Error"
it's in the default env vars for elasticsearch in the docker compose
i think you're right, the default elastic values do not seem to work for us