Check sudo docker logs <container-name>
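For example, to see which containers keep restarting and tail the most recent output (the container names here are the default ClearML ones from the docker-compose file, adjust if yours differ):
sudo docker ps --filter "name=clearml" --format "table {{.Names}}\t{{.Status}}"
sudo docker logs --tail 100 clearml-apiserver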
it looks like clearml-apiserver
and clearml-fileserver
are continually restarting
no, they are still restarting. I've looked in /opt/clearml/logs/apiserver.log
no errors
also, is there a list anywhere with the mount points that are needed?
yeah, that's usually the case when you get an empty dashboard
<container_name>:
  logging:
    driver: "json-file"
    options:
      max-size: "10m"
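Note the logging options only take effect once the containers are recreated, so after editing the compose file something like this (run from wherever your docker-compose.yml lives) should apply it:
sudo docker-compose down
sudo docker-compose up -d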
Basically whatever was under the old /opt/trains/
folder is required, you can see the list here: None
btw - if you remove the docker-compose changes, do the containers start normally?
incidentally, we turn off the server every evening as it's not used overnight; we've not faced issues with it starting up in the morning or noticed any data loss
back up and running again, thanks for your help
Hey there *waves*
Not sure about plans to automate this in the future, as this is more how docker behaves and not really ClearML, especially with the overlay2 filesystem. The biggest offender is usually the JSON log files. Have a look in /var/lib/docker/containers/ for *.log
Assuming this IS the case, you can tell docker to only log up to a max-size.. I have mine set to 100m or some such
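A quick way to see which containers' JSON logs are the biggest offenders (paths assume a default Docker install; run via root since that directory usually isn't world-readable):
sudo sh -c 'du -sh /var/lib/docker/containers/*/*-json.log | sort -rh | head'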
@<1687643893996195840:profile|RoundCat60> can you verify all the volume mounts point to existing directories on the server machine? (i.e. /opt/clearml/...
)
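A rough way to check, assuming the default /opt/clearml layout from the docker-compose file:
for d in /opt/clearml/config /opt/clearml/logs /opt/clearml/data/elastic_7 \
         /opt/clearml/data/fileserver /opt/clearml/data/mongo/db \
         /opt/clearml/data/mongo/configdb /opt/clearml/data/redis; do
  [ -d "$d" ] && echo "ok: $d" || echo "MISSING: $d"
done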
thanks Stef, with max-size
do you set it for every running service separately, or can you set it once?
not entirely sure on this as we used the custom AMI solution, is there any documentation on it?
think I found the issue, a typo in apiserver.conf
thanks @<1523715084633772032:profile|AlertBlackbird30> this is really informative. Nothing seems to be particularly out of the ordinary though
3.7G /var/lib/
3.7G /var/lib/docker
3.0G /var/lib/docker/overlay2
followed by a whole load of files that are a few hundred KBs in size, nothing huge though
I believe you can set it on a 'per container' way as well.
hhrrmm.. in the initial problem, you mentioned that the /var/lib/docker/overlay2 was growing large in size.. but.. 4GB seems "fine" for docker images.. I wonder.. does your nvme0n1p1 ever report like 85% or 90% used, or do you think that the 4GB is a lot? When you restart the server, does the % used noticeably drop? That would suggest tmp files inside the docker image itself, which.. is possible with docker (weird but, possible)
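If it helps, the plain df/docker commands for keeping an eye on this (nothing ClearML-specific) would be something like:
df -h /var/lib/docker
sudo docker system df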
we turn off the server every evening...
In that case the issue is definitely not related to the mount points
I'll add the logging max-size now and monitor over the next week
hey @<1687643893996195840:profile|RoundCat60> .. did you ever get the problem sorted ?
so am I right in thinking it's just the mount points that are missing? based on the output of df above
After making the change yesterday to the docker-compose file, the server is completely unusable - this is all I see for the /dashboard screen
Morning, we got to 100% used, which is what triggered this investigation. When we initially looked at overlay2 it was using 8GB, so the current usage seems acceptable.
yep, in most of them:
/opt/clearml/config: apiserver.conf, clearml.conf
/opt/clearml/data/elastic_7: /nodes
/opt/clearml/data/fileserver: <empty>
/opt/clearml/data/mongo/configdb: <empty>
/opt/clearml/data/mongo/db: collection/index files, /diagnostic.data, /journal etc
/opt/clearml/data/redis: dump.rdb
/opt/clearml/logs: apiserver.log.x, fileserver.log (0 bytes)
Good point @<1523715084633772032:profile|AlertBlackbird30>
you will probably want to find the culprit, so a find should work wonders. I'd suspect elasticsearch first; it tends to go nuts
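Something like this (just a sketch, adjust the paths and threshold to taste) should surface anything over 100MB:
sudo find /var/lib/docker /opt/clearml/data -xdev -type f -size +100M -exec ls -lh {} \;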
@<1687643893996195840:profile|RoundCat60> you set it once, inside the docker-compose itself.. it will affect all docker containers but, to be honest, docker tends to log everything
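For example, one way to set it once in the compose file is a YAML anchor via an extension field (the service names below are illustrative, match them to the ones in your docker-compose.yml):
x-logging: &default-logging
  driver: "json-file"
  options:
    max-size: "100m"
    max-file: "3"

services:
  apiserver:
    logging: *default-logging
  fileserver:
    logging: *default-logging
Alternatively, setting "log-driver": "json-file" and "log-opts": {"max-size": "100m"} in /etc/docker/daemon.json applies to every container on the host, not just the compose stack.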