btw - if you remove the docker-compose changes, do the containers start normally?
No, they are still restarting. I've looked in /opt/clearml/logs/apiserver.log and there are no errors.
we turn off the server every evening...
In that case the issue is definitely not related to the mount points
RoundCat60, you set it once, inside the docker-compose file itself. It will affect all the docker containers, but to be honest, docker tends to log everything.
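If the goal is a single setting for every container on the host rather than per service in docker-compose, Docker also reads a daemon-wide default from /etc/docker/daemon.json. Below is a sketch that writes one for the default json-file driver. The function name write_docker_log_defaults and the 10m / 5 values are illustrative choices only. Note that it overwrites the file, and the new defaults only apply to containers created after the Docker daemon is restarted.

```shell
#!/usr/bin/env bash
# Sketch: write a daemon-wide default log rotation policy for the json-file
# driver. WARNING: this overwrites the target file, so merge by hand if
# /etc/docker/daemon.json already contains other settings.
write_docker_log_defaults() {
  local target="${1:-/etc/docker/daemon.json}"
  local max_size="${2:-10m}"
  local max_file="${3:-5}"
  # Docker requires log-opts values to be strings, so max-file is quoted too.
  cat > "$target" <<EOF
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "${max_size}",
    "max-file": "${max_file}"
  }
}
EOF
}
```

From a root shell, something like 'write_docker_log_defaults' followed by 'systemctl restart docker' and recreating the containers would put it into effect.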
It looks like clearml-apiserver and clearml-fileserver are continually restarting.
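To see how often each container is actually restarting, a small loop over 'docker ps -a' and 'docker inspect' can help. This is a sketch that assumes the containers keep the clearml-* names seen here. .State.Status and .RestartCount are standard fields of 'docker inspect' output, and clearml_restart_report is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the state and restart count of every container whose name
# contains "clearml" (assumes the default clearml-* container names).
clearml_restart_report() {
  local name
  # -a also lists containers that are currently down or restarting
  for name in $(docker ps -a --filter "name=clearml" --format '{{.Names}}'); do
    printf '%s\t%s\n' "$name" \
      "$(docker inspect --format '{{.State.Status}} restarts={{.RestartCount}}' "$name")"
  done
}
```

Run it from a root shell (or as a member of the docker group); a restart count that keeps climbing confirms a crash loop.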
I think I found the issue: a typo in apiserver.conf.
so yes indeedly ..
sudo find /var/lib/ -type d -exec du -s -x -h {} \; | grep G | more
That seems to give saner results. Of course, in your case you may also want to grep for M (megabytes).
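The find above runs a separate du for every directory, so large subtrees get walked many times over. A single du pass piped through sort gives a similar ranking faster. This is a sketch assuming GNU coreutils (du --max-depth, sort -h). largest_dirs is a hypothetical helper name, and its defaults (/var/lib, depth 3, top 15) are arbitrary.

```shell
#!/usr/bin/env bash
# Sketch: list the largest directories under a path, biggest first.
# Assumes GNU du and sort; -x keeps du on one filesystem.
largest_dirs() {
  local root="${1:-/var/lib}" depth="${2:-3}" count="${3:-15}"
  # One du pass over the tree; 2>/dev/null hides permission-denied noise.
  du -x -h --max-depth="$depth" "$root" 2>/dev/null | sort -h -r | head -n "$count"
}
```

From a root shell, something like 'largest_dirs /var/lib/docker 2 10' would show the ten biggest directories two levels down.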
Thanks AlertBlackbird30, this is really informative. Nothing seems particularly out of the ordinary, though:
3.7G /var/lib/
3.7G /var/lib/docker
3.0G /var/lib/docker/overlay2
...followed by a whole load of entries a few hundred KB in size; nothing huge, though.
Not necessarily. Is there any data in those directories?
Yep, in most of them:
/opt/clearml/config: apiserver.conf, clearml.conf
/opt/clearml/data/elastic_7: nodes/
/opt/clearml/data/fileserver: <empty>
/opt/clearml/data/mongo/configdb: <empty>
/opt/clearml/data/mongo/db: collection/index files, diagnostic.data/, journal/, etc.
/opt/clearml/data/redis: dump.rdb
/opt/clearml/logs: apiserver.log.x, fileserver.log (0 bytes)
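The same check can be scripted so it is easy to rerun after a restart. A sketch: the directory list mirrors the paths above, report_clearml_data is a hypothetical helper name, and it prints MISSING, EMPTY, or the du size for each directory.

```shell
#!/usr/bin/env bash
# Sketch: report how much data each ClearML host directory holds.
# The directory list mirrors the paths discussed in this thread.
report_clearml_data() {
  local base="${1:-/opt/clearml}" d
  for d in config data/elastic_7 data/fileserver data/mongo/configdb \
           data/mongo/db data/redis logs; do
    if [ ! -d "$base/$d" ]; then
      printf '%s\tMISSING\n' "$base/$d"
    elif [ -z "$(ls -A "$base/$d")" ]; then
      # ls -A lists hidden entries too, so only truly empty dirs match
      printf '%s\tEMPTY\n' "$base/$d"
    else
      printf '%s\t%s\n' "$base/$d" "$(du -s -h "$base/$d" | cut -f1)"
    fi
  done
}
```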
Thanks Stef. With max-size, do you set it for every running service separately, or can you set it once?
So am I right in thinking it's just the mount points that are missing, based on the output of df above?
Oh, that's strange. I'll run one of those soon to see if there's anything wrong with them
Incidentally, we turn off the server every evening as it's not used overnight. We haven't faced any issues with it starting up in the morning, nor noticed any data loss.
Hey there 👋
Not sure about plans to automate this in the future, as this is more about how docker behaves than about clearml, especially with the overlay2 filesystem. The biggest offender is usually the JSON log files; have a look in /var/lib/docker/containers/ for *.log files.
Assuming this IS the case, you can tell docker to only log up to a max-size. I have mine set to 100m or so.
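To check whether the JSON logs really are the biggest offender, something along these lines lists every *.log file under /var/lib/docker/containers/ with its size, largest first. With the default json-file driver these are the <container-id>-json.log files. container_log_sizes is a hypothetical helper name, and reading that directory needs root.

```shell
#!/usr/bin/env bash
# Sketch: show the size of each container's JSON log file, biggest first.
# Docker's json-file driver stores them as
# /var/lib/docker/containers/<id>/<id>-json.log (readable by root only).
container_log_sizes() {
  local dir="${1:-/var/lib/docker/containers}"
  find "$dir" -type f -name '*.log' -exec du -h {} + 2>/dev/null | sort -h -r
}
```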
Can you perhaps attach your docker-compose.yml file's contents?
back up and running again, thanks for your help
RoundCat60, can you verify that all the volume mounts point to existing directories on the server machine (i.e. /opt/clearml/...)?
I added this to each of the containers:
logging:
  options:
    max-file: "5"
    max-size: 10m
container_name:
logging:
  options:
    max-size: 10m
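Logging changes only take effect when a container is recreated (e.g. by 'docker-compose up -d'), so it can be worth checking what each running container actually got. A sketch, with show_log_config as a hypothetical helper name. HostConfig.LogConfig is the field under which 'docker inspect' reports the logging driver and its options.

```shell
#!/usr/bin/env bash
# Sketch: print the logging driver and options of every running container.
show_log_config() {
  local name
  for name in $(docker ps --format '{{.Names}}'); do
    printf '%s\t%s\n' "$name" \
      "$(docker inspect --format '{{json .HostConfig.LogConfig}}' "$name")"
  done
}
```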
yeah, that's usually the case when you get an empty dashboard
Howdy, and morning RoundCat60. Btw, when docker uses overlay2, its mount points don't show up in a plain 'df'; they only appear with 'df -a'. That's mostly because, being simply 'overlays', they don't (technically) consume any space (the files are still in /var/lib, just not as far as the space accounting used by df is concerned).
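A quick way to see the mounts that a plain 'df' leaves out is to ask for all of them and keep only the overlay ones. This sketch assumes GNU df (-a to include every mount, -P so each mount stays on one line) and that the mounts report 'overlay' as their filesystem name, as they usually do with the overlay2 driver. overlay_mounts is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: show the header plus only the overlay mounts from 'df -a'.
overlay_mounts() {
  df -a -h -P | awk 'NR == 1 || $1 == "overlay"'
}
```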
This is why I was suggesting a find, maybe with a 'du'... actually, let me try that here, 2s.
not yet, going to try and fix it today.
If I do a df, I see this, which is concerning:
Filesystem Size Used Avail Use% Mounted on
devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs 3.9G 928K 3.9G 1% /run
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/nvme0n1p1 20G 7.9G 13G 40% /
tmpfs 790M 0 790M 0% /run/user/1000
So it looks like the mount points are not created. When do these get created? I thought that with an AMI these would already have been set up?
I think that if these directories are not mounted, you should first of all take care not to shut down the server. You'll probably want to exec /bin/bash into the mongo and elastic containers and copy their data out to the host storage.
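As an alternative to exec-ing a shell, 'docker cp' can copy a directory straight out of a running container. This is a sketch with several assumptions. The container names clearml-mongo and clearml-elastic and the in-container paths (/data/db, /data/configdb, /usr/share/elasticsearch/data) are what the stock ClearML docker-compose.yml uses as far as I recall, so check them against your own file. /opt/clearml/backup is a hypothetical destination. Copying live database files may not give a consistent snapshot, so for mongo a 'mongodump' is the safer route if that matters.

```shell
#!/usr/bin/env bash
# Sketch: copy data out of the mongo and elastic containers to the host with
# 'docker cp'. Container names and in-container paths are ASSUMPTIONS based on
# the stock ClearML docker-compose.yml; check them against your own file.
# Set DRY_RUN=1 to print the commands instead of running them.
backup_clearml_data() {
  local dest="${1:-/opt/clearml/backup}"   # hypothetical backup location
  local run="" pair container src
  if [ -n "${DRY_RUN:-}" ]; then
    run="echo"
  fi
  for pair in \
      "clearml-mongo:/data/db" \
      "clearml-mongo:/data/configdb" \
      "clearml-elastic:/usr/share/elasticsearch/data"; do
    container="${pair%%:*}"   # text before the first ':'
    src="${pair#*:}"          # text after the first ':'
    $run mkdir -p "$dest/$container"
    # docker cp copies the source directory recursively into the destination
    $run docker cp "$container:$src" "$dest/$container/" || return 1
  done
}
```

Running it once with DRY_RUN=1 shows the exact commands; then run it without DRY_RUN from a root shell.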
You will probably want to find the culprit, so a find should work wonders. I'd suspect elasticsearch first; it tends to go nuts 😕
... from the AMI creation script:
# prepare directories to store data
sudo mkdir -p /opt/clearml/data/elastic_7
sudo mkdir -p /opt/clearml/data/redis
sudo mkdir -p /opt/clearml/data/mongo/db
sudo mkdir -p /opt/clearml/data/mongo/configdb
sudo mkdir -p /opt/clearml/logs
sudo mkdir -p /opt/clearml/config
sudo mkdir -p /opt/clearml/data/fileserver
sudo chown -R 1000:1000 /opt/clearml/data/elastic_7
So it seems the AMI is using the correct directories... Do you have these?
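To answer that quickly on the server, a sketch like this checks each directory the AMI script creates, plus the 1000:1000 ownership it sets on elastic_7. check_ami_dirs is a hypothetical name. It relies on GNU stat (-c) and returns non-zero if anything is off.

```shell
#!/usr/bin/env bash
# Sketch: verify the directories created by the AMI script exist and that
# elastic_7 has the ownership the script sets with chown (default 1000:1000).
check_ami_dirs() {
  local base="${1:-/opt/clearml}" elastic_owner="${2:-1000:1000}"
  local d owner status=0
  for d in data/elastic_7 data/redis data/mongo/db data/mongo/configdb \
           logs config data/fileserver; do
    if [ ! -d "$base/$d" ]; then
      echo "missing: $base/$d"
      status=1
    fi
  done
  if [ -d "$base/data/elastic_7" ]; then
    owner="$(stat -c '%u:%g' "$base/data/elastic_7")"   # numeric uid:gid
    if [ "$owner" != "$elastic_owner" ]; then
      echo "wrong owner on $base/data/elastic_7: $owner (expected $elastic_owner)"
      status=1
    fi
  fi
  return "$status"
}
```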
Check sudo docker logs <container-name>
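To grab all of them at once, a small loop works. A sketch: apart from clearml-apiserver and clearml-fileserver mentioned above, the container names are assumptions based on the stock ClearML setup, and tail_clearml_logs is a hypothetical name. Run it from a root shell (or as a docker group member).

```shell
#!/usr/bin/env bash
# Sketch: print the last N log lines of each ClearML container in one go.
# Container names are assumed; adjust them to match 'docker ps -a'.
tail_clearml_logs() {
  local lines="${1:-50}" c
  for c in clearml-apiserver clearml-fileserver clearml-webserver \
           clearml-mongo clearml-elastic clearml-redis; do
    echo "===== $c ====="
    # 2>&1: docker logs replays the container's stderr on stderr
    docker logs --tail "$lines" "$c" 2>&1
  done
}
```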