After making the change yesterday to the docker-compose file, the server is completely unusable - this is all I see for the /dashboard screen
It looks like not all the containers are up... Try sudo docker ps and see if the apiserver container is restarting...
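A minimal sketch of that check, assuming the Docker CLI needs sudo on this host. The function names `show_restarting` and `show_log_tail` are hypothetical helpers made up for illustration; the `docker ps --filter`/`--format` and `docker logs --tail` options are standard Docker CLI flags.

```shell
# Print only the containers that are currently in a restart loop (name: status).
show_restarting() {
  sudo docker ps --filter "status=restarting" --format '{{.Names}}: {{.Status}}'
}

# Print the last 50 log lines of one container, passed as the first argument.
show_log_tail() {
  sudo docker logs --tail 50 "$1"
}
```

For example, run `show_restarting` first, then `show_log_tail clearml-apiserver` for whichever container it reports.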
also, is there a list anywhere with the mount points that are needed?
think I found the issue, a typo in apiserver.conf
back up and running again, thanks for your help
In the publicly available AMI these are created. However, if you used a previously released Trains AMI and upgraded to ClearML, part of the upgrade process was to create those directories (required by the new docker-compose.yml), as explained in the upgrade instructions.
I added this to each of the containers
logging:
  options:
    max-file: 5
    max-size: 10m
it looks like clearml-apiserver and clearml-fileserver are continually restarting
🤔 i'll add the logging max_size now and monitor over the next week
Howdy and Morning @<1687643893996195840:profile|RoundCat60> .. docker when using overlay2 doesn't have its mount points show up in a 'df' btw, they will only appear in a 'df -a', mostly because they are simply 'overlays', so they don't (technically) consume any space (I mean, the files are still in /var/lib but they aren't counted in the space accounting that df does)
this is why I was suggesting a find, maybe with a 'du' .. actually.. let me try that here.. 2s
Not necessarily, is there any data in those directories?
hhrrmm.. in the initial problem, you mentioned that the /var/lib/docker/overlay2 was growing large in size.. but.. 4GB seems "fine" for docker images.. I wonder .. does your nvme0n1p1 ever report like 85% or 90% used or do you think that the 4GB is a lot ? when you restart the server, does the % used noticeably drop ? that would suggest tmp files inside the docker image itself which.. is possible with docker (weird but, possible)
container_name:
  logging:
    options:
      max-size: 10m
Oh, that's strange. I'll run one of those soon to see if there's anything wrong with them
I believe you can set it on a 'per container' way as well.
hey @<1687643893996195840:profile|RoundCat60> .. did you ever get the problem sorted ?
yeah, that's usually the case when you get an empty dashboard
Can you perhaps attach your docker-compose.yml file's contents?
incidentally we turn off the server every evening as it's not used overnight, we've not faced issues with it starting up in the morning or noticed any data loss
Hey there 👋
Not sure about plans to automate this in the future, as this is more how docker behaves and not really clearml, especially with the overlay2 filesystem. The biggest offender usually is your json logfiles. have a look in /var/lib/docker/containers/ for *.log
assuming this IS the case, you can tell docker to only log up to a max-size .. I have mine set to 100m or some such
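To see whether the log files really are the culprit, here is a rough sketch. It assumes the default json-file log driver, which keeps its files under /var/lib/docker/containers/; the function name `list_container_logs` is a hypothetical helper, and `sort -h` assumes GNU coreutils.

```shell
# List *.log files under a directory with their disk usage, smallest first.
# With no argument it looks in Docker's default container directory.
list_container_logs() {
  find "${1:-/var/lib/docker/containers}" -type f -name '*.log' -exec du -h {} + | sort -h
}
```

The directory is normally readable by root only, so call `list_container_logs` from a root shell. The largest files end up at the bottom of the output.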
yep, in most of them:
/opt/clearml/config
    apiserver.conf
    clearml.conf
/opt/clearml/data/elastic_7
    /nodes
/opt/clearml/data/fileserver
    <empty>
/opt/clearml/data/mongo/configdb
    <empty>
/opt/clearml/data/mongo/db
    collection/index files, /diagnostic.data, /journal etc
/opt/clearml/data/redis
    dump.rdb
/opt/clearml/logs
    apiserver.log.x, fileserver.log (0 bytes)
btw - if you remove the docker-compose changes, do the containers start normally?
we turn off the server every evening...
In that case the issue is definitely not related to the mount points
so yes indeedly ..
sudo find /var/lib/ -type d -exec du -s -x -h {} \; | grep G | more
seems to give saner results.. of course, in your case, you may also want to grep M for megabyte
Check sudo docker logs <container-name>
Morning, we got to 100% used which is what triggered this investigation. When we initially looked at overlay2 it was using 8GB, so now seems to be acceptable.
not entirely sure on this as we used the custom AMI solution, is there any documentation on it?
Good point @<1523715084633772032:profile|AlertBlackbird30> 👍
@<1687643893996195840:profile|RoundCat60> can you verify all the volume mounts point to existing directories on the server machine? (i.e. /opt/clearml/... )
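One way to run that check, as a sketch only. The paths below are the ones that come up in this thread, and your docker-compose.yml may mount others, so treat the list as an assumption. `check_dirs` is a hypothetical helper name.

```shell
# Report, for each path given, whether it exists as a directory.
check_dirs() {
  for d in "$@"; do
    if [ -d "$d" ]; then
      echo "OK $d"
    else
      echo "MISSING $d"
    fi
  done
}

# Directories mentioned in this thread; adjust to match your docker-compose.yml.
check_dirs \
  /opt/clearml/config \
  /opt/clearml/data/elastic_7 \
  /opt/clearml/data/fileserver \
  /opt/clearml/data/mongo/configdb \
  /opt/clearml/data/mongo/db \
  /opt/clearml/data/redis \
  /opt/clearml/logs
```

Any line starting with MISSING points at a volume mount whose host directory still has to be created.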