Hey, So We Noticed The

Answered

Hey, so we noticed the /var/lib/docker/overlay2 directory on the clearml-server is growing a lot in size. We added more disk space, but we want to put something in place to stop it growing too much.
These are the options I’ve looked into:

docker system prune - removes all stopped containers, all networks not used by at least one container, all dangling images, and all dangling build cache. Problem: we don't really know what this is pruning.
docker image prune --all - removes all images without at least one container associated to them
Set the max-size in docker-compose.yaml for logging

Are the first 2 options safe to run without killing the server? I'm not comfortable removing files without knowing what they are.
Are there any plans to automate this in the future?
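One way to address the "we don't really know what this is pruning" concern is to list the candidates before deleting anything. The following is a minimal read-only sketch, not an official ClearML procedure; the function name `docker_prune_preview` is made up for illustration, and it only uses standard `docker` listing commands.

```shell
#!/bin/sh
# Read-only survey of what the prune commands would consider removable.
# Nothing in this function deletes anything.
# NOTE: docker_prune_preview is a hypothetical helper name.
docker_prune_preview() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker not found on PATH" >&2
        return 1
    fi

    echo "== Disk usage by images, containers, volumes and build cache =="
    # -v also lists each image with the number of containers using it,
    # which shows what "docker image prune --all" would target.
    docker system df -v

    echo "== Stopped containers (exited or created) =="
    docker ps --all --filter status=exited --filter status=created

    echo "== Dangling images =="
    docker image ls --filter dangling=true
}
```

Calling `docker_prune_preview` on the server (as a user allowed to talk to the Docker daemon) prints the lists without changing anything, so the output can be reviewed before deciding whether to prune.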

  				
Posted 4 years ago by RoundCat60


Answers 43

Morning, we got to 100% used, which is what triggered this investigation. When we initially looked at overlay2 it was using 8GB, so it now seems to be acceptable.

  				
Posted 4 years ago by RoundCat60

🤔 I'll add the logging max-size now and monitor over the next week.
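To judge whether a logging size limit is making a difference, it can help to measure the container log files directly. This is a rough sketch that assumes the default json-file logging driver, which writes to /var/lib/docker/containers/<id>/<id>-json.log (so these files sit next to overlay2, not inside it). The function name `container_log_sizes` is hypothetical.

```shell
#!/bin/sh
# List container log files with their size in KiB, largest last.
# $1 = directory to scan (defaults to Docker's container directory).
# NOTE: container_log_sizes is a hypothetical helper name, and the default
# path assumes the json-file logging driver.
container_log_sizes() {
    dir=${1:-/var/lib/docker/containers}
    # The trailing * in the pattern also catches rotated files
    # such as <id>-json.log.1
    find "$dir" -type f -name '*-json.log*' -exec du -k {} + | sort -n
}
```

Reading /var/lib/docker normally needs root, so on a real server the `find` line would be run with sudo. Comparing the output before the change and a week later gives a simple check on whether the logs were the part that was growing.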

  				
Posted 4 years ago by RoundCat60

Hmm. In the initial problem you mentioned that /var/lib/docker/overlay2 was growing large in size, but 4GB seems "fine" for Docker images. Does your nvme0n1p1 ever report 85% or 90% used, or do you think that 4GB is a lot? When you restart the server, does the % used drop noticeably? That would suggest tmp files inside the Docker image itself, which is possible with Docker (weird, but possible).
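The "% used before and after a restart" comparison can be captured with a small check around `df`. This is only a sketch: the function names `fs_used_percent` and `check_disk` are invented here, and the default threshold of 85 simply echoes the figure mentioned above.

```shell
#!/bin/sh
# Print the Use% of the filesystem that holds a path, as a bare integer.
# $1 = path (default /). NOTE: hypothetical helper name.
fs_used_percent() {
    # -P forces the POSIX output format: one line per filesystem,
    # with the capacity percentage in column 5.
    df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn when usage reaches a threshold.
# $1 = path (default /), $2 = limit in percent (default 85, an arbitrary choice).
# Returns 0 when below the limit, 1 when at or above it, 2 if df gave no number.
check_disk() {
    path=${1:-/}
    limit=${2:-85}
    used=$(fs_used_percent "$path")
    case $used in
        ''|*[!0-9]*) echo "could not read usage for $path" >&2; return 2 ;;
    esac
    if [ "$used" -ge "$limit" ]; then
        echo "WARNING: $path is ${used}% full (limit ${limit}%)"
        return 1
    fi
    echo "OK: $path is ${used}% full"
}
```

Running `check_disk /` shortly before and after a restart shows whether the percentage drops; the same call could be scheduled from cron to get a warning before the disk reaches 100% again.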

  				
Posted 4 years ago by AlertBlackbird30

not yet, going to try and fix it today.

if I do a df I see this, which is concerning:

Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        3.9G     0  3.9G   0% /dev
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           3.9G  928K  3.9G   1% /run
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/nvme0n1p1   20G  7.9G   13G  40% /
tmpfs           790M     0  790M   0% /run/user/1000

So it looks like the mount points are not created. When do these get created? I thought that with an AMI these would already have been set up?

  				
Posted 4 years ago by RoundCat60

So am I right in thinking it's just the mount points that are missing, based on the output of df above?

  				
Posted 4 years ago by RoundCat60

also, is there a list anywhere with the mount points that are needed?

  				
Posted 4 years ago by RoundCat60

After making the change yesterday to the docker-compose file, the server is completely unusable - this is all I see for the /dashboard screen

  				
Posted 4 years ago by RoundCat60

It looks like not all the containers are up... Try sudo docker ps and see if the apiserver container is restarting...
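The check suggested here can be narrowed to just the containers that are stuck restarting, followed by a look at their recent log output. A minimal sketch using standard `docker` commands; the helper names `restarting_containers` and `recent_logs` are hypothetical.

```shell
#!/bin/sh
# List containers currently in a restart loop, one "name: status" line each.
# NOTE: hypothetical helper name.
restarting_containers() {
    command -v docker >/dev/null 2>&1 || {
        echo "docker not found on PATH" >&2
        return 1
    }
    docker ps --all --filter status=restarting \
        --format '{{.Names}}: {{.Status}}'
}

# Show the last lines of a container's log to see why it keeps exiting.
# $1 = container name, $2 = number of lines (default 50).
# NOTE: hypothetical helper name.
recent_logs() {
    command -v docker >/dev/null 2>&1 || {
        echo "docker not found on PATH" >&2
        return 1
    }
    docker logs --tail "${2:-50}" "$1"
}
```

For example, `restarting_containers` followed by `recent_logs <container-name>` for each name it prints. If the user is not in the docker group, the underlying `docker` commands need sudo, as in the `sudo docker ps` suggestion above.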

  				
Posted 4 years ago by SuccessfulKoala55

Good point @AlertBlackbird30 👍

  				
Posted 4 years ago by SuccessfulKoala55

I believe you can set it on a per-container basis as well.

  				
Posted 4 years ago by AlertBlackbird30

so yes indeedly ..

sudo find /var/lib/ -type d -exec du -s -x -h {} \; | grep G | more

This seems to give saner results. Of course, in your case you may also want to grep for M (megabytes).
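One caveat with filtering on `grep G` is that it matches any line containing the letter G, including directory names. A possible alternative, sketched here with the made-up function name `largest_dirs`, is to let `du` report plain KiB numbers and sort them numerically, so no letter matching is needed.

```shell
#!/bin/sh
# Print the N largest directories under a root, smallest first, sizes in KiB.
# $1 = root directory (default /var/lib/docker), $2 = how many to show (default 20).
# NOTE: largest_dirs is a hypothetical helper name.
largest_dirs() {
    root=${1:-/var/lib/docker}
    count=${2:-20}
    # du without -s reports every directory with its cumulative size;
    # -x keeps it on one filesystem, -k prints KiB so sort -n can order it.
    du -x -k "$root" 2>/dev/null | sort -n | tail -n "$count"
}
```

The last line of the output is always the root itself, since its size is cumulative. As with the `find` command above, scanning /var/lib/docker needs root, so on the server the `du` line would be run with sudo.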

  				
Posted 4 years ago by AlertBlackbird30

it looks like clearml-apiserver and clearml-fileserver are continually restarting

  				
Posted 4 years ago by RoundCat60

back up and running again, thanks for your help

  				
Posted 4 years ago by RoundCat60

