not entirely sure on this as we used the custom AMI solution, is there any documentation on it?
also, is there a list anywhere with the mount points that are needed?
container_name:
  logging:
    options:
      max-size: 10m
hey @<1687643893996195840:profile|RoundCat60> .. did you ever get the problem sorted?
Oh, that's strange. I'll run one of those soon to see if there's anything wrong with them
Hi @<1687643893996195840:profile|RoundCat60> ,
We've actually never had to address this issue. Can you find out what exactly is growing in size? I'd like to make sure this is not due to the containers storing data internally (causing docker to store more and more snapshots) - this is an unhealthy situation that might also indicate that volumes are not mounted correctly (i.e. data that should be stored externally is actually stored internally)
not yet, going to try and fix it today.
if I do a df, I see this, which is concerning:
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        3.9G     0  3.9G   0% /dev
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           3.9G  928K  3.9G   1% /run
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/nvme0n1p1   20G  7.9G   13G  40% /
tmpfs           790M     0  790M   0% /run/user/1000
so it looks like the mount points are not created. When do these get created? I thought using an AMI these would have already been set up?
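A quick way to see how much of that used space is docker itself:
# breaks usage down into images, containers and local volumes; add -v for a per-item breakdown
sudo docker system df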
It looks like not all the containers are up... Try sudo docker ps and see if the apiserver container is restarting...
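If you just want names and status at a glance, something like this works too:
# a crash-looping container shows up as "Restarting (<exit code>) N seconds ago"
sudo docker ps -a --format 'table {{.Names}}\t{{.Status}}'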
btw - if you remove the docker-compose changes, do the containers start normally?
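For reference, a clean restart of the stack would be something like this (assuming the compose file is at /opt/clearml/docker-compose.yml, as on the AMI):
# stop and remove the containers (data lives on the host mounts, so it survives)
sudo docker-compose -f /opt/clearml/docker-compose.yml down
# recreate them in the background
sudo docker-compose -f /opt/clearml/docker-compose.yml up -d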
Not necessarily, is there any data in those directories?
it looks like clearml-apiserver and clearml-fileserver are continually restarting
Basically whatever was under the old /opt/trains/ folder is required; you can see the list here: None
In the publicly available AMI these are created. However, if you used a previously released Trains AMI and upgraded to ClearML, part of the upgrade process was to create those directories (required by the new docker-compose.yml), as explained here: None
yep, in most of them:
/opt/clearml/config: apiserver.conf, clearml.conf
/opt/clearml/data/elastic_7: /nodes
/opt/clearml/data/fileserver: <empty>
/opt/clearml/data/mongo/configdb: <empty>
/opt/clearml/data/mongo/db: collection/index files, /diagnostic.data, /journal, etc.
/opt/clearml/data/redis: dump.rdb
/opt/clearml/logs: apiserver.log.x, fileserver.log (0 bytes)
I added this to each of the containers:
logging:
  options:
    max-file: 5
    max-size: 10m
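If you want to confirm the rotation is actually taking effect (note the containers need to be recreated, not just restarted, for new logging options to apply), the json logs live one directory per container under /var/lib/docker/containers/:
# the active log is <container-id>-json.log, rotated ones get .1, .2 suffixes
sudo du -sh /var/lib/docker/containers/*/*-json.log* | sort -h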
back up and running again, thanks for your help
Good point @<1523715084633772032:profile|AlertBlackbird30> 👍
think I found the issue, a typo in apiserver.conf
@<1687643893996195840:profile|RoundCat60> you set it once, inside the docker-compose itself.. it will affect all docker containers but, to be honest, docker tends to log everything
so yes indeedly ..
sudo find /var/lib/ -type d -exec du -s -x -h {} \; | grep G | more
seems to give saner results.. of course, in your case, you may also want to grep M for megabyte
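Another variant that sorts by size, so the biggest directories end up at the bottom:
# -x stays on one filesystem, sort -h understands the K/M/G suffixes
sudo du -x -h --max-depth=3 /var/lib/docker | sort -h | tail -20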
yeah, that's usually the case when you get an empty dashboard
I think that if these directories are not mounted, you should first of all take care not to shut down the server. You'll probably want to exec /bin/bash into the mongo and elastic containers, and copy their data outside to the host storage
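Something along these lines should get the data out while the containers are still running. This is a sketch: the container names (clearml-mongo, clearml-elastic) and internal paths are taken from the default docker-compose.yml, so double-check yours:
sudo mkdir -p /opt/clearml/backup
# mongo keeps its data under /data/db inside the container
sudo docker cp clearml-mongo:/data/db /opt/clearml/backup/mongo
# elastic keeps its data under /usr/share/elasticsearch/data
sudo docker cp clearml-elastic:/usr/share/elasticsearch/data /opt/clearml/backup/elastic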
hhrrmm.. in the initial problem, you mentioned that the /var/lib/docker/overlay2 was growing large in size.. but.. 4GB seems "fine" for docker images.. I wonder .. does your nvme0n1p1 ever report like 85% or 90% used or do you think that the 4GB is a lot ? when you restart the server, does the % used noticeably drop ? that would suggest tmp files inside the docker image itself which.. is possible with docker (weird but, possible)
Check sudo docker logs <container-name>
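e.g. for the two that keep restarting:
# --tail keeps the output manageable; add -f to follow live
sudo docker logs --tail 100 clearml-apiserver
sudo docker logs --tail 100 clearml-fileserver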
you will probably want to find the culprit, so a find should work wonders. I'd suspect elasticsearch first. It tends to go nuts 😕
... from the AMI creation script:
# prepare directories to store data
sudo mkdir -p /opt/clearml/data/elastic_7
sudo mkdir -p /opt/clearml/data/redis
sudo mkdir -p /opt/clearml/data/mongo/db
sudo mkdir -p /opt/clearml/data/mongo/configdb
sudo mkdir -p /opt/clearml/logs
sudo mkdir -p /opt/clearml/config
sudo mkdir -p /opt/clearml/data/fileserver
sudo chown -R 1000:1000 /opt/clearml/data/elastic_7
So it seems the AMI is using the correct directories... Do you have these?
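A quick way to check them all in one go (ls will error on any that are missing):
for d in config logs data/elastic_7 data/redis data/mongo/db data/mongo/configdb data/fileserver; do sudo ls -ld /opt/clearml/$d; done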
Hey there *waves*
Not sure about plans to automate this in the future, as this is more how docker behaves and not really clearml, especially with the overlay2 filesystem. The biggest offender usually is your json logfiles. Have a look in /var/lib/docker/containers/ for *.log
assuming this IS the case, you can tell docker to only log up to a max-size .. I have mine set to 100m or some such
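if you'd rather set it once for the whole daemon instead of per service in the compose file, the default log driver options live in /etc/docker/daemon.json (only applies to containers created after a daemon restart):
# careful: this overwrites any existing /etc/docker/daemon.json
sudo tee /etc/docker/daemon.json <<'EOF'
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "5"
  }
}
EOF
sudo systemctl restart docker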
no, they are still restarting. I've looked in /opt/clearml/logs/apiserver.log
no errors