Answered
Hey, So We Noticed The

Hey, so we noticed the /var/lib/docker/overlay2 directory on the clearml-server is growing a lot in size, we added more disk space but we want to put something in place to stop this growing too much.
These are the options I've looked into:

  1. docker system prune - removes all stopped containers, all networks not used by at least one container, all dangling images, and all dangling build cache. Problem: we don't really know what this is pruning
  2. docker image prune --all - removes all images without at least one container associated to them
  3. Set the max-size in docker-compose.yaml for logging

Are the first 2 options safe to run without killing the server? I'm not happy about removing files without knowing what they are.
Are there any plans to automate this in the future?
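
One way to reduce the guesswork before running either prune command is to list what each one would target. A minimal sketch, assuming the docker CLI is available on the server; the `DOCKER` variable and the `prune_preview` function name are made up for this example:

```shell
# Command used to talk to docker; override with e.g. DOCKER="sudo docker".
DOCKER="${DOCKER:-docker}"

# Print what the prune commands would target, without deleting anything.
prune_preview() {
    echo "== Space used by images, containers, volumes, build cache =="
    $DOCKER system df

    echo "== Stopped containers (removed by 'docker system prune') =="
    $DOCKER ps --all --filter status=exited

    echo "== Dangling images (removed by 'docker system prune') =="
    $DOCKER image ls --filter dangling=true

    echo "== All images (those not used by any container go with 'docker image prune --all') =="
    $DOCKER image ls
}
```

Calling `prune_preview` only reads state. Neither prune command touches bind-mounted host directories such as /opt/clearml, and `docker system prune` leaves volumes alone unless `--volumes` is passed.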

  
  
Posted 3 years ago

Answers 43


Can you perhaps attach your docker-compose.yml file's contents?

  
  
Posted 3 years ago

not yet, going to try and fix it today.

if I do a df I see this, which is concerning:

Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        3.9G     0  3.9G   0% /dev
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           3.9G  928K  3.9G   1% /run
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/nvme0n1p1   20G  7.9G   13G  40% /
tmpfs           790M     0  790M   0% /run/user/1000

so it looks like the mount points are not created. When do these get created? I thought using an AMI these would have already been setup?

  
  
Posted 3 years ago

you will probably want to find the culprit, so a find should work wonders. I'd suspect elasticsearch first. It tends to go nuts 😕

  
  
Posted 3 years ago

also, is there a list anywhere with the mount points that are needed?

  
  
Posted 3 years ago

@<1687643893996195840:profile|RoundCat60> you set it once, inside the docker-compose itself.. it will affect all docker containers but, to be honest, docker tends to log everything

  
  
Posted 3 years ago

Not necessarily, is there any data in those directories?

  
  
Posted 3 years ago

I think that if these directories are not mounted, you should first of all take care not to shut down the server. You'll probably want to exec /bin/bash into the mongo and elastic containers, and copy their data outside to the host storage
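
A rough sketch of the copy step, using `docker cp` from the host instead of a shell inside the container. The `DOCKER` variable and the `copy_out_of_container` function are hypothetical helpers, and the container name and paths have to come from your own setup:

```shell
# Command used to talk to docker; override with e.g. DOCKER="sudo docker".
DOCKER="${DOCKER:-docker}"

# Copy a directory out of a container into a directory on the host.
# Arguments: container name, path inside the container, destination on the host.
copy_out_of_container() {
    container="$1"
    src="$2"
    dest="$3"
    mkdir -p "$dest" || return 1
    # "container:path/." copies the contents of path rather than the directory itself
    $DOCKER cp "$container:$src/." "$dest"
}
```

For example, `copy_out_of_container <mongo-container> /data/db <host-backup-dir>`, where the container name comes from `docker ps` and /data/db is the usual data path of the official MongoDB image (for the official Elasticsearch image it is /usr/share/elasticsearch/data). Check both against your docker-compose.yml first. Files copied from a running database may not be consistent, so treat this as a safety net rather than a proper backup.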

  
  
Posted 3 years ago

btw - if you remove the docker-compose changes, do the containers start normally?

  
  
Posted 3 years ago

Good point @<1523715084633772032:profile|AlertBlackbird30> 👍

  
  
Posted 3 years ago

🤔 I'll add the logging max-size now and monitor over the next week

  
  
Posted 3 years ago

hey @<1687643893996195840:profile|RoundCat60> .. did you ever get the problem sorted ?

  
  
Posted 3 years ago

think I found the issue, a typo in apiserver.conf

  
  
Posted 3 years ago

Hi @<1687643893996195840:profile|RoundCat60> ,
We've actually never had to address this issue. Can you find out what exactly is growing in size? I'd like to make sure this is not due to the containers storing data internally (causing docker to store more and more snapshots) - this is an unhealthy situation that might also indicate that volumes are not mounted correctly (i.e. data that should be stored externally is actually stored internally)
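
One way to check for this, sketched under the assumption that the docker CLI is available on the server (the `DOCKER` variable and the `container_storage_report` name are invented for this example): list each container's writable-layer size and its mounts.

```shell
# Command used to talk to docker; override with e.g. DOCKER="sudo docker".
DOCKER="${DOCKER:-docker}"

# For every container, show the size of its writable layer and its mounts.
# A large writable layer hints that data is being written inside the container.
container_storage_report() {
    echo "== Writable layer size per container =="
    $DOCKER ps --all --size --format 'table {{.Names}}\t{{.Size}}'

    echo "== Mounts per container (host source -> container destination) =="
    $DOCKER ps --all --format '{{.Names}}' | while read -r name; do
        echo "-- $name"
        $DOCKER inspect --format '{{range .Mounts}}{{.Source}} -> {{.Destination}}{{println}}{{end}}' "$name"
    done
}
```

A container with no mounts listed for its data path, together with a writable layer that keeps growing, would point to data being stored internally.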

  
  
Posted 3 years ago

thanks @<1523715084633772032:profile|AlertBlackbird30> this is really informative. Nothing seems to be particularly out of the ordinary though

3.7G	/var/lib/
3.7G	/var/lib/docker
3.0G	/var/lib/docker/overlay2

followed by a whole load of files that are a few hundred KBs in size, nothing huge though

  
  
Posted 3 years ago

Oh, that's strange. I'll run one of those soon to see if there's anything wrong with them

  
  
Posted 3 years ago

Hey there waves

Not sure about plans to automate this in the future, as this is more how docker behaves and not really clearml, especially with the overlay2 filesystem. The biggest offender usually is your json logfiles. have a look in /var/lib/docker/containers/ for *.log

assuming this IS the case, you can tell docker to only log up to a max-size .. I have mine set to 100m or some such
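
To confirm whether the log files really are the offender, something like the following could list the biggest ones. It is only a sketch: `largest_container_logs` is a made-up helper name, and reading the default directory usually needs a root shell.

```shell
# List the largest *.log files under docker's container directory.
# Arguments: root directory (default /var/lib/docker/containers),
#            number of rows to show (default 10).
# Sizes are reported in KiB, biggest first.
largest_container_logs() {
    log_root="${1:-/var/lib/docker/containers}"
    rows="${2:-10}"
    find "$log_root" -type f -name '*.log' -exec du -k {} + | sort -rn | head -n "$rows"
}
```

Note that logging options such as max-size are applied when a container is created, so existing containers need to be recreated before a new limit takes effect.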

  
  
Posted 3 years ago

@<1687643893996195840:profile|RoundCat60> can you verify all the volume mounts point to existing directories on the server machine? (i.e. /opt/clearml/... )

  
  
Posted 3 years ago

... from the AMI creation script:

# prepare directories to store data
sudo mkdir -p /opt/clearml/data/elastic_7
sudo mkdir -p /opt/clearml/data/redis
sudo mkdir -p /opt/clearml/data/mongo/db
sudo mkdir -p /opt/clearml/data/mongo/configdb
sudo mkdir -p /opt/clearml/logs
sudo mkdir -p /opt/clearml/config
sudo mkdir -p /opt/clearml/data/fileserver
sudo chown -R 1000:1000 /opt/clearml/data/elastic_7

So it seems the AMI is using the correct directories... Do you have these?
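
A quick way to answer that, as a sketch: loop over the same directories the AMI script creates and report which ones exist. The `check_clearml_dirs` name is invented for this example; the directory list is taken from the script above.

```shell
# Check that the data directories from the AMI script exist.
# Argument: ClearML root directory (default /opt/clearml).
# Returns non-zero if any directory is missing.
check_clearml_dirs() {
    root="${1:-/opt/clearml}"
    missing=0
    for sub in data/elastic_7 data/redis data/mongo/db data/mongo/configdb \
               logs config data/fileserver; do
        if [ -d "$root/$sub" ]; then
            echo "OK       $root/$sub"
        else
            echo "MISSING  $root/$sub"
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_clearml_dirs` with no argument checks /opt/clearml.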

  
  
Posted 3 years ago

incidentally we turn off the server every evening as it's not used overnight, we've not faced issues with it starting up in the morning or noticed any data loss

  
  
Posted 3 years ago

I added this to each of the containers

    logging:
      options:
        max-file: 5
        max-size: 10m
  
  
Posted 3 years ago

yep, in most of them:

/opt/clearml/config
apiserver.conf
clearml.conf

/opt/clearml/data/elastic_7
/nodes

/opt/clearml/data/fileserver
<empty>

/opt/clearml/data/mongo/configdb
<empty>

/opt/clearml/data/mongo/db
collection/index files, /diagnostic.data, /journal etc

/opt/clearml/data/redis 
dump.rdb

/opt/clearml/logs
apiserver.log.x, fileserver.log (0 bytes)
  
  
Posted 3 years ago

hhrrmm.. in the initial problem, you mentioned that the /var/lib/docker/overlay2 was growing large in size.. but.. 4GB seems "fine" for docker images.. I wonder .. does your nvme0n1p1 ever report like 85% or 90% used or do you think that the 4GB is a lot ? when you restart the server, does the % used noticeably drop ? that would suggest tmp files inside the docker image itself which.. is possible with docker (weird but, possible)

  
  
Posted 3 years ago

so yes indeedly ..

sudo find /var/lib/ -type d -exec du -s -x -h {} \; | grep G | more

seems to give saner results.. of course, in your case, you may also want to grep M for megabyte
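
One caveat with the `grep G` filter is that it also matches any path containing a capital G. A possible alternative, sketched here with a made-up `biggest_dirs` helper, is to let du limit the depth and sort numerically instead:

```shell
# Show the largest directories under a root, biggest first, sizes in KiB.
# Arguments: root (default /var/lib/docker), depth (default 2), rows (default 20).
# -x keeps du on one filesystem, as in the find command above.
biggest_dirs() {
    root="${1:-/var/lib/docker}"
    depth="${2:-2}"
    rows="${3:-20}"
    du -x -k -d "$depth" "$root" 2>/dev/null | sort -rn | head -n "$rows"
}
```

For example, `biggest_dirs /var/lib 3` from a root shell gives a similar overview to the find command, with one du pass instead of one per directory.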

  
  
Posted 3 years ago

After making the change yesterday to the docker-compose file, the server is completely unusable - this is all I see for the /dashboard screen
image

  
  
Posted 3 years ago

yes those have all been created

  
  
Posted 3 years ago

Try looking at their logs
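
A small sketch for doing that across all containers at once, assuming the docker CLI is available (the `DOCKER` variable and the `tail_all_container_logs` name are invented for this example):

```shell
# Command used to talk to docker; override with e.g. DOCKER="sudo docker".
DOCKER="${DOCKER:-docker}"

# Print the last N log lines (default 50) of every container, running or not.
tail_all_container_logs() {
    lines="${1:-50}"
    $DOCKER ps --all --format '{{.Names}}' | while read -r name; do
        echo "===== $name ====="
        # docker logs writes to both stdout and stderr, so merge them
        $DOCKER logs --tail "$lines" "$name" 2>&1
    done
}
```

Containers that exited or keep restarting are usually the first place to look in the output.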

  
  
Posted 3 years ago

Howdy and Morning @<1687643893996195840:profile|RoundCat60> .. docker when using overlay2 doesn't have its mount points show up in a 'df' btw, they will only appear in a 'df -a', mostly because since they are simply 'overlays', they don't (technically) consume any space (I mean, the files are still in the /var/lib but not for the space counting practices used by df)

this is why I was suggesting a find, maybe with a 'du' .. actually.. let me try that here.. 2s

  
  
Posted 3 years ago

In the publicly available AMI these are created. However, if you used a previously released Trains AMI and upgraded to ClearML, part of the upgrade process was to create those directories (required by the new docker-compose.yml ), as explained here: None

  
  
Posted 3 years ago

container_name:
  logging:
    options:
      max-size: 10m
  
  
Posted 3 years ago

Basically whatever was under the old /opt/trains/ folder is required, you can see the list here: None

  
  
Posted 3 years ago