Answered
Hey, so we noticed the /var/lib/docker/overlay2 directory is growing a lot in size

Hey, so we noticed the /var/lib/docker/overlay2 directory on the clearml-server is growing a lot in size, we added more disk space but we want to put something in place to stop this growing too much.
These are the options I’ve looked into:

  1. docker system prune - removes all stopped containers, all networks not used by at least one container, all dangling images, and all dangling build cache. Problem: we don’t really know what this is pruning (see the preview sketch below)
  2. docker image prune --all - removes all images without at least one container associated with them
  3. Set the max-size in docker-compose.yaml for logging

Are the first 2 options safe to run without killing the server? I’m not happy about removing files without knowing what they are.
Are there any plans to automate this in the future?
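
A hedged way to preview what those prune commands would touch, without removing anything (standard Docker CLI):

# summary of images/containers/volumes/build cache and how much Docker considers reclaimable
sudo docker system df
# stopped containers that `docker system prune` would remove
sudo docker ps -a --filter status=exited
# dangling images that would be removed
sudo docker images --filter dangling=true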

  
  
Posted 4 years ago

Answers 43


yes those have all been created

  
  
Posted 4 years ago

Try looking at their logs

  
  
Posted 4 years ago

we turn off the server every evening...

In that case the issue is definitely not related to the mount points

  
  
Posted 4 years ago

@<1687643893996195840:profile|RoundCat60> can you verify all the volume mounts point to existing directories on the server machine? (i.e. /opt/clearml/... )
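
For example (assuming the default clearml-apiserver container name from the compose file):

# print source -> destination for every mount of the container
sudo docker inspect -f '{{ range .Mounts }}{{ .Source }} -> {{ .Destination }}{{ printf "\n" }}{{ end }}' clearml-apiserver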

  
  
Posted 4 years ago

so yes indeedly ..

sudo find /var/lib/ -type d -exec du -s -x -h {} \; | grep G | more

seems to give saner results.. of course, in your case, you may also want to grep M for megabyte
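
alternatively, a sketch that sorts by size instead of grepping for the unit suffix (assumes GNU du and sort):

# largest directories up to two levels deep, this filesystem only, biggest last
sudo du -x -h --max-depth=2 /var/lib/docker | sort -h | tail -n 20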

  
  
Posted 4 years ago

I think that if these directories are not mounted, you should first of all take care not to shut down the server. You'll probably want to exec /bin/bash into the mongo and elastic containers, and copy their data outside to the host storage
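
A rough sketch of that copy-out; the container names and in-container data paths here are assumptions, so verify them against your docker-compose file first:

# create the host directories, then copy the data out of the running containers
sudo mkdir -p /opt/clearml/data/mongo/db /opt/clearml/data/elastic_7
sudo docker cp clearml-mongo:/data/db/. /opt/clearml/data/mongo/db/
sudo docker cp clearml-elastic:/usr/share/elasticsearch/data/. /opt/clearml/data/elastic_7/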

  
  
Posted 4 years ago

Basically whatever was under the old /opt/trains/ folder is required, you can see the list here: None

  
  
Posted 4 years ago

not yet, going to try and fix it today.

if I do a df I see this, which is concerning:

Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        3.9G     0  3.9G   0% /dev
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           3.9G  928K  3.9G   1% /run
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/nvme0n1p1   20G  7.9G   13G  40% /
tmpfs           790M     0  790M   0% /run/user/1000

so it looks like the mount points are not created. When do these get created? I thought using an AMI these would have already been set up?

  
  
Posted 4 years ago

thanks @<1523715084633772032:profile|AlertBlackbird30> this is really informative. Nothing seems to be particularly out of the ordinary though

3.7G	/var/lib/
3.7G	/var/lib/docker
3.0G	/var/lib/docker/overlay2

followed by a whole load of files that are a few hundred KBs in size, nothing huge though

  
  
Posted 4 years ago

🤔 I'll add the logging max-size now and monitor over the next week
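
One hedged way to monitor it, given that the json-file driver keeps each container's log under /var/lib/docker/containers (the sh -c wrapper is so the glob expands with root permissions):

# current size of each container's json log, smallest to largest
sudo sh -c 'du -sh /var/lib/docker/containers/*/*-json.log | sort -h'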

  
  
Posted 4 years ago

also, is there a list anywhere with the mount points that are needed?

  
  
Posted 4 years ago

Not necessarily. Is there any data in those directories?

  
  
Posted 4 years ago

Howdy and morning @<1687643893996195840:profile|RoundCat60> .. when using overlay2, docker doesn't have its mount points show up in a 'df', btw; they will only appear in a 'df -a', mostly because, since they are simply 'overlays', they don't (technically) consume any space (I mean, the files are still in /var/lib, but they are not counted by the space-accounting practices df uses)

this is why I was suggesting a find, maybe with a 'du' .. actually.. let me try that here.. 2s
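
for example, a quick hedged check (plain df hides them, -a shows them):

# overlay mounts only appear with -a
df -a -h | grep overlay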

  
  
Posted 4 years ago

strange, I used one of the publicly available AMIs for ClearML (we did not upgrade from the Trains AMI as we started fresh)

  
  
Posted 4 years ago

no, they are still restarting. I've looked in /opt/clearml/logs/apiserver.log; no errors

  
  
Posted 4 years ago

back up and running again, thanks for your help

  
  
Posted 4 years ago

services:
  apiserver:    # service name from your docker-compose.yml; repeat per service
    logging:
      driver: json-file
      options:
        max-size: 10m
  
  
Posted 4 years ago

it looks like clearml-apiserver and clearml-fileserver are continually restarting

  
  
Posted 4 years ago

so am I right in thinking it's just the mount points that are missing, based on the output of df above?

  
  
Posted 4 years ago

think I found the issue: a typo in apiserver.conf

  
  
Posted 4 years ago

you will probably want to find the culprit, so a find should work wonders. I'd suspect elasticsearch first. It tends to go nuts 😕

  
  
Posted 4 years ago

Check sudo docker logs <container-name>
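
for example, assuming the default container name:

# follow the last 100 lines of the apiserver container's log
sudo docker logs --tail 100 -f clearml-apiserver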

  
  
Posted 4 years ago

After making the change yesterday to the docker-compose file, the server is completely unusable - this is all I see for the /dashboard screen
[screenshot: empty /dashboard screen]

  
  
Posted 4 years ago

It looks like not all the containers are up... Try sudo docker ps and see if the apiserver container is restarting...
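
for example, a quick status check (look for "Restarting" in the second column):

# one line per container: name and status
sudo docker ps -a --format '{{.Names}}\t{{.Status}}'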

  
  
Posted 4 years ago

@<1687643893996195840:profile|RoundCat60> you set it once, inside the docker-compose itself.. it will affect all docker containers but, to be honest, docker tends to log everything

  
  
Posted 4 years ago

I believe you can set it on a 'per container' basis as well.

  
  
Posted 4 years ago

thanks Stef, with max-size do you set it for every running service separately, or can you set it once?

  
  
Posted 4 years ago

not entirely sure on this as we used the custom AMI solution. Is there any documentation on it?

  
  
Posted 4 years ago

yep, in most of them:

/opt/clearml/config
  apiserver.conf
  clearml.conf

/opt/clearml/data/elastic_7
  /nodes

/opt/clearml/data/fileserver
  <empty>

/opt/clearml/data/mongo/configdb
  <empty>

/opt/clearml/data/mongo/db
  collection/index files, /diagnostic.data, /journal etc.

/opt/clearml/data/redis
  dump.rdb

/opt/clearml/logs
  apiserver.log.x, fileserver.log (0 bytes)
  
  
Posted 4 years ago

... from the AMI creation script:

# prepare directories to store data
sudo mkdir -p /opt/clearml/data/elastic_7
sudo mkdir -p /opt/clearml/data/redis
sudo mkdir -p /opt/clearml/data/mongo/db
sudo mkdir -p /opt/clearml/data/mongo/configdb
sudo mkdir -p /opt/clearml/logs
sudo mkdir -p /opt/clearml/config
sudo mkdir -p /opt/clearml/data/fileserver
sudo chown -R 1000:1000 /opt/clearml/data/elastic_7

So it seems the AMI is using the correct directories... Do you have these?
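
A quick hedged check, using the paths from the script above:

# confirm the directories exist and see how much data each holds
sudo ls -ld /opt/clearml/data/* /opt/clearml/config /opt/clearml/logs
sudo du -sh /opt/clearml/data/*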

  
  
Posted 4 years ago