After some additional inspection, it seems the issue is Docker related.
7.7G /var/lib/docker/overlay2/
This is the directory that is causing the device storage issues.
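For anyone wanting to reproduce a figure like the one above without shelling out to du, here is a minimal Python sketch. It sums apparent file sizes, so it will not match du exactly (du counts allocated blocks and counts hard links once). The helper names dir_size_bytes and human are made up for this example, and reading /var/lib/docker usually requires root.

```python
import os


def dir_size_bytes(root):
    """Sum the apparent sizes of all files under root, skipping unreadable entries."""
    total = 0
    # onerror swallows permission errors so the walk continues past unreadable dirs
    for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat measures a symlink itself rather than following it
                total += os.lstat(path).st_size
            except OSError:
                continue
    return total


def human(n):
    """Format a byte count with a du-style unit suffix (powers of 1024)."""
    for unit in ("B", "K", "M", "G", "T"):
        if n < 1024 or unit == "T":
            return f"{n:.1f}{unit}"
        n /= 1024


if __name__ == "__main__":
    print(human(dir_size_bytes("/var/lib/docker/overlay2")))
```

Running this against each subdirectory of /var/lib/docker should show whether overlay2 really dominates, or whether container logs or volumes are also contributing.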
That should be the case, we have default_output_uri set to an S3 bucket.
Thanks Jake, I will have a look. Is there a reason a lot of disk space would be used on the server instance? Is there something in the config I can change to ensure that minimal memory is used on that server, and mostly S3 is used for storage?
TenseOstrich47 see here: https://github.com/allegroai/clearml/issues/316#issuecomment-788995387
TenseOstrich47 this looks like Elasticsearch is out of space...
From what I can tell, docker has some leakage here. Temp files are not removed correctly, resulting in the build up of disk storage usage.
See the following for more details
https://stackoverflow.com/questions/46672001/is-it-safe-to-clean-docker-overlay2
https://forums.docker.com/t/some-way-to-clean-up-identify-contents-of-var-lib-docker-overlay/30604
https://docs.docker.com/storage/storagedriver/overlayfs-driver/
I'm going to write a cleanup script and add it to cron. I don't believe there is an easy way around this issue, as Docker trades off disk storage for simplicity.
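A rough sketch of what such a cleanup script could look like, in Python. This is not the script from this thread, just one possible shape under stated assumptions: it relies on the standard docker container prune, docker image prune and docker builder prune commands, the 168h retention window is an arbitrary placeholder, and it deliberately does not touch volumes. It defaults to a dry run that only prints what it would do.

```python
import shlex
import subprocess


def build_prune_commands(older_than="168h"):
    """Return the docker prune commands to run.

    older_than is a placeholder retention window (an assumption, tune it):
    only objects created before that long ago are removed.
    """
    until = f"until={older_than}"
    return [
        # remove stopped containers
        ["docker", "container", "prune", "--force", "--filter", until],
        # remove images not used by any container
        ["docker", "image", "prune", "--all", "--force", "--filter", until],
        # remove build cache
        ["docker", "builder", "prune", "--force", "--filter", until],
    ]


def run_cleanup(dry_run=True, older_than="168h"):
    """Print (dry run) or execute each prune command; return the command list."""
    commands = build_prune_commands(older_than)
    for cmd in commands:
        if dry_run:
            print("would run:", shlex.join(cmd))
        else:
            # check=True raises CalledProcessError if docker exits non-zero
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run by default; switch to dry_run=False once the output looks right.
    run_cleanup(dry_run=True)
```

Saved as, say, /opt/scripts/docker_cleanup.py (a hypothetical path), it could be scheduled with a crontab entry such as `0 3 * * * /usr/bin/python3 /opt/scripts/docker_cleanup.py`. Pruning only removes stopped containers and unused images, so anything belonging to a running container is left alone, but it is worth checking the dry-run output on the actual machine first.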
ES can't use s3 for storage, nor can MongoDB
I thought nothing should be stored locally on the agent? Shouldn't all files be logged to the storage rather than the instance itself?
@<1687643893996195840:profile|RoundCat60> Hey Alex. Could you take a look at this when you're free later on, please?
TenseOstrich47 this sounds like a good idea.
When you have a script, please feel free to share, I think it will be useful for other users as well 🙂
@<1523701157564780544:profile|TenseOstrich47> This is typically indicative of insufficient server disk space, which causes ES to go into read-only mode or to mark active shards as inactive or unassigned (see the FAQ).
The disk watermarks controlling the ES free-disk constraints are defined by default as a percentage of the disk space (so it might look to you like you still have plenty of space, but ES thinks otherwise). You can configure different ES settings in the docker-compose.yml file (see here; there are 3 settings, and all can be identical).
If you don't have enough free disk space, clean up files to create more, or resize your partition (or increase your disk size if on a cloud instance).
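To make the percentage point concrete, here is a small Python sketch. It assumes the three settings mentioned above are Elasticsearch's cluster.routing.allocation.disk.watermark.low, .high and .flood_stage, whose stock defaults are 85%, 90% and 95% of disk used; the values in a given docker-compose.yml may differ. On a 1 TB disk, 90% used still leaves roughly 100 GB free, which is how it can look like there is plenty of space while ES disagrees. The function names are made up for this example, and the used-space figure is only an approximation of what ES itself computes.

```python
import shutil

# Stock Elasticsearch defaults, expressed as percent of disk used.
# A deployment's docker-compose.yml may override these.
DEFAULT_WATERMARKS = {"low": 85.0, "high": 90.0, "flood_stage": 95.0}


def watermark_status(used_pct, watermarks=DEFAULT_WATERMARKS):
    """Return the highest watermark that used_pct has reached, or 'ok'."""
    status = "ok"
    for name in ("low", "high", "flood_stage"):
        if used_pct >= watermarks[name]:
            status = name
    return status


def used_percent(path="/"):
    """Approximate percent of disk used on the filesystem containing path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    pct = used_percent("/")
    print(f"{pct:.1f}% used -> {watermark_status(pct)}")
```

Reaching "low" means ES stops allocating new shards to the node, "high" means it tries to move shards away, and "flood_stage" is where indices are put into read-only mode. Pass the path of the partition that actually holds the ES data if it is not the root filesystem.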
@<1523701157564780544:profile|TenseOstrich47> The storage in question here is what's available on the machine hosting the ClearML server's docker containers (specifically, the ES one).