Hey everyone, in case anyone is interested, I created a utility script for making backup snapshots of a local ClearML server without shutting the server down.
My team works with large datasets and long-running tasks, which makes periodic server shutdowns really painful. I did not find any existing solution, so I made this thing with a typer interface (because I wanted to play around with it 🙂 ) for creating and restoring snapshots of the config and all db components. It's a self-contained uv script, so provided you have uv installed on your system you can just uv run clearml_backup_tool.py --help
and go from there. It can also set itself up as a cron job.
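For anyone curious what the cron part boils down to, here is a minimal sketch of building a crontab line for a uv script. It is not the tool's actual code: the helper name build_cron_line, the subcommand create-snapshot, the /opt/clearml path, and the nightly 03:00 schedule are all made up for illustration.

```python
import shlex


def build_cron_line(script: str, subcommand: str, schedule: str = "0 3 * * *") -> str:
    """Return one crontab line that runs `uv run <script> <subcommand>`.

    `subcommand` is a hypothetical placeholder, not a real command of the tool.
    """
    # A standard crontab schedule has exactly five whitespace-separated fields.
    if len(schedule.split()) != 5:
        raise ValueError("expected a five-field cron schedule")
    # Quote each part so paths containing spaces survive the shell cron uses.
    command = " ".join(shlex.quote(part) for part in ("uv", "run", script, subcommand))
    return f"{schedule} {command}"


if __name__ == "__main__":
    # Hypothetical install path and subcommand, for illustration only.
    print(build_cron_line("/opt/clearml/clearml_backup_tool.py", "create-snapshot"))
```

One thing to keep in mind: cron runs jobs with a minimal PATH, so a real entry may need the full path to the uv binary instead of a bare uv.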
So far I've tried it out on a dummy server instance, which went fine. Next week I'll try it on our multi-TB instance to see how it goes. I'd be happy to hear any feedback, especially about possible risks that I am missing. Of course, everyone is free to use it, but I can't make any promises about its robustness yet.