Answered
How Would Ya'Ll Approach Backing Up The Elastic-Search/Redis/Etc. Data In Self-Hosted Clearml? Any Drawbacks/Risks Of Doing A Simple Process That Periodically Zips Up The

How would ya'll approach backing up the elastic-search/redis/etc. data in self-hosted ClearML?

Any drawbacks/risks of doing a simple process that periodically zips up the /opt/clearml volume mount folder and uploads it to S3? (besides losing data since the last backup)

  
  
Posted 11 months ago

Answers 20


If you want live backups (e.g. a backup every 30 minutes or every hour), then you'll need to configure ES snapshots and probably run mongodump periodically.
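A rough, untested sketch of one such backup cycle. The MongoDB URI, the Elasticsearch URL, the output directory and the snapshot repository name `clearml_backup` are all placeholders I made up, and the repository has to be registered in ES beforehand. Setting `DRY_RUN=1` only prints the commands.

```shell
#!/bin/sh
# Sketch: one-shot "live" backup of MongoDB and Elasticsearch.
# Hosts, ports, repository name and output directory are assumptions.
MONGO_URI="${MONGO_URI:-mongodb://localhost:27017}"
ES_URL="${ES_URL:-http://localhost:9200}"
ES_REPO="${ES_REPO:-clearml_backup}"   # must already be registered in ES
OUT_DIR="${OUT_DIR:-/opt/clearml-backups}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 is a lowercase timestamp (ES snapshot names must be lowercase).
live_backup() {
    stamp=$1
    # Dump all MongoDB databases into a single gzipped archive file.
    run mongodump --uri="$MONGO_URI" --gzip --archive="$OUT_DIR/mongo-$stamp.gz"
    # Ask ES to take a snapshot and wait until it has finished.
    run curl -fsS -X PUT "$ES_URL/_snapshot/$ES_REPO/snap-$stamp?wait_for_completion=true"
}
```

Something like `live_backup "$(date -u +%Y%m%d-%H%M)"` from cron would then run one cycle.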

  
  
Posted 11 months ago

Untested obviously

  
  
Posted 11 months ago

@<1541954607595393024:profile|BattyCrocodile47> , shouldn't be an issue - ClearML SDK is resilient to connectivity issues so if the server goes down the SDK will continue running and will just store all the data locally, once server is back up, it will send everything that was waiting.

Makes sense?

  
  
Posted 11 months ago

@<1523701070390366208:profile|CostlyOstrich36> Oh that’s smart. Is that to make sure no transactions happen during the backup? Would there be a risk of ongoing or pending tasks somehow getting corrupted if you shut the server down?

  
  
Posted 11 months ago

Also interested in how this is being approached 🙂 What you mentioned is exactly what I am doing

  
  
Posted 11 months ago

That will probably work if you're happy with the setup being offline for a period of time

  
  
Posted 11 months ago

Oh, that is cool. I captured all this. Maybe I'll make a user-data.sh script and docker-compose.yml file that brings all these things together. Probably won't have time for a few weeks.

  
  
Posted 11 months ago

Well, a simple version would be

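#!/bin/sh

# requires this script to be called in the directory where the docker-compose file lives
# stop all containers so nothing writes to /opt/clearml during the backup
docker-compose down
# tar strips the leading slash, so the archive stores paths as opt/clearml/...
tar -cvpzf "clearml-backup-$(date -u +%Y%m%dT%H%M).tar.gz" /opt/clearml
# start the server again in the background so the script returns
docker-compose up -d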
  
  
Posted 11 months ago

You know, you could probably add some immortal containers to the docker-compose.yml that use images with mongodump and the ES equivalent installed.

The container(s) could have a bash script with a while loop in it that sleeps for 30 minutes and then does a backup. If you installed the AWS CLI inside, it could even take care of uploading to S3.

I like this idea, because docker-compose.yml could make sure that if the backup container ever dies, it would be restarted.
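Roughly what the script inside such a container might look like. This is only a sketch: the bucket name, the `mongo` hostname and the work directory are made-up placeholders, it assumes `mongodump` and the AWS CLI are installed in the image, and it only covers the MongoDB part.

```shell
#!/bin/sh
# Sketch of a backup loop for a sidecar container.
# All defaults below are placeholders, not values taken from ClearML.
INTERVAL_SECONDS="${INTERVAL_SECONDS:-1800}"        # 30 minutes
MONGO_URI="${MONGO_URI:-mongodb://mongo:27017}"     # assumed service name
S3_URI="${S3_URI:-s3://my-clearml-backups}"         # hypothetical bucket
WORK_DIR="${WORK_DIR:-/tmp/clearml-backup}"

# One backup cycle: dump MongoDB to a file, copy it to S3, clean up.
backup_once() {
    stamp=$(date -u +%Y%m%dT%H%M%S)
    file="$WORK_DIR/mongo-$stamp.gz"
    mkdir -p "$WORK_DIR" &&
    mongodump --uri="$MONGO_URI" --gzip --archive="$file" &&
    aws s3 cp "$file" "$S3_URI/" &&
    rm -f "$file"
}

# Run cycles forever, or MAX_RUNS times when that variable is set,
# sleeping INTERVAL_SECONDS after each cycle.
backup_loop() {
    runs=0
    while [ -z "${MAX_RUNS:-}" ] || [ "$runs" -lt "$MAX_RUNS" ]; do
        backup_once || echo "backup failed at $(date -u)" >&2
        runs=$((runs + 1))
        sleep "$INTERVAL_SECONDS"
    done
}
```

Calling `backup_loop` as the container's command, with `restart: unless-stopped` on the service in docker-compose.yml, should give the restart behaviour.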

  
  
Posted 11 months ago

Ah, but it's probably worth noting that the docker-compose.yml does register the EC2 instance that the server is running on as an agent listening on the services queue, so ongoing tasks in that queue that happen to be placed on the server would get terminated when docker-compose down is run.

  
  
Posted 11 months ago

Wow, that is seriously impressive.

  
  
Posted 11 months ago

The corresponding restore script would probably look like this

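#!/bin/sh

# usage: pass the path of the backup archive as the first argument
backup=$1
[ -f "$backup" ] || { echo "usage: $0 <backup-archive>" >&2; exit 1; }
# requires this script to be called in the directory where the docker-compose file lives
docker-compose down
# preserve the current directory just in case
mv /opt/clearml "/opt/clearml-before-restore-$(date -u +%Y%m%dT%H%M)"
mkdir /opt/clearml
# the archive stores paths as opt/clearml/..., so extract relative to /
tar -xvzf "$backup" -C /
docker-compose up -d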
  
  
Posted 11 months ago

Can vouch, this works well. Had my server hard reboot (maybe bc of clearml? maybe bc of hardware, maybe both… haven’t figured it out), and busy remote workers still managed to update the backend once it came back up.

Re: backups… what would happen if zipped while running but no work was being performed? Still an issue potentially?

And what happens if docker-compose down is run while there’s work in the services queue? Will it be restored? What are the implications if a backup is performed at this time and restored later?

  
  
Posted 11 months ago

Elasticsearch will potentially be corrupt when you run simple filesystem backups. You have no idea what is committed to disk vs. what is still held in memory. From experience I can tell you that a certain percentage of your backups will be corrupt, and a restore will usually mean partial data loss or even total loss, since ES may simply refuse to start up and manually fixing the on-disk data is not practicable. Mongo filesystem snapshots at least used to be an acceptable backup mechanism (and that still seems to be the case).

  
  
Posted 11 months ago

Yeah, for mongodump that would be the way to go I guess. For ES you're probably better off simply making use of ES' built-in snapshot lifecycle management (SLM) policies, which can automate taking snapshots for you.
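For anyone who wants a starting point, registering such a policy could look roughly like this. It is an untested sketch: the policy id `clearml-snapshots`, the repository name `clearml_backup`, the 30-minute schedule and the 7-day retention are my own placeholders, the repository must already exist, and SLM needs a reasonably recent Elasticsearch (7.4 or later, as far as I know).

```shell
#!/bin/sh
# Sketch: register an SLM policy that snapshots every 30 minutes.
# URL, repository name and policy id are assumptions.
ES_URL="${ES_URL:-http://localhost:9200}"
ES_REPO="${ES_REPO:-clearml_backup}"

# Build the policy body. The schedule uses Elasticsearch's cron syntax
# (seconds field first); retention expires snapshots after 7 days but
# always keeps at least 5.
slm_policy_json() {
    cat <<EOF
{
  "schedule": "0 0/30 * * * ?",
  "name": "<clearml-{now/d}>",
  "repository": "$ES_REPO",
  "config": { "indices": ["*"], "include_global_state": true },
  "retention": { "expire_after": "7d", "min_count": 5 }
}
EOF
}

# Send the policy to the SLM endpoint of the cluster at ES_URL.
put_slm_policy() {
    slm_policy_json | curl -fsS -X PUT "$ES_URL/_slm/policy/clearml-snapshots" \
        -H 'Content-Type: application/json' --data-binary @-
}
```

Running `put_slm_policy` once should be enough, since ES then takes the snapshots on its own schedule.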

  
  
Posted 11 months ago

Earlier in the thread they mentioned that the agents are all resilient. So no ongoing tasks should be lost. I imagine even in a large organization, you could afford 5-10 minutes of downtime at 2AM or something.

That said, you'd only have 1 backup per day, which could be a big deal depending on the experiments you're running. You might want more than that.

  
  
Posted 11 months ago

@<1541954607595393024:profile|BattyCrocodile47> , that is indeed the suggested method - although make sure that the server is down while doing this

  
  
Posted 11 months ago

You have no idea what is committed to disk vs what is still contained in memory.

If you ran docker-compose down and allowed ES to gracefully shut down, would ES finish writing everything to disk, therefore guaranteeing that the backups wouldn't get corrupted?

  
  
Posted 11 months ago

We should put a $100 bounty on a bash script that backs up and restores mongodb, redis, and ES, etc. to S3 using the most resilient ways 😄

  
  
Posted 11 months ago

As opposed to using CRON or something 🤣

  
  
Posted 11 months ago