I guess what I'm asking is: can someone point me to a config file with a full redis config section? Reading the source code, it loads some sort of "alias" sub-config which isn't explained anywhere. What is this and why is it there?
Ok, found it: you need to set the apiserver and workers alias sub-configs.
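For reference, here's a sketch of what that redis section might look like in the apiserver config (HOCON). The exact key names, hostnames, and db numbers are assumptions based on the alias names above, not verified against the source:

```
redis {
    # one sub-config per alias; the aliases can point at the same
    # redis instance (different dbs) or at separate instances
    apiserver {
        host: "redis"   # placeholder hostname
        port: 6379
        db: 0
    }
    workers {
        host: "redis"
        port: 6379
        db: 4           # assumed db number
    }
}
```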
The corresponding restore script would probably look like this:
#!/bin/sh
# Restore a ClearML backup archive created by the backup script.
# Requires this script to be called in the directory where the
# docker-compose file lives.
backup=$1
if [ -z "$backup" ]; then
    echo "usage: $0 <backup-archive>" >&2
    exit 1
fi
docker-compose down
# preserve the current data directory just in case
mv /opt/clearml "/opt/clearml-before-restore-$(date -u +%Y%m%dT%H%M)"
# extracting relative to / recreates /opt/clearml from the archive
tar -xvzf "$backup" -C /
docker-compose up -d
Ok thanks, is it sufficient to back up mongo and elastic independently, or do they need to be backed up in a synchronized fashion?
Yeah, for Mongo, mongodump would be the way to go I guess; for ES you're probably better off simply using ES' built-in snapshot lifecycle management (SLM) policies, which can automate taking snapshots for you.
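To illustrate, a sketch of registering an SLM policy via the ES API. The policy name, repository name (`backup_repo`), schedule, and retention values here are placeholders, and a snapshot repository must already be registered before this will work:

```
curl -X PUT "localhost:9200/_slm/policy/nightly-snapshots" \
  -H 'Content-Type: application/json' -d '
{
  "schedule": "0 30 1 * * ?",
  "name": "<nightly-snap-{now/d}>",
  "repository": "backup_repo",
  "config": { "indices": ["*"] },
  "retention": { "expire_after": "30d", "min_count": 5, "max_count": 50 }
}'
```

ES then takes and expires snapshots on the given cron schedule without any external tooling.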
Because running it as docker compose would imply running it on a VM. Running production stuff on a VM is not acceptable, since we don't have the capacity (nor the desire) to keep VMs patched, manage redundancy (while patching and rebooting), manage secure remote access, etc.
Hence we only deploy to native managed Azure services (App service, ACA, etc.).
Plus following the principle of dependency injection I'd rather make external service dependencies explicit. For example we already know how to ...
In any case @<1523701087100473344:profile|SuccessfulKoala55> @<1523701070390366208:profile|CostlyOstrich36> - is it possible to customize the database names that clearml is using, or are "backend" and "auth" hard-coded in too many places?
Been perusing the code - it seems like ES is only used to log some queue metrics (queue length and average wait time) and some event metrics, so I would not consider that information that needs to be restored.
On that note, task_bll.py creates an events_es instance, but nothing ever seems to use it. Same for the redis instance.
Since CosmosDB isn't 100% Mongo feature-complete (the API is, but Cosmos lacks full-text indices), I have now spun up an actual Mongo service.
I can see the auth and backend databases now, but also a seemingly random test one that contains "queue" and "user" collections. Not sure where that is coming from.
That will probably work if you're happy with the setup being offline for a period of time
I'm running clearml as a container app, with mainly environment variables configuring the connections to the relevant services.
Well, a simple version would be
#!/bin/sh
# Back up the ClearML data directory into a timestamped archive.
# Requires this script to be called in the directory where the
# docker-compose file lives.
docker-compose down
tar -cvpzf "clearml-backup-$(date -u +%Y%m%dT%H%M).tar.gz" /opt/clearml
docker-compose up -d
If you want live backups (e.g. every 30 min or 1 h), then you'll need to configure ES snapshots and probably periodically execute mongodump.
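For the mongodump side, a possible crontab entry; the hostname, output path, and interval are placeholders, and note that `%` must be escaped in crontab:

```
# dump Mongo every hour into a timestamped directory
0 * * * * mongodump --host mongo --port 27017 --out /backups/mongo-$(date -u +\%Y\%m\%dT\%H\%M)
```

You'd also want something to prune old dump directories, since unlike ES SLM there is no built-in retention here.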
Elasticsearch will potentially be corrupt when you run simple filesystem backups: you have no idea what has been committed to disk versus what is still held in memory. From experience I can tell you that a certain percentage of your backups will be corrupt, and a restore will usually mean partial data loss, or even total loss, since ES may simply refuse to start up and manually fixing the on-disk files is not practicable. Mongo filesystem snapshots at least used to be an acceptable backup mechanism (...
Also, it's a little disappointing how much reverse engineering (i.e. reading the code) one has to do to find out what can actually be configured. I understand that writing docs is a chore and nobody likes doing it, but it would be nice if this could be improved, especially when it comes to configuring the external services.