Fail loud when the database backup failed
What does this MR do and why?
Fail loud when the database backup failed
Related to #411497 (closed)
Currently the backup command exits with code 0 and creates a backup file including all repositories even if the connection to the database failed and no database dump was created.
Creating a backup is commonly an automated process, so backup health monitoring would benefit from being able to tell from the exit code whether the backup was created successfully. It also does not make much sense to continue backing up all repository data if the backup cannot be restored anyway because the database dump is missing. Doing so just takes time and creates a misleadingly large backup file: misleading because the backup is incomplete.
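To illustrate what exit-code based monitoring could look like, here is a minimal sketch. The helper names `command_succeeded?` and `backup_status` are hypothetical and not part of GitLab; the sketch only relies on Ruby's `Kernel#system`, and it is only meaningful once the backup command actually exits non-zero on a failed dump.

```ruby
require 'rbconfig'

# Runs a command (given as an array, so no shell is involved) and
# returns true only when it exits with status 0.
def command_succeeded?(command)
  # Kernel#system returns true for exit status 0, false for a non-zero
  # status, and nil when the command could not be executed at all.
  system(*command, out: File::NULL, err: File::NULL) == true
end

# Hypothetical monitoring hook: a status line that a scheduler could log.
def backup_status(command)
  command_succeeded?(command) ? 'backup OK' : 'backup FAILED'
end
```

A cron wrapper could then call something like `backup_status(%w[gitlab-rake gitlab:backup:create])` and alert on `backup FAILED`.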
There is a small note at the beginning of the backup log that the dump failed. That note also never shows a [FAILED] from this method
def report_success(success)
  if success
    progress.puts '[DONE]'.color(:green)
  else
    progress.puts '[FAILED]'.color(:red)
  end
end
because the raise is triggered before report_success is called:
raise DatabaseBackupError.new(config, db_file_name) unless success
report_success(success)
progress.flush
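The effect of that ordering can be reproduced in isolation. The following is a self-contained sketch, not the GitLab code: `DatabaseBackupError` is a stand-in class, the method names `finish_raise_first` and `finish_report_first` are hypothetical, and the terminal colouring is left out. Reporting before raising is shown only as one possible ordering; whether the commit does exactly this is an assumption.

```ruby
require 'stringio'

# Stand-in for the error class named in the snippet above (assumption:
# the real class takes different constructor arguments).
class DatabaseBackupError < StandardError; end

# Same shape as the quoted method, without colouring, and with the
# progress stream passed in explicitly.
def report_success(progress, success)
  progress.puts(success ? '[DONE]' : '[FAILED]')
end

# Ordering described above: the raise fires first, so on failure
# nothing is ever written to the progress stream.
def finish_raise_first(progress, success)
  raise DatabaseBackupError, 'dump failed' unless success

  report_success(progress, success)
  progress.flush
end

# Alternative ordering: report first, then raise, so the failure
# marker is printed before the error propagates.
def finish_report_first(progress, success)
  report_success(progress, success)
  progress.flush
  raise DatabaseBackupError, 'dump failed' unless success
end
```

With a `StringIO` as the progress stream, the first variant leaves the stream empty on failure, while the second writes `[FAILED]` before raising.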
With this change, the following command no longer silently swallows the dump error. It fails fast and loud, to guarantee that a backup command that passes has created a functional backup.
sudo -u git -H bundle exec rake gitlab:backup:create RAILS_ENV=production
Screenshots or screen recordings
This is not a visual change. You can see the difference by running the backup command and looking at the output log and exit code. I put the full before and after logs further down.
| Before | After |
|---|---|
| gitlab-rake gitlab:backup:create exit code 0 on db error | gitlab-rake gitlab:backup:create exit code 1 on db error |
How to set up and validate locally
The issue I ran into was an external database with a version mismatch. GitLab runs fine, but the backup fails because pg_dump is the wrong version. Using these two Docker Compose files, you can reproduce a failing database backup.
My setup might be more complicated than yours, so you can also just run the backup command on a GitLab instance with a broken database, then apply my commit and see how it changes the outcome.
But if you want to reproduce my testing setup, here it is.
$ docker --version
Docker version 24.0.2, build cb74dfcd85
$ docker-compose --version
Docker Compose version 2.18.1
# gitlab-db/docker-compose.yml
version: '3'
services:
  gitlab-db14:
    # gitlab is not compatible with postgres 14
    # this is why the backup will fail
    image: postgres:14.7-alpine
    volumes:
      - ./db14:/var/lib/postgresql/data:rw
    environment:
      - POSTGRES_DB=gitlabhq_production
      - POSTGRES_USER=gitlab
      - POSTGRES_PASSWORD=secure_pg_pass123
    ports:
      - 5432:5432
    expose:
      - 5432
    healthcheck:
      test: ["CMD-SHELL", "PGPASSWORD=secure_pg_pass123 psql -h gitlab-db14 -U gitlab gitlabhq_production -c '\\l'"]
      interval: 5s
      timeout: 5s
      retries: 3
    networks:
      - gitlab-databases
networks:
  gitlab-databases:
# gitlab/docker-compose.yml
version: '3.6'
services:
  web:
    image: 'gitlab/gitlab-ce:16.1.0-ce.0'
    restart: always
    hostname: 'test-gitlab.mydomain.com'
    environment:
      GITLAB_OMNIBUS_CONFIG: |
        external_url 'https://test-gitlab.mydomain.com'
        nginx['ssl_certificate'] = "/etc/gitlab/ssl/test-gitlab.mydomain.com.cer"
        nginx['ssl_certificate_key'] = "/etc/gitlab/ssl/test-gitlab.mydomain.com.key"
        letsencrypt['enable'] = false
        gitlab_rails['manage_backup_path'] = false
        gitlab_rails['backup_path'] = '/backups'
        postgresql['enable'] = false
        gitlab_rails['db_adapter'] = 'postgresql'
        gitlab_rails['db_encoding'] = 'utf8'
        gitlab_rails['db_host'] = 'gitlab-db14'
        gitlab_rails['db_port'] = 5432
        gitlab_rails['db_username'] = 'gitlab'
        gitlab_rails['db_password'] = "secure_pg_pass123"
    extra_hosts:
      - "test-gitlab.mydomain.com:127.0.0.1"
    ports:
      - '192.168.178.29:22:22'
      - '192.168.178.29:80:80'
      - '192.168.178.29:443:443'
    volumes:
      - './data/backups:/backups'
      - './data/logs:/var/log/gitlab'
      - './data/data:/var/opt/gitlab'
      - './data/config:/etc/gitlab'
      - './certs:/etc/gitlab/ssl/'
    networks:
      - gitlab-databases
    shm_size: '256m'
networks:
  gitlab-databases:
    external:
      name: gitlab-db_gitlab-databases
For this to work, I changed test-gitlab.mydomain.com to my actual domain and made it point to the local IP of my laptop (192.168.178.29). Then I created SSL certificates for the domain and put them in the certs/ folder.
The final result looks like this:
$ tree
.
├── gitlab
│   ├── certs
│   │   ├── ca.cer
│   │   ├── fullchain.cer
│   │   ├── test-gitlab.mydomain.com.cer
│   │   ├── test-gitlab.mydomain.com.conf
│   │   ├── test-gitlab.mydomain.com.csr
│   │   ├── test-gitlab.mydomain.com.csr.conf
│   │   └── test-gitlab.mydomain.com.key
│   └── docker-compose.yml
└── gitlab-db
    └── docker-compose.yml

4 directories, 9 files
Start the database, then GitLab, and wait for it to launch. Then create a backup and see it fail while still exiting with code 0.
cd gitlab-db
docker-compose up -d --wait
cd ../gitlab
docker-compose up -d --wait
# this command shows the db error at the top but continues as if nothing had happened
docker exec -i gitlab-web-1 gitlab-rake gitlab:backup:create
Now, to test my patch, you can download the files from my commit and mount them into the container:
cd gitlab
docker-compose down
mkdir patch
cd patch
wget https://gitlab.com/ChillerDragon/gitlab/-/raw/1321aa05e66b6b10f8878401f7b494008eab51c9/lib/backup/manager.rb
wget https://gitlab.com/ChillerDragon/gitlab/-/raw/1321aa05e66b6b10f8878401f7b494008eab51c9/lib/backup/database.rb
Now add this to the volumes section of gitlab/docker-compose.yml:
- './patch/database.rb:/opt/gitlab/embedded/service/gitlab-rails/lib/backup/database.rb'
- './patch/manager.rb:/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb'
docker-compose up -d --wait
# this will now stop on database dump failure
# and exit with code 1
docker exec -i gitlab-web-1 gitlab-rake gitlab:backup:create
Before, it continued and created a backup archive with all repositories:
[user@host gitlab]$ docker exec -i gitlab-web-1 gitlab-rake gitlab:backup:create SKIP=registry,artifacts,builds,pages
2023-06-22 15:33:15 +0200 -- Dumping database ...
Dumping PostgreSQL database gitlabhq_production ... pg_dump: error: server version: 14.7; pg_dump version: 13.11
pg_dump: error: aborting because of server version mismatch
2023-06-22 15:33:15 +0200 -- Dumping database failed: Failed to create compressed file '/backups/db/database.sql.gz' when trying to backup the main database:
- host: 'gitlab-db14'
- port: '5432'
- database: 'gitlabhq_production'
2023-06-22 15:33:15 +0200 -- Dumping repositories ...
{"command":"create","gl_project_path":"gitlab-instance-513de534/Monitoring","level":"info","msg":"started create","relative_path":"@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git","storage_name":"default","time":"2023-06-22T13:33:15.535Z"}
{"command":"create","error":"manager: repository empty: repository skipped","gl_project_path":"gitlab-instance-513de534/Monitoring","level":"warning","msg":"skipped create","relative_path":"@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git","storage_name":"default","time":"2023-06-22T13:33:15.539Z"}
{"command":"create","gl_project_path":"gitlab-instance-513de534/Monitoring.wiki","level":"info","msg":"started create","relative_path":"@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.wiki.git","storage_name":"default","time":"2023-06-22T13:33:15.635Z"}
{"command":"create","error":"manager: repository empty: repository skipped","gl_project_path":"gitlab-instance-513de534/Monitoring.wiki","level":"warning","msg":"skipped create","relative_path":"@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.wiki.git","storage_name":"default","time":"2023-06-22T13:33:15.637Z"}
{"command":"create","gl_project_path":"gitlab-instance-513de534/Monitoring","level":"info","msg":"started create","relative_path":"@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.design.git","storage_name":"default","time":"2023-06-22T13:33:15.671Z"}
{"command":"create","error":"manager: repository empty: repository skipped","gl_project_path":"gitlab-instance-513de534/Monitoring","level":"warning","msg":"skipped create","relative_path":"@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.design.git","storage_name":"default","time":"2023-06-22T13:33:15.671Z"}
{"command":"create","gl_project_path":"my.user/postgresmomemt","level":"info","msg":"started create","relative_path":"@hashed/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35.git","storage_name":"default","time":"2023-06-22T13:33:15.673Z"}
{"command":"create","gl_project_path":"my.user/postgresmomemt.wiki","level":"info","msg":"started create","relative_path":"@hashed/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35.wiki.git","storage_name":"default","time":"2023-06-22T13:33:15.676Z"}
{"command":"create","error":"manager: repository empty: repository skipped","gl_project_path":"my.user/postgresmomemt.wiki","level":"warning","msg":"skipped create","relative_path":"@hashed/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35.wiki.git","storage_name":"default","time":"2023-06-22T13:33:15.677Z"}
{"command":"create","gl_project_path":"my.user/postgresmomemt","level":"info","msg":"started create","relative_path":"@hashed/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35.design.git","storage_name":"default","time":"2023-06-22T13:33:15.679Z"}
{"command":"create","error":"manager: repository empty: repository skipped","gl_project_path":"my.user/postgresmomemt","level":"warning","msg":"skipped create","relative_path":"@hashed/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35.design.git","storage_name":"default","time":"2023-06-22T13:33:15.680Z"}
{"command":"create","gl_project_path":"my.user/postgresmomemt","level":"info","msg":"completed create","relative_path":"@hashed/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35.git","storage_name":"default","time":"2023-06-22T13:33:15.681Z"}
2023-06-22 15:33:15 +0200 -- Dumping repositories ... done
2023-06-22 15:33:15 +0200 -- Dumping uploads ...
2023-06-22 15:33:15 +0200 -- Dumping uploads ... done
2023-06-22 15:33:15 +0200 -- Dumping builds ... [SKIPPED]
2023-06-22 15:33:15 +0200 -- Dumping artifacts ... [SKIPPED]
2023-06-22 15:33:15 +0200 -- Dumping pages ... [SKIPPED]
2023-06-22 15:33:15 +0200 -- Dumping lfs objects ...
2023-06-22 15:33:15 +0200 -- Dumping lfs objects ... done
2023-06-22 15:33:15 +0200 -- Dumping terraform states ...
2023-06-22 15:33:15 +0200 -- Dumping terraform states ... done
2023-06-22 15:33:15 +0200 -- Dumping container registry images ... [SKIPPED]
2023-06-22 15:33:15 +0200 -- Dumping packages ...
2023-06-22 15:33:15 +0200 -- Dumping packages ... done
2023-06-22 15:33:15 +0200 -- Creating backup archive: 1687440795_2023_06_22_15.11.8_gitlab_backup.tar ...
2023-06-22 15:33:15 +0200 -- Creating backup archive: 1687440795_2023_06_22_15.11.8_gitlab_backup.tar ... done
2023-06-22 15:33:15 +0200 -- Uploading backup archive to remote storage ... [SKIPPED]
2023-06-22 15:33:15 +0200 -- Deleting old backups ...
2023-06-22 15:33:15 +0200 -- Deleting old backups ... done. (0 removed)
2023-06-22 15:33:15 +0200 -- Deleting tar staging files ...
2023-06-22 15:33:15 +0200 -- Cleaning up /backups/backup_information.yml
2023-06-22 15:33:15 +0200 -- Cleaning up /backups/db
2023-06-22 15:33:15 +0200 -- Cleaning up /backups/repositories
2023-06-22 15:33:15 +0200 -- Cleaning up /backups/uploads.tar.gz
2023-06-22 15:33:15 +0200 -- Cleaning up /backups/lfs.tar.gz
2023-06-22 15:33:15 +0200 -- Cleaning up /backups/terraform_state.tar.gz
2023-06-22 15:33:15 +0200 -- Cleaning up /backups/packages.tar.gz
2023-06-22 15:33:15 +0200 -- Deleting tar staging files ... done
2023-06-22 15:33:15 +0200 -- Deleting backups/tmp ...
2023-06-22 15:33:15 +0200 -- Deleting backups/tmp ... done
2023-06-22 15:33:15 +0200 -- Warning: Your gitlab.rb and gitlab-secrets.json files contain sensitive data
and are not included in this backup. You will need these files to restore a backup.
Please back them up manually.
2023-06-22 15:33:15 +0200 -- Backup 1687440795_2023_06_22_15.11.8 is done.
2023-06-22 13:33:15 +0000 -- Deleting backup and restore lock file
[user@host gitlab]$ echo $?
0
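Because the exit code above is 0, a monitor would have had to inspect the log text itself to notice the problem. As a rough sketch of such a workaround, assuming the log has been captured as a string: the helper name `database_dump_failed?` is hypothetical, and the marker text is copied from the log above and may differ in other GitLab versions.

```ruby
# Marker text taken from the "Dumping database failed" line in the log.
FAILURE_MARKER = 'Dumping database failed'

# Returns true when any line of the captured backup log contains the
# failure marker. This is a plain substring check, nothing more.
def database_dump_failed?(log_text)
  log_text.each_line.any? { |line| line.include?(FAILURE_MARKER) }
end
```

This kind of log scraping is brittle, which is part of why relying on the exit code is preferable.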
After, it fails fast and loud:
[user@host gitlab]$ docker exec -i gitlab-web-1 gitlab-rake gitlab:backup:create SKIP=registry,artifacts,builds,pages
2023-06-26 09:47:57 +0200 -- Dumping database ...
Dumping PostgreSQL database gitlabhq_production ... pg_dump: error: server version: 14.7; pg_dump version: 13.11
pg_dump: error: aborting because of server version mismatch
[FAILED]
rake aborted!
Backup::Error: Dumping database failed: Failed to create compressed file '/backups/db/database.sql.gz' when trying to backup the main database:
- host: 'gitlab-db14'
- port: '5432'
- database: 'gitlabhq_production'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:75:in `rescue in run_create_task'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:55:in `run_create_task'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:222:in `block in run_all_create_tasks'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:221:in `each_key'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:221:in `run_all_create_tasks'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:47:in `create'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/backup.rake:13:in `block in create_backup'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/backup.rake:62:in `lock_backup'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/backup.rake:10:in `create_backup'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/backup.rake:101:in `block (3 levels) in <top (required)>'
/opt/gitlab/embedded/bin/bundle:25:in `load'
/opt/gitlab/embedded/bin/bundle:25:in `<main>'
Caused by:
Backup::DatabaseBackupError: Failed to create compressed file '/backups/db/database.sql.gz' when trying to backup the main database:
- host: 'gitlab-db14'
- port: '5432'
- database: 'gitlabhq_production'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/database.rb:63:in `block in dump'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/database.rb:277:in `each'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/database.rb:277:in `each_database_snapshot_id'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/database.rb:30:in `dump'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:71:in `run_create_task'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:222:in `block in run_all_create_tasks'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:221:in `each_key'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:221:in `run_all_create_tasks'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:47:in `create'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/backup.rake:13:in `block in create_backup'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/backup.rake:62:in `lock_backup'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/backup.rake:10:in `create_backup'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/backup.rake:101:in `block (3 levels) in <top (required)>'
/opt/gitlab/embedded/bin/bundle:25:in `load'
/opt/gitlab/embedded/bin/bundle:25:in `<main>'
Tasks: TOP => gitlab:backup:create
(See full trace by running task with --trace)
2023-06-26 09:47:57 +0200 -- Deleting tar staging files ...
2023-06-26 09:47:57 +0200 -- Cleaning up /backups/db
2023-06-26 09:47:57 +0200 -- Deleting tar staging files ... done
2023-06-26 09:47:57 +0200 -- Deleting backups/tmp ...
2023-06-26 09:47:57 +0200 -- Deleting backups/tmp ... done
2023-06-26 09:47:57 +0200 -- Deleting backup and restore PID file ... done
[user@host gitlab]$ echo $?
1
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.