Resolve "CI should run with lock_writes"
What does this MR do and why?
We have a script in place that will lock tables for write in case the write should happen in another database. https://gitlab.com/gitlab-org/gitlab/-/commits/master/lib/tasks/gitlab/db/lock_writes.rake. This script is used to prevent invalid writes in decomposed database setups. The lock is implemented using database triggers.
Invalid writes are for example:
- Using
main
connection to updateci_pipelines
table - Using
ci
connection to updateprojects
table.
These locks are now in place on Gitlab.com production but not yet in CI. This MR will set these locks also on CI so we bring production/staging closer to test setup. This will help us catch issues like !106894 (comment 1216992707) earliear during development.
I added the rake task to setup_db()
function in script/utils.sh
. This bash file is sourced into scripts/prepare_build.sh
which is used to setup the database environment for CI environment.
How to set up and validate locally
I added two migrations and ran the pipeline. The pipeline should fail because we do not allow making changes in ci
table on main
database:
Error: command failed: rake aborted!
StandardError: An error has occurred, this and all later migrations canceled:
PG::SREModifyingSqlDataNotPermitted: ERROR: Table: "ci_runner_versions" is write protected within this Gitlab database.
HINT: Make sure you are using the right database connection
CONTEXT: PL/pgSQL function gitlab_schema_prevent_write() line 4 at RAISE
SQL statement "DELETE FROM ci_runner_versions"
PL/pgSQL function test_write_lock() line 3 at EXECUTE
The idea behind testing is: we want to be sure that when we modify the contents of a locked table during a migration, this migration will fail in CI. So for example, we want to ensure that on main
database, we can not delete data from ci_runner_versions
table, because that table is in ci
database.
My first attempts were just running TRUNCATE
or DELETE
directly but this did not work: we have QueryAnalyzers in place that detect these attempts and raise errors. So I had to find a bypass and decided to wrap the DELETE
in a function, which is NOT verified by the QueryAnalyzers.
The first one is adding a function that will delete all data from ci_runner_versions
. We
class AddFailingMigration < Gitlab::Database::Migration[2.1]
def up
execute <<~SQL
CREATE OR REPLACE FUNCTION test_write_lock()
RETURNS void AS
$$
BEGIN
EXECUTE 'DELETE FROM ci_runner_versions';
END;
$$ LANGUAGE plpgsql;
SQL
end
def down
execute "drop function test_write_lock();"
end
end
The second test migration (we needed two because DML and DDL in one migration is not allowed) looks like this:
class AddFailingMigrationDml < Gitlab::Database::Migration[2.1]
restrict_gitlab_migration gitlab_schema: :gitlab_main
def up
execute "SELECT test_write_lock();"
end
def down; end
end
When run without rake gitlab:db:lock_writes
, this migration will not raise an error because we did not lock the tables. But when we run rake gitlab:db:lock_writes
before running the migrations, this migration will raise the error.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Related to #386528 (closed)