Skip to content

Resolve "CI should run with lock_writes"

Rutger Wessels requested to merge 386528-ci-should-run-with-lock_writes into master

What does this MR do and why?

We have a script in place that will lock tables for write in case the write should happen in another database. https://gitlab.com/gitlab-org/gitlab/-/commits/master/lib/tasks/gitlab/db/lock_writes.rake. This script is used to prevent invalid writes in decomposed database setups. The lock is implemented using database triggers.

Invalid writes are for example:

  • Using main connection to update ci_pipelines table
  • Using ci connection to update projects table.

These locks are now in place on Gitlab.com production but not yet in CI. This MR will set these locks also on CI so we bring production/staging closer to test setup. This will help us catch issues like !106894 (comment 1216992707) earliear during development.

I added the rake task to setup_db() function in script/utils.sh. This bash file is sourced into scripts/prepare_build.sh which is used to setup the database environment for CI environment.

How to set up and validate locally

I added two migrations and ran the pipeline. The pipeline should fail because we do not allow making changes in ci table on main database:

Error: command failed: rake aborted!
StandardError: An error has occurred, this and all later migrations canceled:
PG::SREModifyingSqlDataNotPermitted: ERROR:  Table: "ci_runner_versions" is write protected within this Gitlab database.
HINT:  Make sure you are using the right database connection
CONTEXT:  PL/pgSQL function gitlab_schema_prevent_write() line 4 at RAISE
SQL statement "DELETE FROM ci_runner_versions"
PL/pgSQL function test_write_lock() line 3 at EXECUTE

The idea behind testing is: we want to be sure that when we modify the contents of a locked table during a migration, this migration will fail in CI. So for example, we want to ensure that on main database, we can not delete data from ci_runner_versions table, because that table is in ci database.

My first attempts were just running TRUNCATE or DELETE directly but this did not work: we have QueryAnalyzers in place that detect these attempts and raise errors. So I had to find a bypass and decided to wrap the DELETE in a function, which is NOT verified by the QueryAnalyzers.

The first one is adding a function that will delete all data from ci_runner_versions. We

class AddFailingMigration < Gitlab::Database::Migration[2.1]
  def up
    execute <<~SQL
      CREATE OR REPLACE FUNCTION test_write_lock()
      RETURNS void AS
      $$
      BEGIN
        EXECUTE 'DELETE FROM ci_runner_versions';
      END;
      $$ LANGUAGE plpgsql;
    SQL
  end

  def down
    execute "drop function test_write_lock();"
  end
end

The second test migration (we needed two because DML and DDL in one migration is not allowed) looks like this:

class AddFailingMigrationDml < Gitlab::Database::Migration[2.1]
  restrict_gitlab_migration gitlab_schema: :gitlab_main

  def up
    execute "SELECT test_write_lock();"
  end

  def down; end
end

When run without rake gitlab:db:lock_writes, this migration will not raise an error because we did not lock the tables. But when we run rake gitlab:db:lock_writes before running the migrations, this migration will raise the error.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #386528 (closed)

Edited by Rutger Wessels

Merge request reports

Loading