Skip to content

Add undo helpers for change_column_type_concurrently

What does this MR do?

The following migration helpers:

  1. change_column_type_concurrently
  2. cleanup_concurrent_column_type_change

Have no proper rollback (undo_) counterparts, so their changes can not be rolled back correctly in case of an incident.

As discussed in #247496 (closed), the provided examples in Changing Column Types Guide do not work correctly in case only the first of the two migrations has to be rolled back by itself: The result of the rollback is going to basically result in a state similar to both the migrations having run, not the initial state of the database.

Given that username is a string column, migrating and then rolling back the following will result to username being a text column.

# A regular migration in db/migrate
class ChangeUsersUsernameStringToText < ActiveRecord::Migration[4.2]
  include Gitlab::Database::MigrationHelpers

  disable_ddl_transaction!

  def up
    change_column_type_concurrently :users, :username, :text
  end

  def down
    cleanup_concurrent_column_type_change :users, :username
  end
end

This MR:

  • Adds undo helpers for change_column_type_concurrently

    • undo_change_column_type_concurrently

      Reverses operations performed by change_column_type_concurrently

    • undo_cleanup_concurrent_column_type_change

      Reverses operations performed by cleanup_concurrent_column_type_change

  • Adds specs for both migration helpers

  • Updates the Changing Column Types section of the What Requires Downtime Guide

Related Issue: #247496 (closed)

Proof of concept

Let's assume that we have two migrations:

The first one initiates the change_column_type_concurrently

class TestChangeColumnConcurrently < ActiveRecord::Migration[6.0]

  include Gitlab::Database::MigrationHelpers
  DOWNTIME = false

  disable_ddl_transaction!

  def up
    change_column_type_concurrently :plans, :title, :text
  end

  def down
    undo_change_column_type_concurrently :plans, :title
  end
end

We can see that the new undo_change_column_type_concurrently works without issues in rolling back this migration:

bundle exec rake db:migrate
git diff db/structure.sql

    +CREATE FUNCTION trigger_eedff3419ee4() RETURNS trigger
    +    LANGUAGE plpgsql
    +    AS $$
    +BEGIN
    +  NEW."title_for_type_change" := NEW."title";
    +  RETURN NEW;
    +END;
    +$$;
    +
     CREATE TABLE audit_events_part_5fc467ac26 (
         ... ... ...
    -    title character varying
    +    title character varying,
    +    title_for_type_change text
     );
    +CREATE TRIGGER trigger_eedff3419ee4 BEFORE INSERT OR UPDATE ON plans FOR EACH ROW EXECUTE PROCEDURE trigger_eedff3419ee4();

bundle exec rake db:rollback
git diff db/structure.sql

 # Nothing - it is the same, back to where we started

Then we introduce an additional migration that does the final cleanup step:

class TestCleanupChangeColumnConcurrently < ActiveRecord::Migration[6.0]
  include Gitlab::Database::MigrationHelpers

  DOWNTIME = false
  disable_ddl_transaction!

  def up
    cleanup_concurrent_column_type_change :plans, :title
  end

  def down
    undo_cleanup_concurrent_column_type_change :plans, :title, :string
  end
end

The final result is what we would expect:

bundle exec rake db:migrate
git diff db/structure.sql

    -    title character varying
    +    title text
     );

Rolling back only the second migration correctly brings us back to where the first one left the schema:

bundle exec rake db:rollback
git diff db/structure.sql

    +CREATE FUNCTION trigger_eedff3419ee4() RETURNS trigger
    +    LANGUAGE plpgsql
    +    AS $$
    +BEGIN
    +  NEW."title_for_type_change" := NEW."title";
    +  RETURN NEW;
    +END;
    +$$;
    +
     CREATE TABLE audit_events_part_5fc467ac26 (
      ... ...
    +    title_for_type_change text,
         title character varying
     );
    +CREATE TRIGGER trigger_eedff3419ee4 BEFORE INSERT OR UPDATE ON plans FOR EACH ROW EXECUTE PROCEDURE trigger_eedff3419ee4();

Not the small inconsistency in the column order for which we can not do that much. But either way, this is an intermediary step that will be either further rolled back or the migration will be retried with the same results.

Finally, rolling the first migration also brings the schema back to where it was initially, with the correct initial type for the starting column:

bundle exec rake db:rollback
git diff db/structure.sql

 # Nothing - it is the same, back to where we started

(Not an) Important Disclaimer Section (anymore)

The following have been addressed by following @pbair's suggestion and changing the order that we do the operations inside undo_cleanup_concurrent_column_type_change.

undo_cleanup_concurrent_column_type_change should be safe and without any risks of leaving the database schema with inconsistencies.

I am leaving the rest of this section as it was as there are comments referencing it:


While the newly introduced migration helpers highly reduce the probability of an inconsistent schema in the database and they allow us to properly roll back to the original column while also using type cast functions, they still have a shortcoming in the case a rollback fails:

undo_cleanup_concurrent_column_type_change has to rename the final column back to a temporary one and then recreate the original column by using create_column_from. At a high level, it has to do the following 3 operations:

undo_cleanup_concurrent_column_type_change(..)
  1. Rename the final column back to a temp one: rename_column(table, column, temp_column)
  2. Recreate the original column: create_column_from(table, temp_column, column, ... )
  3. install_rename_triggers(table, column, temp_column)

In a perfect world, we would wrap all those three steps in a transaction and sleep tight that nothing can happen.

But, unfortunately, create_column_from can not run inside a transaction, which means that there is a risk that if it fails while rolling back the migration (in step 2 above), then we can leave the schema in an inconsistent state without the original column and only the temporary column available.

I can not see how we could wrap the operations above in a transaction, so, if there is not something I have not thought about, we have a choice:

  1. Close this MR and forget about the helpers introduced in this MR

    • Bad: Our existing implementation has no way to go back to the initial column type if there is an error while rolling forward (i.e. the second migration fails and is never applied)

    • Semi-Good: as far as I can see, it always leaves the schema with the original column defined (even with a different type)

      This is not always 100% what we want and it could introduce problems on its own as the new data type may not work correctly with the old code that assumes a different type. (that was the reason I started working in this MR in the first place)

  2. Introduce and start using the helpers in this MR

    • Good: If there is an error while rolling forward, you can always rollback the first migration and go back to where you started

    • Bad: If we have to rollback after both migrations run then we risk the edge case that the rollback can fail and leave the schema in a super inconsistent state. You can't do worse than a missing column!

      But we normally don't rollback migrations that succeed, so this is a truly edge case that must be accompanied by an additional failure in the rollback to end up in that case.

I am not sure which is the worse scenario, so I am going to leave this open for debate in the comment section of this MR.

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team
Edited by Yannis Roussos

Merge request reports

Loading