Add undo helpers for change_column_type_concurrently
What does this MR do?
The following migration helpers:
Have no proper rollback (undo_
) counterparts, so their changes can not be rolled back correctly in case of an incident.
As discussed in #247496 (closed), the provided examples in Changing Column Types Guide do not work correctly in case only the first of the two migrations has to be rolled back by itself: The result of the rollback is going to basically result in a state similar to both the migrations having run, not the initial state of the database.
Given that username
is a string column, migrating and then rolling back the following will result to username
being a text column.
# A regular migration in db/migrate
class ChangeUsersUsernameStringToText < ActiveRecord::Migration[4.2]
include Gitlab::Database::MigrationHelpers
disable_ddl_transaction!
def up
change_column_type_concurrently :users, :username, :text
end
def down
cleanup_concurrent_column_type_change :users, :username
end
end
This MR:
-
Adds undo helpers for
change_column_type_concurrently
-
undo_change_column_type_concurrently
Reverses operations performed by change_column_type_concurrently
-
undo_cleanup_concurrent_column_type_change
Reverses operations performed by cleanup_concurrent_column_type_change
-
-
Adds specs for both migration helpers
-
Updates the Changing Column Types section of the What Requires Downtime Guide
Related Issue: #247496 (closed)
Proof of concept
Let's assume that we have two migrations:
The first one initiates the change_column_type_concurrently
class TestChangeColumnConcurrently < ActiveRecord::Migration[6.0]
include Gitlab::Database::MigrationHelpers
DOWNTIME = false
disable_ddl_transaction!
def up
change_column_type_concurrently :plans, :title, :text
end
def down
undo_change_column_type_concurrently :plans, :title
end
end
We can see that the new undo_change_column_type_concurrently
works without issues in rolling back this migration:
bundle exec rake db:migrate
git diff db/structure.sql
+CREATE FUNCTION trigger_eedff3419ee4() RETURNS trigger
+ LANGUAGE plpgsql
+ AS $$
+BEGIN
+ NEW."title_for_type_change" := NEW."title";
+ RETURN NEW;
+END;
+$$;
+
CREATE TABLE audit_events_part_5fc467ac26 (
... ... ...
- title character varying
+ title character varying,
+ title_for_type_change text
);
+CREATE TRIGGER trigger_eedff3419ee4 BEFORE INSERT OR UPDATE ON plans FOR EACH ROW EXECUTE PROCEDURE trigger_eedff3419ee4();
bundle exec rake db:rollback
git diff db/structure.sql
# Nothing - it is the same, back to where we started
Then we introduce an additional migration that does the final cleanup step:
class TestCleanupChangeColumnConcurrently < ActiveRecord::Migration[6.0]
include Gitlab::Database::MigrationHelpers
DOWNTIME = false
disable_ddl_transaction!
def up
cleanup_concurrent_column_type_change :plans, :title
end
def down
undo_cleanup_concurrent_column_type_change :plans, :title, :string
end
end
The final result is what we would expect:
bundle exec rake db:migrate
git diff db/structure.sql
- title character varying
+ title text
);
Rolling back only the second migration correctly brings us back to where the first one left the schema:
bundle exec rake db:rollback
git diff db/structure.sql
+CREATE FUNCTION trigger_eedff3419ee4() RETURNS trigger
+ LANGUAGE plpgsql
+ AS $$
+BEGIN
+ NEW."title_for_type_change" := NEW."title";
+ RETURN NEW;
+END;
+$$;
+
CREATE TABLE audit_events_part_5fc467ac26 (
... ...
+ title_for_type_change text,
title character varying
);
+CREATE TRIGGER trigger_eedff3419ee4 BEFORE INSERT OR UPDATE ON plans FOR EACH ROW EXECUTE PROCEDURE trigger_eedff3419ee4();
Not the small inconsistency in the column order for which we can not do that much. But either way, this is an intermediary step that will be either further rolled back or the migration will be retried with the same results.
Finally, rolling the first migration also brings the schema back to where it was initially, with the correct initial type for the starting column:
bundle exec rake db:rollback
git diff db/structure.sql
# Nothing - it is the same, back to where we started
(Not an) Important Disclaimer Section (anymore)
The following have been addressed by following @pbair's suggestion and changing the order that we do the operations inside undo_cleanup_concurrent_column_type_change
.
undo_cleanup_concurrent_column_type_change
should be safe and without any risks of leaving the database schema with inconsistencies.
I am leaving the rest of this section as it was as there are comments referencing it:
While the newly introduced migration helpers highly reduce the probability of an inconsistent schema in the database and they allow us to properly roll back to the original column while also using type cast functions, they still have a shortcoming in the case a rollback fails:
undo_cleanup_concurrent_column_type_change
has to rename the final column back to a temporary one and then recreate the original column by using create_column_from
. At a high level, it has to do the following 3 operations:
undo_cleanup_concurrent_column_type_change(..)
1. Rename the final column back to a temp one: rename_column(table, column, temp_column)
2. Recreate the original column: create_column_from(table, temp_column, column, ... )
3. install_rename_triggers(table, column, temp_column)
In a perfect world, we would wrap all those three steps in a transaction and sleep tight that nothing can happen.
But, unfortunately, create_column_from
can not run inside a transaction, which means that there is a risk that if it fails while rolling back the migration (in step 2 above), then we can leave the schema in an inconsistent state without the original column and only the temporary column available.
I can not see how we could wrap the operations above in a transaction, so, if there is not something I have not thought about, we have a choice:
-
Close this MR and forget about the helpers introduced in this MR
-
Bad: Our existing implementation has no way to go back to the initial column type if there is an error while rolling forward (i.e. the second migration fails and is never applied)
-
Semi-Good: as far as I can see, it always leaves the schema with the original column defined (even with a different type)
This is not always 100% what we want and it could introduce problems on its own as the new data type may not work correctly with the old code that assumes a different type. (that was the reason I started working in this MR in the first place)
-
-
Introduce and start using the helpers in this MR
-
Good: If there is an error while rolling forward, you can always rollback the first migration and go back to where you started
-
Bad: If we have to rollback after both migrations run then we risk the edge case that the rollback can fail and leave the schema in a super inconsistent state. You can't do worse than a missing column!
But we normally don't rollback migrations that succeed, so this is a truly edge case that must be accompanied by an additional failure in the rollback to end up in that case.
-
I am not sure which is the worse scenario, so I am going to leave this open for debate in the comment section of this MR.
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry -
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides - [-] Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. - [-] Tested in all supported browsers
- [-] Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- [-] Label as security and @ mention
@gitlab-com/gl-security/appsec
- [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- [-] Security reports checked/validated by a reviewer from the AppSec team