Schema inconsistencies when rolling back a column rename or type change for columns that are NOT NULL
With !44025 (merged) we are fixing create_column_from
to also properly copy any check constraints defined for the origin column.
But there is an additional detail on how create_column_from
works that is not easily addressable:
-
For
create_column_from
to be agnostic on the size of the table that it is used for, we have to switchNOT NULL
constraints tocolumn IS NOT NULL
check constraints. -
The reason for that is that we want to avoid the access exclusive locks introduced by
ALTER COLUMN ... SET NOT NULL
Such a command can lock a large table for up to 10 minutes before it completes
But that introduces a slight issue with migrations which use rename_column_concurrently
, undo_cleanup_concurrent_column_rename
and change_column_type_concurrently
that directly or indirectly call create_column_from
.
As long as everything works without issues, we have all angles covered (given that some newly found problems will be soon addressed with !44025 (merged) and #247496 (closed))
But if those migrations (1) are over a column with a NOT NULL constraint and (2) they are rolled back, then we can have schema inconsistencies:
- As explained above, when the new column is created, the
NOT NULL
constraint is converted to acolumn IS NOT NULL
check constraint for the new column - As long as the
cleanup_XXX
migration has not run we are OK as the rollback just drops the new column - But if there is a fail after the
cleanup_XXX
migration runs (which switches the new column with the old column and drops the old column), we loose track of the old column's definition while the new one is slightly different. - If we rollback, we do the switch again and finally we have the old column back, but with check constraint defined for it instead of a
NOT NULL
constraint. - That means that the schema for the environment that the rollback happened, will be slightly different than the one defined through
structure.sql
We have a few options ranging from taking care of this problem manually if it ever appears to modifying all the affected migration helpers in a way that they will be able to address this issue somehow.
An idea by @pbair that would allow us to address this issue for most of the trivial cases is to set a NOT NULL
and a default when we create the new column (which in PostgreSQL 11+ comes with no additional cost) and then drop the default at the end (if it is not already defined). That means that we would have to figure out defaults for all the supported types and then make sure that everything works without issues, but even without full coverage, it is a good starting point that should address most of the use cases.