Migrate users.bio to user_details.bio
Feature Flag
Feature flag (default enabled): migrate_bio_to_user_details
.
In case of an issue: Turn off the feature flag off globally, this stops the attribute syncs and the background migrations will be no-op.
What does this MR do?
- Part of: #206913 (closed)
- Unblocks: !20254 (closed)
This MR is the first part of the migration of users.bio
to user_details.bio
.
Milestone N (current):
- Add
bio
column touser_details
. - Kick off background migration jobs to copy
users.bio
touser_details.bio
for existing users. - Update the application code to assign the value of
users.bio
touser_details.bio
as well.
Milestone N + 1 (next):
- Ignore the
users.bio
column. - Always delegate
bio
touser_details.bio
. - Preload
user_details
on/users
API to avoid N+1.
Milestone N + 2:
- Remove
users.bio
column.
Migration output
Migration output.
== 20200323071918 AddBioToUserDetails: migrating ============================== -- transaction_open?() -> 0.0000s -- execute("SET statement_timeout TO 0") -> 0.0003s -- transaction() -- add_column(:user_details, :bio, :string, {:default=>nil, :limit=>255}) -> 0.0011s -- change_column_default(:user_details, :bio, "") -> 0.0021s -> 0.0047s -- columns(:user_details) -> 0.0011s -- transaction_open?() -> 0.0000s -- exec_query("SELECT COUNT(*) AS count FROM \"user_details\"") -> 0.0007s -- exec_query("SELECT \"user_details\".\"user_id\" FROM \"user_details\" ORDER BY \"user_details\".\"user_id\" ASC LIMIT 1") -> 0.0011s -- exec_query("SELECT \"user_details\".\"user_id\" FROM \"user_details\" WHERE \"user_details\".\"user_id\" >= 1 ORDER BY \"user_details\".\"user_id\" ASC LIMIT 1 OFFSET 1") -> 0.0009s -- execute("UPDATE \"user_details\" SET \"bio\" = '' WHERE \"user_details\".\"user_id\" >= 1 AND \"user_details\".\"user_id\" < 78") -> 0.0016s -- exec_query("SELECT \"user_details\".\"user_id\" FROM \"user_details\" WHERE \"user_details\".\"user_id\" >= 78 ORDER BY \"user_details\".\"user_id\" ASC LIMIT 1 OFFSET 1") -> 0.0003s -- execute("UPDATE \"user_details\" SET \"bio\" = '' WHERE \"user_details\".\"user_id\" >= 78") -> 0.0011s -- change_column_null(:user_details, :bio, false) -> 0.0009s -- execute("RESET ALL") -> 0.0002s == 20200323071918 AddBioToUserDetails: migrated (0.0149s) ======================= 20200323074147 AddTempIndexOnUsersBio: migrating =========================== -- transaction_open?() -> 0.0000s -- index_exists?(:users, :id, {:where=>"(COALESCE(bio, '') IS DISTINCT FROM '')", :name=>"tmp_idx_on_user_id_where_bio_is_filled", :algorithm=>:concurrently}) -> 0.0061s -- execute("SET statement_timeout TO 0") -> 0.0001s -- add_index(:users, :id, {:where=>"(COALESCE(bio, '') IS DISTINCT FROM '')", :name=>"tmp_idx_on_user_id_where_bio_is_filled", :algorithm=>:concurrently}) -> 0.0134s -- execute("RESET ALL") -> 0.0002s == 20200323074147 AddTempIndexOnUsersBio: migrated (0.0200s) ==================
== 20200323080714 TriggerBackgroundMigrationForUsersBio: migrating ============ == 20200323080714 TriggerBackgroundMigrationForUsersBio: migrated (0.0606s) ===
BG migration queries
Each bath query for scheduling, query 1 (< 1ms)
SELECT "users"."id" FROM "users" WHERE ((COALESCE(bio, '') IS DISTINCT FROM '')) ORDER BY "users"."id" ASC LIMIT 1
Limit (cost=0.42..0.43 rows=1 width=4) (actual time=0.253..0.253 rows=1 loops=1) Buffers: shared hit=1 read=3 I/O Timings: read=0.151 -> Index Only Scan using tmp_idx_on_user_id_where_bio_is_filled on public.users (cost=0.42..58684.85 rows=5391511 width=4) (actual time=0.251..0.251 rows=1 loops=1) Heap Fetches: 0 Buffers: shared hit=1 read=3 I/O Timings: read=0.151
Each bath query for scheduling, query 2 (< 5ms)
SELECT "users"."id" FROM "users" WHERE ((COALESCE(bio, '') IS DISTINCT FROM '')) AND "users"."id" >= 78 ORDER BY "users"."id" ASC LIMIT 1 OFFSET 1000
Limit (cost=11.37..11.39 rows=1 width=4) (actual time=2.801..2.801 rows=1 loops=1) Buffers: shared hit=846 read=9 dirtied=5 I/O Timings: read=2.139 -> Index Only Scan using tmp_idx_on_user_id_where_bio_is_filled on public.users (cost=0.42..59061.71 rows=5391443 width=4) (actual time=0.038..2.738 rows=1001 loops=1) Index Cond: (users.id >= 78) Heap Fetches: 7 Buffers: shared hit=846 read=9 dirtied=5 I/O Timings: read=2.139
Upsert query (< 1s)
INSERT INTO user_details (user_id, bio) SELECT id AS user_id, substring(COALESCE(bio, '') from 1 for 255) AS bio FROM "users" WHERE ((COALESCE(bio, '') IS DISTINCT FROM '')) AND "users"."id" BETWEEN 35149 AND 79126 ON CONFLICT (user_id) DO UPDATE SET "bio" = EXCLUDED."bio";
ModifyTable on public.user_details (cost=0.42..78690.51 rows=42534 width=458) (actual time=752.533..752.533 rows=0 loops=1) Buffers: shared hit=10090 read=1006 dirtied=17 I/O Timings: read=709.175 -> Index Scan using tmp_idx_on_user_id_where_bio_is_filled on public.users (cost=0.42..78690.51 rows=42534 width=458) (actual time=0.046..720.882 rows=1002 loops=1) Index Cond: ((users.id >= 35149) AND (users.id <= 79126)) Buffers: shared hit=7 read=996 dirtied=7 I/O Timings: read=709.175
Note: This is with batch size 1000. I modified it to 500 just to be safe. Migration will take about 10 hours.
Screenshots
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry -
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. -
Tested in all supported browsers -
Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team
Closes #206913 (closed)