Block banned user normalized email reuse

Eugie Limpin requested to merge el-block-banned-user-detumbled-email-reuse into master

What does this MR do and why?

Implements https://gitlab.com/gitlab-org/modelops/anti-abuse/team-tasks/-/issues/815+.

This MR blocks sign-ups that reuse a banned user's normalized email address. For now, blocking is limited to Gmail addresses because those are the only addresses we currently normalize (see https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/services/users/update_canonical_email_service.rb#L7).

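Normalized in this context means a Gmail address with the . characters removed and the +<anything> part (everything from + up to @gmail.com) stripped. For example, my.user+letmereuse@gmail.com normalizes to myuser@gmail.com.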
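
To make the definition above concrete, here is a minimal sketch of the normalization in plain Ruby. The normalize_gmail helper is hypothetical and is not the code in Users::UpdateCanonicalEmailService; lowercasing the address and returning non-Gmail addresses unchanged are assumptions made for illustration.

```ruby
# Hypothetical helper illustrating the normalization described in this MR.
# This is NOT the actual Users::UpdateCanonicalEmailService implementation.
def normalize_gmail(email)
  # Split on the last "@" so the local part and domain are handled separately.
  # Lowercasing is an assumption made for this sketch.
  local, _, domain = email.strip.downcase.rpartition('@')

  # Only Gmail addresses are normalized; anything else is returned unchanged.
  return email unless domain == 'gmail.com'

  # Drop the "+<anything>" suffix, then remove every "." from the local part.
  local = local.split('+', 2).first.to_s.delete('.')
  "#{local}@#{domain}"
end
```

Under these assumptions, normalize_gmail('my_user+letmereuse@gmail.com') and normalize_gmail('my_user@gmail.com') return the same string, which is what lets a tumbled sign-up be matched against a banned user's canonical email.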

Database changes

Users::BannedUser.by_canonical_email scope

Raw SQL
SELECT
    1 AS one
FROM
    "banned_users"
    INNER JOIN "user_canonical_emails" ON "user_canonical_emails"."user_id" = "banned_users"."user_id"
WHERE
    "user_canonical_emails"."canonical_email" = 'email_of_banned_user@gmail.com'
LIMIT 1
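
As an illustration of what the query above checks, the following plain-Ruby sketch models the join with in-memory data. The constants, sample rows, and the banned_canonical_email? helper are hypothetical; the real scope is an ActiveRecord scope on Users::BannedUser that is not reproduced here.

```ruby
require 'set'

# Hypothetical stand-ins for the banned_users and user_canonical_emails tables.
BANNED_USER_IDS = Set[1, 3]
USER_CANONICAL_EMAILS = [
  { user_id: 1, canonical_email: 'banned@gmail.com' },
  { user_id: 2, canonical_email: 'active@gmail.com' },
  { user_id: 3, canonical_email: 'banned@gmail.com' }
].freeze

# Mirrors the SQL: true as soon as one row with the given canonical email
# belongs to a banned user (the INNER JOIN plus LIMIT 1).
def banned_canonical_email?(canonical_email)
  USER_CANONICAL_EMAILS.any? do |row|
    row[:canonical_email] == canonical_email &&
      BANNED_USER_IDS.include?(row[:user_id])
  end
end
```

Because several users can share one canonical email, the check stops at the first banned match, just as the query stops at the first joined row.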
Explain

https://console.postgres.ai/shared/f39d44df-5e29-4534-bc2d-79c464a4b98e

Limit  (cost=0.98..7.01 rows=1 width=4) (actual time=15.837..15.840 rows=1 loops=1)
   Buffers: shared hit=18 read=15
   I/O Timings: read=15.618 write=0.000
   ->  Nested Loop  (cost=0.98..7.01 rows=1 width=4) (actual time=15.835..15.836 rows=1 loops=1)
         Buffers: shared hit=18 read=15
         I/O Timings: read=15.618 write=0.000
         ->  Index Scan using index_user_canonical_emails_on_canonical_email on public.user_canonical_emails  (cost=0.56..3.58 rows=1 width=8) (actual time=8.730..13.777 rows=7 loops=1)
               Index Cond: ((user_canonical_emails.canonical_email)::text = 'james73290@gmail.com'::text)
               Buffers: shared read=11
               I/O Timings: read=13.675 write=0.000
         ->  Index Only Scan using banned_users_pkey on public.banned_users  (cost=0.42..3.44 rows=1 width=8) (actual time=0.290..0.290 rows=0 loops=7)
               Index Cond: (banned_users.user_id = user_canonical_emails.user_id)
               Heap Fetches: 0
               Buffers: shared hit=18 read=4
               I/O Timings: read=1.942 write=0.000

Current worst-case scenario

In https://gitlab.com/gitlab-org/modelops/anti-abuse/team-tasks/-/issues/810+, an analysis of current detumbled email usage was performed by fetching the most reused normalized email associated with a banned/blocked user. An EXPLAIN of the scope query using the current most reused email is available at https://console.postgres.ai/gitlab/gitlab-production-main/sessions/30558/commands/94669

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

Screen_Recording_2024-08-01_at_4.17.36_PM

How to set up and validate locally

  1. Register a new user with a Gmail address. Take note of the address you used.
  2. In the Rails console, ban the newly created user and enable the feature flag:
    > rails c
    > User.last.ban!
    > Feature.enable(:block_banned_user_normalized_email_reuse)
  3. Go to the sign-up page again and try to register a new user with a tumbled version of the banned user's email. For example, if you used my_user@gmail.com in step 1, you can use my_user+letmereuse@gmail.com.
  4. Verify that the registration is blocked and the "Please use another email" error is displayed.
Edited by Eugie Limpin