Add limits for import placeholder user creation

Luke Duncalfe requested to merge 455903-limits into master

What does this MR do and why?

With improved user mapping, placeholder users (users with a `user_type` of `placeholder`) are created during an import and assigned as the users of the imported content.

This MR limits the number of user records that can be created during an import by capping the number of placeholder users that can be created. These limits apply per root namespace and take plan and seat count into account, with more generous limits for higher paid plans and larger numbers of paid seats.

When the limit is reached, no more placeholder users are created for the root namespace. Instead, the single "import user" record associated with the root namespace is used.

This MR builds on the limits migrated for GitLab.com in !162099 (merged). For other GitLab instances, the limits all default to 0, which means "no limit".
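
As a rough sketch of the behaviour described above (the seat thresholds and method name below are hypothetical; the actual tier boundaries live in this MR's `Import::PlaceholderUserLimit`):

    # Sketch only: hypothetical seat thresholds. A limit of 0 means "no limit",
    # per the defaults mentioned above.
    def placeholder_user_limit(root_namespace)
      limits = root_namespace.actual_plan.actual_limits
      seats = root_namespace.gitlab_subscription&.seats.to_i

      case seats
      when ..100 then limits.import_placeholder_user_limit_tier_1
      when ..500 then limits.import_placeholder_user_limit_tier_2
      when ..1000 then limits.import_placeholder_user_limit_tier_3
      else limits.import_placeholder_user_limit_tier_4
      end
    end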

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

How to set up and validate locally

  1. Temporarily allow your local GitLab instance to use paid plans by applying this patch:

    diff --git a/ee/app/models/ee/namespace.rb b/ee/app/models/ee/namespace.rb
    index 61c2f28b3bde..a4c8a1418dea 100644
    --- a/ee/app/models/ee/namespace.rb
    +++ b/ee/app/models/ee/namespace.rb
    @@ -271,7 +271,7 @@ def feature_available_non_trial?(feature)
         override :actual_plan
         def actual_plan
           ::Gitlab::SafeRequestStore.fetch(actual_plan_store_key) do
    -        next ::Plan.default unless ::Gitlab.com?
    +        # next ::Plan.default unless ::Gitlab.com?
    
             if parent_id
               root_ancestor.actual_plan
  2. Choose a root namespace and create a paid subscription for it:

    root_namespace = Group.find_by_full_path('<full_path>').root_ancestor
    
    root_namespace.create_gitlab_subscription(
      plan_code: Plan::PREMIUM,
      trial: false,
      start_date: Time.now,
      seats: 1
    )
    
    # All going well, this should be true:
    root_namespace.reload.actual_plan.paid? # => true
  3. For testing, set low values for these plan limits:

    limits = root_namespace.reload.actual_plan.actual_limits
    limits.update!(
      import_placeholder_user_limit_tier_1: 1,
      import_placeholder_user_limit_tier_2: 2,
      import_placeholder_user_limit_tier_3: 3,
      import_placeholder_user_limit_tier_4: 4,
    )
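
    # Which tier applies depends on the subscription's seat count: with the
    # single seat from step 2, the tier 1 limit applies; step 4 below raises
    # the seat count to 550 to move up to the tier 3 limit.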
  4. In a Rails console:

    # Delete any previous `Import::SourceUser` records for this namespace, as they will interfere with the validation below:
    Import::SourceUser.for_namespace(root_namespace).delete_all
    
    # Assign a lambda that wraps calling the SourceUserMapper service
    create_source_user = lambda do 
      user_mapper = Gitlab::Import::SourceUserMapper.new(namespace: root_namespace, import_type: 'github', source_hostname: Gitlab.host_with_port)
      user_mapper.find_or_create_source_user(source_name: nil, source_username: nil, source_user_identifier: SecureRandom.hex) 
    end
    
    # Create a new `Import::SourceUser` record:
    source_user = create_source_user.call
    source_user.placeholder_user.placeholder? # => Should be true
    
    # Create a second record; this one should hit the limit and use the "import user" instead.
    # Once the limit has been reached, these calls should not make any SQL queries for 1 minute due to caching.
    source_user = create_source_user.call
    source_user.placeholder_user.placeholder? # => Should be false
    source_user.placeholder_user.import_user? # => Should be true
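
    # Optional: to confirm the caching mentioned above, capture the SQL issued
    # by a call while the cached "limit reached" result is fresh; the
    # placeholder-user count query should not appear. (Verification sketch
    # using standard Rails instrumentation.)
    sqls = []
    callback = ->(*args) { sqls << args.last[:sql] }
    ActiveSupport::Notifications.subscribed(callback, 'sql.active_record') do
      create_source_user.call
    end
    sqls.grep(/import_source_users/).grep(/COUNT/i) # => Should be []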
    
    # Raise the seat count of your subscription to 550 (to jump up to the `import_placeholder_user_limit_tier_3` limit).
    root_namespace.gitlab_subscription.update!(seats: 550)
    
    # Clear the cached limit (the cache entry otherwise expires after an hour).
    limit = Import::PlaceholderUserLimit.new(namespace: root_namespace)
    limit.send(:cache).del(limit.send(:limit_cache_key))
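    # (`send` is used here because the cache helpers are private methods.)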
    
    # The namespace already has 1 placeholder user, so you should be able to create 2 more source users with placeholder users (3 in total) before hitting the limit:
    create_source_user.call.placeholder_user.import_user? # => false
    create_source_user.call.placeholder_user.import_user? # => false
    create_source_user.call.placeholder_user.import_user? # => true
  5. Clean up:

    1. Reset the data:
    limits = root_namespace.actual_plan.actual_limits
    limits.update!(
      import_placeholder_user_limit_tier_1: 0,
      import_placeholder_user_limit_tier_2: 0,
      import_placeholder_user_limit_tier_3: 0,
      import_placeholder_user_limit_tier_4: 0,
    )

    root_namespace.gitlab_subscription.destroy!
    2. Undo the patch you applied.

SQL Plans

Ruby method (shown with the maximum limit that will be used, 8000; the lowest limit used would be 400):

Import::SourceUser.namespace_placeholder_user_count(namespace, limit: 8000)

This issues 2 queries.
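
For reference, a relation of this shape would produce Query 1 below (a sketch; the actual scope lives behind `Import::SourceUser.namespace_placeholder_user_count`):

    # Counting DISTINCT placeholder users, capped at the limit. ActiveRecord
    # wraps a limited count in a subquery, which is where the
    # `subquery_for_count` alias in Query 1 comes from.
    Import::SourceUser
      .where(namespace_id: namespace.id)
      .select(:placeholder_user_id)
      .distinct
      .limit(8000)
      .count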

Raw SQL Query 1:

SELECT
   COUNT(count_column) 
FROM
   (
      SELECT DISTINCT
         "import_source_users"."placeholder_user_id" AS count_column 
      FROM
         "import_source_users" 
      WHERE
         "import_source_users"."namespace_id" = 9970 LIMIT 8000
   )
   subquery_for_count

Query 1 plan:

https://postgres.ai/console/gitlab/gitlab-production-main/sessions/30938/commands/96103

 Aggregate  (cost=712.75..712.76 rows=1 width=8) (actual time=6.615..6.617 rows=1 loops=1)
   Buffers: shared hit=136
   I/O Timings: read=0.000 write=0.000
   ->  Limit  (cost=532.75..612.75 rows=8000 width=8) (actual time=4.541..6.173 rows=8000 loops=1)
         Buffers: shared hit=136
         I/O Timings: read=0.000 write=0.000
         ->  HashAggregate  (cost=532.75..612.75 rows=8000 width=8) (actual time=4.538..5.630 rows=8000 loops=1)
               Group Key: import_source_users.placeholder_user_id
               Buffers: shared hit=136
               I/O Timings: read=0.000 write=0.000
               ->  Index Scan using index_import_source_users_on_namespace_id_and_status on public.import_source_users  (cost=0.28..512.75 rows=8000 width=8) (actual time=0.067..2.342 rows=8000 loops=1)
                     Index Cond: (import_source_users.namespace_id = 9970)
                     Buffers: shared hit=136
                     I/O Timings: read=0.000 write=0.000

Raw SQL Query 2:

SELECT
   "namespace_import_users".* 
FROM
   "namespace_import_users" 
WHERE
   "namespace_import_users"."namespace_id" = 9970 LIMIT 1

Query 2 plan:

https://postgres.ai/console/gitlab/gitlab-production-main/sessions/30938/commands/96106

 Limit  (cost=0.14..3.16 rows=1 width=24) (actual time=0.057..0.058 rows=1 loops=1)
   Buffers: shared hit=5
   I/O Timings: read=0.000 write=0.000
   ->  Index Scan using index_namespace_import_users_on_namespace_id on public.namespace_import_users  (cost=0.14..3.16 rows=1 width=24) (actual time=0.056..0.056 rows=1 loops=1)
         Index Cond: (namespace_import_users.namespace_id = 9970)
         Buffers: shared hit=5
         I/O Timings: read=0.000 write=0.000

Related to #455903 (closed)
