Add limits for import placeholder user creation
What does this MR do and why?
With improved user mapping, placeholder users (Users with a user_type of "placeholder") are created during an import and assigned as the users of the imported content.
We limit the number of user records that can be created during an import by capping the number of placeholder users that can be created. These limits apply to a root namespace and take into account plan and seat count, with more generous limits for higher paid plans and larger numbers of paid seats.
When the limit is reached, no more placeholder users are created for the root namespace. Instead, the single "importer user" record associated with that root namespace is used.
This MR builds on the limits migrated for GitLab.com in !162099 (merged). For other GitLab instances, the limits all default to 0, which means "no limit".
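The fallback behaviour described above can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the code in this MR: `PlaceholderUser`, `ImportUser`, and `NamespaceUserAssigner` are hypothetical stand-ins, and the counter lives in memory rather than in the database.

```ruby
# Hypothetical stand-ins for illustration; these are not GitLab's actual classes.
PlaceholderUser = Struct.new(:id) do
  def placeholder?
    true
  end

  def import_user?
    false
  end
end

ImportUser = Struct.new(:id) do
  def placeholder?
    false
  end

  def import_user?
    true
  end
end

# Hands out placeholder users for one root namespace until the limit is
# reached, then falls back to the single shared importer user.
class NamespaceUserAssigner
  def initialize(limit:)
    @limit = limit                    # 0 is treated as "no limit"
    @placeholder_count = 0
    @import_user = ImportUser.new(0)  # one importer user per root namespace
  end

  def limit_reached?
    @limit.positive? && @placeholder_count >= @limit
  end

  def next_user
    return @import_user if limit_reached?

    @placeholder_count += 1
    PlaceholderUser.new(@placeholder_count)
  end
end
```

With `limit: 1`, the first call to `next_user` returns a placeholder user and every later call returns the same importer user. With `limit: 0`, every call returns a new placeholder user.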
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
How to set up and validate locally
- Temporarily allow your localhost to use paid plans by applying this patch:

    diff --git a/ee/app/models/ee/namespace.rb b/ee/app/models/ee/namespace.rb
    index 61c2f28b3bde..a4c8a1418dea 100644
    --- a/ee/app/models/ee/namespace.rb
    +++ b/ee/app/models/ee/namespace.rb
    @@ -271,7 +271,7 @@ def feature_available_non_trial?(feature)
         override :actual_plan
         def actual_plan
           ::Gitlab::SafeRequestStore.fetch(actual_plan_store_key) do
    -        next ::Plan.default unless ::Gitlab.com?
    +        # next ::Plan.default unless ::Gitlab.com?
     
             if parent_id
               root_ancestor.actual_plan
- Choose a root namespace and generate a new paid plan for it:

    root_namespace = Group.find_by_full_path(<full_path>).root_ancestor

    root_namespace.create_gitlab_subscription(
      plan_code: Plan::PREMIUM,
      trial: false,
      start_date: Time.now,
      seats: 1
    )

    # All going well, this should be true:
    root_namespace.reload.actual_plan.paid? # => true
- Set these low limits in your plan limits for testing:

    limits = root_namespace.reload.actual_plan.actual_limits

    limits.update!(
      import_placeholder_user_limit_tier_1: 1,
      import_placeholder_user_limit_tier_2: 2,
      import_placeholder_user_limit_tier_3: 3,
      import_placeholder_user_limit_tier_4: 4,
    )
- In a Rails console:

    # Delete any previous `Import::SourceUser` records for this namespace, as it will mess up the QA:
    Import::SourceUser.for_namespace(root_namespace).delete_all

    # Assign a lambda to wrap calling the SourceUserMapper service
    create_source_user = lambda do
      user_mapper = Gitlab::Import::SourceUserMapper.new(namespace: root_namespace, import_type: 'github', source_hostname: Gitlab.host_with_port)
      user_mapper.find_or_create_source_user(source_name: nil, source_username: nil, source_user_identifier: SecureRandom.hex)
    end

    # Create a new `Import::SourceUser` record:
    source_user = create_source_user.call
    source_user.placeholder_user.placeholder? # => Should be true

    # Create a second record, this one should have reached the limit and use the "import user" instead.
    # Once the limit has been reached, these calls should not make any SQL queries for 1 minute due to caching.
    source_user = create_source_user.call
    source_user.placeholder_user.placeholder? # => Should be false
    source_user.placeholder_user.import_user? # => Should be true

    # Raise the seat count of your subscription to 550 (to jump up to the `import_placeholder_user_limit_tier_3` limit).
    root_namespace.gitlab_subscription.update!(seats: 550)

    # Clear the cache of the limit (which expires after an hour).
    limit = Import::PlaceholderUserLimit.new(namespace: root_namespace)
    limit.send(:cache).del(limit.send(:limit_cache_key))

    # You should be able to create `3` source users with placeholder users before hitting the limit:
    create_source_user.call.placeholder_user.import_user? # => false
    create_source_user.call.placeholder_user.import_user? # => false
    create_source_user.call.placeholder_user.import_user? # => true
- Clean up:
  - Reset the data:

      limits = root_namespace.actual_plan.actual_limits

      limits.update!(
        import_placeholder_user_limit_tier_1: 0,
        import_placeholder_user_limit_tier_2: 0,
        import_placeholder_user_limit_tier_3: 0,
        import_placeholder_user_limit_tier_4: 0,
      )

      root_namespace.gitlab_subscription.destroy!

  - Undo the patch you applied.
SQL Plans
Ruby method (using the maximum limit that will be used; the lowest limit would be 400):
Import::SourceUser.namespace_placeholder_user_count(namespace, limit: 8000)
Issues 2 queries.
Raw SQL Query 1:
SELECT
    COUNT(count_column)
FROM
    (
        SELECT DISTINCT
            "import_source_users"."placeholder_user_id" AS count_column
        FROM
            "import_source_users"
        WHERE
            "import_source_users"."namespace_id" = 9970
        LIMIT 8000
    ) subquery_for_count
Query 1 plan:
https://postgres.ai/console/gitlab/gitlab-production-main/sessions/30938/commands/96103
Aggregate (cost=712.75..712.76 rows=1 width=8) (actual time=6.615..6.617 rows=1 loops=1)
Buffers: shared hit=136
I/O Timings: read=0.000 write=0.000
-> Limit (cost=532.75..612.75 rows=8000 width=8) (actual time=4.541..6.173 rows=8000 loops=1)
Buffers: shared hit=136
I/O Timings: read=0.000 write=0.000
-> HashAggregate (cost=532.75..612.75 rows=8000 width=8) (actual time=4.538..5.630 rows=8000 loops=1)
Group Key: import_source_users.placeholder_user_id
Buffers: shared hit=136
I/O Timings: read=0.000 write=0.000
-> Index Scan using index_import_source_users_on_namespace_id_and_status on public.import_source_users (cost=0.28..512.75 rows=8000 width=8) (actual time=0.067..2.342 rows=8000 loops=1)
Index Cond: (import_source_users.namespace_id = 9970)
Buffers: shared hit=136
I/O Timings: read=0.000 write=0.000
Raw SQL Query 2:
SELECT
    "namespace_import_users".*
FROM
    "namespace_import_users"
WHERE
    "namespace_import_users"."namespace_id" = 9970
LIMIT 1
Query 2 plan:
https://postgres.ai/console/gitlab/gitlab-production-main/sessions/30938/commands/96106
Limit (cost=0.14..3.16 rows=1 width=24) (actual time=0.057..0.058 rows=1 loops=1)
Buffers: shared hit=5
I/O Timings: read=0.000 write=0.000
-> Index Scan using index_namespace_import_users_on_namespace_id on public.namespace_import_users (cost=0.14..3.16 rows=1 width=24) (actual time=0.056..0.056 rows=1 loops=1)
Index Cond: (namespace_import_users.namespace_id = 9970)
Buffers: shared hit=5
I/O Timings: read=0.000 write=0.000
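Query 1 places the `LIMIT` inside the subquery, so the count can never exceed the limit passed to the method, however many rows the namespace has. A minimal plain-Ruby sketch of that capped distinct count follows. `capped_distinct_count` is a hypothetical helper for illustration and is not part of this MR.

```ruby
require 'set'

# Hypothetical helper: counts distinct IDs, but stops scanning as soon as
# `limit` distinct values have been seen, so the result never exceeds `limit`.
def capped_distinct_count(ids, limit:)
  return 0 unless limit.positive? # mirrors SQL, where LIMIT 0 yields no rows

  seen = Set.new
  ids.each do |id|
    seen << id
    break if seen.size >= limit
  end
  seen.size
end
```

The caller only needs to know whether the count has reached the limit, so nothing is lost by stopping early.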
Related to #455903 (closed)