Identify users that are subject to PIPL
- Implements https://gitlab.com/gitlab-org/gitlab/-/work_items/468861, and https://gitlab.com/gitlab-org/gitlab/-/work_items/468862
- Depends on !157480 (merged)
Context
See https://gitlab.com/groups/gitlab-org/-/epics/12600.
What does this MR do and why?
For each user accessing GitLab.com, we track and count the access if it is coming from a PIPL-covered country (done by checking HTTP_CF_IPCOUNTRY
request env var). If the user then accesses from a non PIPL-covered country, their existing access log record is updated to reset its access_count
value. This tracking is limited to once per qualified user every 24 hours.
Further, if the user is determined to be exclusively accessing from PIPL-covered countries (met_pipl_access_threshold?
), a background job is queued to check if the user belongs to any paid namespace. This check is necessary because users belonging to paid namespaces are exempt from actions taken to ensure PIPL compliance.
What is not in this MR?
This MR only includes changes necessary to determine if a user is subject to PIPL. Separate MRs will be opened using the functionality added in this MR to notify, block, and eventually delete target users (see https://gitlab.com/gitlab-org/gitlab/-/issues/455706#note_1988620480).
Database changes
- Add
last_access_from_pipl_country_at
column tousers
- see !157579 (comment 1984998142) for reason why this addition is necessary
Scopes
Users::CountryAccessLogs.from_country_code
Raw SQL
SELECT
"country_access_logs".*
FROM
"country_access_logs"
WHERE
"country_access_logs"."user_id" = 1
AND "country_access_logs"."country_code" IN (0, 1, 2);
Explain
https://console.postgres.ai/shared/a84bc891-5a5e-4ffd-b15c-d92729e49150
Index Scan using index_country_access_logs_on_user_id_and_country_code on public.country_access_logs (cost=0.15..4.99 rows=1 width=62) (actual time=0.045..0.048 rows=3 loops=1)
Index Cond: ((country_access_logs.user_id = 1) AND (country_access_logs.country_code = ANY ('{0,1,2}'::integer[])))
Buffers: shared hit=13
I/O Timings: read=0.000 write=0.000
Users::CountryAccessLogs.with_access
Raw SQL
SELECT
"country_access_logs".*
FROM
"country_access_logs"
WHERE
"country_access_logs"."user_id" = 1
AND (access_count > 0)
Explain
https://console.postgres.ai/shared/4b1c4f90-6d58-413c-9be8-1e4dc348c4ee
Index Scan using index_country_access_logs_on_user_id_and_country_code on public.country_access_logs (cost=0.15..7.73 rows=2 width=62) (actual time=0.070..0.071 rows=1 loops=1)
Index Cond: (country_access_logs.user_id = 1)
Filter: (country_access_logs.access_count > 0)
Rows Removed by Filter: 2
Buffers: shared hit=5
I/O Timings: read=0.000 write=0.000
Users::CountryAccessLogs.first_access_before
Raw SQL
SELECT
"country_access_logs".*
FROM
"country_access_logs"
WHERE
"country_access_logs"."user_id" = 1
AND (first_access_at <= '2024-01-09 09:41:16.188546')
Explain
https://console.postgres.ai/shared/b5eaf040-1552-44dd-a62d-28d0f6c414eb
Index Scan using index_country_access_logs_on_user_id_and_country_code on public.country_access_logs (cost=0.15..7.73 rows=2 width=62) (actual time=0.053..0.054 rows=1 loops=1)
Index Cond: (country_access_logs.user_id = 1)
Filter: (country_access_logs.first_access_at <= '2024-01-09 09:41:16.188546+00'::timestamp with time zone)
Rows Removed by Filter: 2
Buffers: shared hit=5
I/O Timings: read=0.000 write=0.000
user.country_access_logs.from_country_code(COVERED_COUNTRY_CODES).with_access
Raw SQL
EXPLAIN
SELECT
"country_access_logs".*
FROM
"country_access_logs"
WHERE
"country_access_logs"."user_id" = 1
AND "country_access_logs"."country_code" IN (0, 1, 2)
AND (access_count > 0);
Explain
https://console.postgres.ai/shared/88473649-b4af-41bb-8a48-6bd04ee449e8
Index Scan using index_country_access_logs_on_user_id_and_country_code on public.country_access_logs (cost=0.15..4.99 rows=1 width=62) (actual time=0.068..0.069 rows=1 loops=1)
Index Cond: ((country_access_logs.user_id = 1) AND (country_access_logs.country_code = ANY ('{0,1,2}'::integer[])))
Filter: (country_access_logs.access_count > 0)
Rows Removed by Filter: 2
Buffers: shared hit=13
I/O Timings: read=0.000 write=0.000
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Screenshots or screen recordings
Screenshots are required for UI changes, and strongly recommended for all other merge requests.
Before | After |
---|---|
How to set up and validate locally
Note: the validation steps below only verifies one part of the functionality introduced in this MR. Integration tests (see ee/spec/requests/root_controller_spec.rb
) were added to ensure all modules introduced in this MR are working correctly.
-
Start GDK simulating SaaS
$ export GITLAB_SIMULATE_SAAS=1 $ gdk start
-
Enable required FFs
$ rails c > Feature.enable(:track_user_access_from_pipl_countries)
-
Hard-code
request.env['HTTP_CF_IPCOUNTRY']
value inee/app/controllers/concerns/users/track_country_access.rb
def track_country_access # Compliance::Pipl::TrackUserCountryAccessService.new(current_user, request.env['HTTP_CF_IPCOUNTRY']).execute Compliance::Pipl::TrackUserCountryAccessService.new(current_user, 'CN').execute end
-
Login with
root
and load the root path (localhost:3000/
) -
At this point an access should have been tracked for the
root
user. In Rails console, verify this by checkingroot
user'scountry_access_logs
$ rails c > User.first.country_access_logs => [#<Users::CountryAccessLog:0x000000016fd13b78 id: 1, created_at: Tue, 02 Jul 2024 10:32:55.219161000 UTC +00:00, updated_at: Tue, 02 Jul 2024 10:55:24.476584000 UTC +00:00, access_count_reset_at: nil, first_access_at: Tue, 02 Jul 2024 10:32:55.231767000 UTC +00:00, last_access_at: Tue, 02 Jul 2024 10:55:24.453392000 UTC +00:00, user_id: 1, access_count: 1, country_code: "CN">]