Use new Sanitizable concern in User
What does this MR do?
this merge request leverages the new Sanitizable
concern introduced in Add ability to easily sanitize attributes
and replaces adhoc sanitizing in different parts of the codebase. the intention is to help prevent xss attacks in the event of a by-pass in the frontend sanitizer due to a configuration issue or a vulnerability in the sanitizer. this approach is commonly referred to as defense-in-depth.
Implications
this means that in addition to the existing behaviour escaped html
entities (e.g. &
) will no longer be permitted and will result in a validation error. the reasons behind this are discussed in Add ability to easily sanitize attributes
.
Investigation
linkedin
usernames can only contain alphanumeric characters .
Your custom URL can be 3-100 characters long. Don't use spaces, symbols, or special characters
https://www.linkedin.com/help/linkedin/answer/87/customize-your-public-profile-url
analysis
46_330
non-alphanumeric twitter usernames on gitlab.com.
select count(*) from users where (users.linkedin <> '') and users.linkedin not like '%[a-zA-Z0-9 .-]%';
Aggregate (cost=2342841.30..2342841.31 rows=1 width=8) (actual time=4778.026..4802.490 rows=1 loops=1)
Buffers: shared hit=36362 read=534443
I/O Timings: read=11079.894 write=0.000
-> Gather (cost=2342841.09..2342841.30 rows=2 width=8) (actual time=4777.849..4802.439 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=36362 read=534443
I/O Timings: read=11079.894 write=0.000
-> Aggregate (cost=2341841.09..2341841.10 rows=1 width=8) (actual time=4771.986..4771.990 rows=1 loops=3)
Buffers: shared hit=36362 read=534443
I/O Timings: read=11079.894 write=0.000
-> Parallel Seq Scan on public.users (cost=0.00..2341699.08 rows=56802 width=0) (actual time=1.646..4762.105 rows=46330 loops=3)
Filter: (((users.linkedin)::text <> ''::text) AND ((users.linkedin)::text !~~ '%[a-zA-Z0-9 .-]%'::text))
Rows Removed by Filter: 3072957
Buffers: shared hit=36362 read=534443
I/O Timings: read=11079.894 write=0.000
skype
new usernames appear to be computer generated (e.g. live:.cid.0000000000000000
) but were user-defined in the past. when they were user defined some specicial characters were allowed but none of were html
entities.
A Skype username cannot be shorter than six characters or longer than 32. It can contain both letters and numbers, but must start with a letter; accented characters are not allowed. The only punctuation marks you can use are commas, dashes, periods and underscores.
https://itstillworks.com/skype-username-tips-22188.html
analysis
29_962
non-alphanumeric twitter usernames on gitlab.com.
select count(*) from users where (users.skype <> '') and users.skype not like '%[a-zA-Z0-9 .-]%';
Aggregate (cost=2342791.40..2342791.41 rows=1 width=8) (actual time=5193.205..5220.942 rows=1 loops=1)
Buffers: shared hit=36319 read=534545
I/O Timings: read=12142.436 write=0.000
-> Gather (cost=2342791.19..2342791.40 rows=2 width=8) (actual time=5192.683..5220.923 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=36319 read=534545
I/O Timings: read=12142.436 write=0.000
-> Aggregate (cost=2341791.19..2341791.20 rows=1 width=8) (actual time=5186.033..5186.035 rows=1 loops=3)
Buffers: shared hit=36319 read=534545
I/O Timings: read=12142.436 write=0.000
-> Parallel Seq Scan on public.users (cost=0.00..2341699.08 rows=36842 width=0) (actual time=1.525..5178.115 rows=29962 loops=3)
Filter: (((users.skype)::text <> ''::text) AND ((users.skype)::text !~~ '%[a-zA-Z0-9 .-]%'::text))
Rows Removed by Filter: 3089325
Buffers: shared hit=36319 read=534545
I/O Timings: read=12142.436 write=0.000
twitter
usernames can only contain alphanumeric characters and underscores.
A username can only contain alphanumeric characters (letters A-Z, numbers 0-9) with the exception of underscores, as noted above. Check to make sure your desired username doesn't contain any symbols, dashes, or spaces.
https://help.twitter.com/en/managing-your-account/twitter-username-rules
analysis
50_633
non-alphanumeric twitter usernames on gitlab.com.
select count(*) from users where (users.twitter <> '') and users.twitter not like '%[a-zA-Z0-9 .-]%';
Aggregate (cost=2342855.95..2342855.96 rows=1 width=8) (actual time=4822.207..4850.010 rows=1 loops=1)
Buffers: shared hit=36170 read=534635 dirtied=9588 written=9573
I/O Timings: read=10925.721 write=201.099
-> Gather (cost=2342855.74..2342855.95 rows=2 width=8) (actual time=4821.739..4850.000 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=36170 read=534635 dirtied=9588 written=9573
I/O Timings: read=10925.721 write=201.099
-> Aggregate (cost=2341855.74..2341855.75 rows=1 width=8) (actual time=4815.522..4815.524 rows=1 loops=3)
Buffers: shared hit=36170 read=534635 dirtied=9588 written=9573
I/O Timings: read=10925.721 write=201.099
-> Parallel Seq Scan on public.users (cost=0.00..2341699.08 rows=62663 width=0) (actual time=0.956..4805.501 rows=50633 loops=3)
Filter: (((users.twitter)::text <> ''::text) AND ((users.twitter)::text !~~ '%[a-zA-Z0-9 .-]%'::text))
Rows Removed by Filter: 3068654
Buffers: shared hit=36170 read=534635 dirtied=9588 written=9573
I/O Timings: read=10925.721 write=201.099
Does this MR meet the acceptance criteria?
Conformity
-
I have included changelog trailers, or none are needed. (Does this MR need a changelog?) -
I have added/updated documentation, or it's not needed. (Is documentation required?) -
I have properly separated EE content from FOSS, or this MR is FOSS only. (Where should EE code go?) -
I have added information for database reviewers in the MR description, or it's not needed. (Does this MR have database related changes?) -
I have self-reviewed this MR per code review guidelines. -
This MR does not harm performance, or I have asked a reviewer to help assess the performance impact. (Merge request performance guidelines) -
I have followed the style guides. -
This change is backwards compatible across updates, or this does not apply.
Availability and Testing
-
I have added/updated tests following the Testing Guide, or it's not needed. (Consider all test levels. See the Test Planning Process.) -
I have tested this MR in all supported browsers, or it's not needed. -
I have informed the Infrastructure department of a default or new setting change per definition of done, or it's not needed.
Security
Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team