Use new Sanitizable concern in User (!68508) · Merge requests · GitLab.org / GitLab

Philip Cunningham requested to merge philipcunningham-use-santizable-in-other-models-334653 into master Aug 19, 2021

What does this MR do?

this merge request leverages the new Sanitizable concern introduced in Add ability to easily sanitize attributes and replaces adhoc sanitizing in different parts of the codebase. the intention is to help prevent xss attacks in the event of a by-pass in the frontend sanitizer due to a configuration issue or a vulnerability in the sanitizer. this approach is commonly referred to as defense-in-depth.

Implications

this means that in addition to the existing behaviour escaped html entities (e.g. &) will no longer be permitted and will result in a validation error. the reasons behind this are discussed in Add ability to easily sanitize attributes.

Investigation

`linkedin`

usernames can only contain alphanumeric characters .

Your custom URL can be 3-100 characters long. Don't use spaces, symbols, or special characters

https://www.linkedin.com/help/linkedin/answer/87/customize-your-public-profile-url

analysis

46_330 non-alphanumeric twitter usernames on gitlab.com.

select count(*) from users where (users.linkedin <> '') and users.linkedin not like '%[a-zA-Z0-9 .-]%';

 Aggregate  (cost=2342841.30..2342841.31 rows=1 width=8) (actual time=4778.026..4802.490 rows=1 loops=1)
   Buffers: shared hit=36362 read=534443
   I/O Timings: read=11079.894 write=0.000
   ->  Gather  (cost=2342841.09..2342841.30 rows=2 width=8) (actual time=4777.849..4802.439 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=36362 read=534443
         I/O Timings: read=11079.894 write=0.000
         ->  Aggregate  (cost=2341841.09..2341841.10 rows=1 width=8) (actual time=4771.986..4771.990 rows=1 loops=3)
               Buffers: shared hit=36362 read=534443
               I/O Timings: read=11079.894 write=0.000
               ->  Parallel Seq Scan on public.users  (cost=0.00..2341699.08 rows=56802 width=0) (actual time=1.646..4762.105 rows=46330 loops=3)
                     Filter: (((users.linkedin)::text <> ''::text) AND ((users.linkedin)::text !~~ '%[a-zA-Z0-9 .-]%'::text))
                     Rows Removed by Filter: 3072957
                     Buffers: shared hit=36362 read=534443
                     I/O Timings: read=11079.894 write=0.000

`skype`

new usernames appear to be computer generated (e.g. live:.cid.0000000000000000) but were user-defined in the past. when they were user defined some specicial characters were allowed but none of were html entities.

A Skype username cannot be shorter than six characters or longer than 32. It can contain both letters and numbers, but must start with a letter; accented characters are not allowed. The only punctuation marks you can use are commas, dashes, periods and underscores.

https://itstillworks.com/skype-username-tips-22188.html

analysis

29_962 non-alphanumeric twitter usernames on gitlab.com.

select count(*) from users where (users.skype <> '') and users.skype not like '%[a-zA-Z0-9 .-]%';

 Aggregate  (cost=2342791.40..2342791.41 rows=1 width=8) (actual time=5193.205..5220.942 rows=1 loops=1)
   Buffers: shared hit=36319 read=534545
   I/O Timings: read=12142.436 write=0.000
   ->  Gather  (cost=2342791.19..2342791.40 rows=2 width=8) (actual time=5192.683..5220.923 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=36319 read=534545
         I/O Timings: read=12142.436 write=0.000
         ->  Aggregate  (cost=2341791.19..2341791.20 rows=1 width=8) (actual time=5186.033..5186.035 rows=1 loops=3)
               Buffers: shared hit=36319 read=534545
               I/O Timings: read=12142.436 write=0.000
               ->  Parallel Seq Scan on public.users  (cost=0.00..2341699.08 rows=36842 width=0) (actual time=1.525..5178.115 rows=29962 loops=3)
                     Filter: (((users.skype)::text <> ''::text) AND ((users.skype)::text !~~ '%[a-zA-Z0-9 .-]%'::text))
                     Rows Removed by Filter: 3089325
                     Buffers: shared hit=36319 read=534545
                     I/O Timings: read=12142.436 write=0.000

`twitter`

usernames can only contain alphanumeric characters and underscores.

A username can only contain alphanumeric characters (letters A-Z, numbers 0-9) with the exception of underscores, as noted above. Check to make sure your desired username doesn't contain any symbols, dashes, or spaces.

https://help.twitter.com/en/managing-your-account/twitter-username-rules

analysis

50_633 non-alphanumeric twitter usernames on gitlab.com.

select count(*) from users where (users.twitter <> '') and users.twitter not like '%[a-zA-Z0-9 .-]%';

 Aggregate  (cost=2342855.95..2342855.96 rows=1 width=8) (actual time=4822.207..4850.010 rows=1 loops=1)
   Buffers: shared hit=36170 read=534635 dirtied=9588 written=9573
   I/O Timings: read=10925.721 write=201.099
   ->  Gather  (cost=2342855.74..2342855.95 rows=2 width=8) (actual time=4821.739..4850.000 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=36170 read=534635 dirtied=9588 written=9573
         I/O Timings: read=10925.721 write=201.099
         ->  Aggregate  (cost=2341855.74..2341855.75 rows=1 width=8) (actual time=4815.522..4815.524 rows=1 loops=3)
               Buffers: shared hit=36170 read=534635 dirtied=9588 written=9573
               I/O Timings: read=10925.721 write=201.099
               ->  Parallel Seq Scan on public.users  (cost=0.00..2341699.08 rows=62663 width=0) (actual time=0.956..4805.501 rows=50633 loops=3)
                     Filter: (((users.twitter)::text <> ''::text) AND ((users.twitter)::text !~~ '%[a-zA-Z0-9 .-]%'::text))
                     Rows Removed by Filter: 3068654
                     Buffers: shared hit=36170 read=534635 dirtied=9588 written=9573
                     I/O Timings: read=10925.721 write=201.099

Does this MR meet the acceptance criteria?

Conformity

I have included changelog trailers, or none are needed. (Does this MR need a changelog?)
I have added/updated documentation, or it's not needed. (Is documentation required?)
I have properly separated EE content from FOSS, or this MR is FOSS only. (Where should EE code go?)
I have added information for database reviewers in the MR description, or it's not needed. (Does this MR have database related changes?)
I have self-reviewed this MR per code review guidelines.
This MR does not harm performance, or I have asked a reviewer to help assess the performance impact. (Merge request performance guidelines)
I have followed the style guides.
This change is backwards compatible across updates, or this does not apply.

Availability and Testing

I have added/updated tests following the Testing Guide, or it's not needed. (Consider all test levels. See the Test Planning Process.)
I have tested this MR in all supported browsers, or it's not needed.
I have informed the Infrastructure department of a default or new setting change per definition of done, or it's not needed.

Security

Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.

Label as security and @ mention @gitlab-com/gl-security/appsec
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
Security reports checked/validated by a reviewer from the AppSec team

Edited Sep 09, 2021 by Philip Cunningham

Use new Sanitizable concern in User

What does this MR do?

Implications

Investigation

linkedin

analysis

skype

analysis

twitter

analysis

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

Merge request reports

`linkedin`

`skype`

`twitter`