Draft: Improve Vulnerability Tracking: Backend implementation
This MR is the GitLab backend implementation of the proof-of-concept MR: !45339 (closed) and is part of the epic https://gitlab.com/groups/gitlab-org/-/epics/4690 - Improve Vulnerability Tracking: MVP: Scope+Offset
What does this MR do?
The changes in this merge request allow multiple fingerprinting algorithms to be used to compare vulnerabilities in a migration-less manner.
This lets us develop better fingerprinting algorithms while keeping backwards compatibility with existing vulnerability fingerprints.
High-Level Details
This is done by:
- Adding a new
vulnerability_tracking_fingerprints
table that enables one-to-many relationships between findings and fingerprints - Using the
tracking
field in the security report - Depending on a post-analyzer to calculate and insert the actual fingerprint value in a pipeline (see this epic https://gitlab.com/groups/gitlab-org/-/epics/4690)
This merge request DOES NOT contain the implementation for the scope+offset calculation, it only uses the values inserted into the security report JSON during the pipeline (if they exist).
Technical Details
Security Report JSON Tracking Field (expand me)
The tracking field schema assumed in this MR has the following form:
"tracking": {
"type": "source",
"positions": [
{ "file": "path/to/file1.ext", "line_start": 10, "line_end": 20, "fingerprints": { "scope+offset": "path/to/file1.ext|scope1|scope2:offset" } },
{ "file": "path/to/file2.ext", "line_start": 10, "line_end": 20, "fingerprints": { "scope+offset": "path/to/file2.ext|scope1|scope2:offset" } },
{ "file": "path/to/file3.ext", "line_start": 10, "line_end": 20, "fingerprints": { "scope+offset": "path/to/file3.ext|scope1|scope2:offset" } }
]
}
The tracking
field is originally (ideally) created by an analyzer for each vulnerability. The fingerprints
subfield is added by a post-analyzer.
If the tracking
field does not exist, the behavior is to fall back to hashing data from the location
field in the same manner as before.
See #233167 (closed) and #293706 (closed) for details on the tracking field
Multiple Fingerprints Per Vulnerability (expand me)
A new table, vulnerability_tracking_fingerprints
, is added that creates a one-to-many relationship between findings and vulnerability tracking fingerprints.
Each tracking fingerprint record associated with a finding contains:
Field | Notes |
---|---|
finding_id |
|
sha |
|
track_type |
|
track_method |
|
Additionally, the code itself has priorities set for each tracking method. This is NOT set by the analyzer and is used to compare tracking fingerprints between vulnerabilities in descending priority. The priority is intended to be a relative accuracy rating. Higher priority tracking fingerprints should be more accurate than lower priority tracking fingerprints.
Comparing Vulnerabilities with Fingerprint Priorities (expand me)
Each tracking fingerprint method has a priority defined in the GitLab backend. Vulnerabilities with matching tracking fingerprints are compared in descending order.
One Matching Lower-Priority Fingerprint
For example, Vuln1
's Source:Hash
tracking fingerprint below is the only match with any of Vuln2
's tracking fingerprints. Since this is the first (and only) match, they are considered the same vulnerability.
graph LR
subgraph Vuln1
subgraph fingerprint11[fingerprint1]
type11["Type:Source"]
method11["Method:Scope+Offset"]
value11["Hash:AAAA"]
end
subgraph fingerprint12[fingerprint2]
type12["Type:Source"]
method12["Method:Hash"]
value12["Hash:XXXX"]
end
end
subgraph Vuln2
subgraph fingerprint22[fingerprint2]
type22["Type:Source"]
method22["Method:Hash"]
value22["Hash:XXXX"]
end
end
type12 -->|matches| type22
method12 -->|matches| method22
value12 -->|matches| value22
classDef green fill:#9f9,stroke-width:4px,font-weight:bold;
classDef red fill:#f9f,stroke:#333,stroke-width:4px;
class fingerprint22 green
class fingerprint12 green
Two Matching Fingerprints
In this next example, Vuln1
and Vuln2
have several matching fingerprints. However, the first and highest priority (most accurate) match is with the Source:Scope+Offset
tracking fingerprint:
graph LR
subgraph Vuln1
subgraph fingerprint11[fingerprint1]
priority11["Priority:2"]
type11["Type:Source"]
method11["Method:Scope+Offset"]
value11["Hash:AAAA"]
end
subgraph fingerprint12[fingerprint2]
priority12["Priority:1"]
type12["Type:Source"]
method12["Method:Hash"]
value12["Hash:XXXX"]
end
end
subgraph Vuln2
subgraph fingerprint21[fingerprint1]
priority21["Priority:2"]
type21["Type:Source"]
method21["Method:Scope+Offset"]
value21["Hash:AAAA"]
end
subgraph fingerprint22[fingerprint2]
priority22["Priority:1"]
type22["Type:Source"]
method22["Method:Hash"]
value22["Hash:XXXX"]
end
end
priority11-->|matches| priority21
type11 -->|matches| type21
method11 -->|matches| method21
value11 -->|matches| value21
priority12-.->|matches| priority22
type12 -.->|matches| type22
method12 -.->|matches| method22
value12 -.->|matches| value22
classDef green fill:#9f9,stroke-width:4px,font-weight:bold;
classDef red fill:#f9f,stroke:#333,stroke-width:4px;
class fingerprint11 green
class fingerprint21 green
Non-Matching High Priority Fingerprints
Prioritized tracking fingerprints are also used to disprove matches. In the example below, Vuln1
and Vuln2
both have matching low-priority Source:Hash
fingerprints, but the high-priority Source:Scope+Offset
tracking fingerprints do not match
graph LR
subgraph Vuln1
subgraph fingerprint11[fingerprint1]
priority11["Priority:2"]
type11["Type:Source"]
method11["Method:Scope+Offset"]
value11["Hash:AAAA"]
end
subgraph fingerprint12[fingerprint2]
priority12["Priority:1"]
type12["Type:Source"]
method12["Method:Hash"]
value12["Hash:XXXX"]
end
end
subgraph Vuln2
subgraph fingerprint21[fingerprint1]
priority21["Priority:2"]
type21["Type:Source"]
method21["Method:Scope+Offset"]
value21["Hash:BBBB"]
end
subgraph fingerprint22[fingerprint2]
priority22["Priority:1"]
type22["Type:Source"]
method22["Method:Hash"]
value22["Hash:XXXX"]
end
end
priority11-->|"✗"| priority21
type11 -->|"✗"| type21
method11 -->|"✗"| method21
value11 -->|"✗"| value21
priority12-.->|matches| priority22
type12 -.->|matches| type22
method12 -.->|matches| method22
value12 -.->|matches| value22
classDef green fill:#9f9,stroke-width:4px,font-weight:bold;
classDef red fill:#f99,stroke-width:4px,font-weight:bold;
class fingerprint11 red
class fingerprint21 red
Notes
The changes in this MR are intended to be gated by a feature flag to make testing/rolling out this new feature easier.
Additionally, this MR will likely be split into three MRs:
- Database changes (no feature flag, need the table to be added)
- Documentation changes
- Backend changes (with feature-flag)
If you have questions/comments about a specific topic, please use discoto to keep the discussion organized:
- Create new distinct topics (threads) if one doesn't already exist
- Comment on existing topics if one already exists
Screenshots
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry -
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. -
Tested in all supported browsers -
Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team
🤖
Auto-Summary Discoto Usage
Points
Discussion points are declared by headings, list items, and single lines that start with the text (case-insensitive)
point:
. For example, the following are all valid points:
#### POINT: This is a point
* point: This is a point
+ Point: This is a point
- pOINT: This is a point
point: This is a **point**
Note that any markdown used in the point text will also be propagated into the topic summaries.
Outcomes
Outcomes define the decisions or resolutions of a discussion. Once outcomes are defined, sub-topics and points are collapsed underneath the outcomes.
Outcomes are declared in a similar manner as points:
#### OUTCOME: This is an outcome
* outcome: This is an outcome
+ Outcome: This is an outcome
- oUTCOME: This is an outcome
outcome: This is an outcome
Note that multiple outcomes may be declared for each topic.
Topics
Topics can be stand-alone and contained within an issuable (epic, issue, MR), or can be inline.
Inline topics are defined by creating a new thread (discussion) where the first line of the first comment is a heading that starts with (case-insensitive)
topic:
. For example, the following are all valid topics:
# Topic: Inline discussion topic 1
## TOPIC: **{+A Green, bolded topic+}**
### tOpIc: Another topic
Quick Actions
Action Description /discuss sub-topic TITLE
Create an issue for a sub-topic. Does not work in epics /discuss link ISSUABLE-LINK
Link an issuable as a child of this discussion Discussion-Size Indicators
The relative size of the discussion occurring within a topic and its sub-topics is indicated via braille dots.
More dots means that more points or sub-topics exist within a given topic.
Examples:
- TOPIC
⣿⣿⡆
A large discussion occurred here- TOPIC
⣇
A smaller discussion occurred here
Last updated by this job
TOPIC
⡀
TOPIC
⣿
⢀
Todo: `tracking` field fallback when not present !52722 (comment 495625550)
OUTCOME: Split into more MRs. This MR --> the store-only MR ~defined !52722 (comment 496530314) !52722 (comment 496555710)- possible split lines for this MR !52722 (comment 496530314)
TOPIC
⡀
TOPIC
⡄
OUTCOME: Yes, but in the next MR see !52722 (comment 496523975) !52722 (comment 496534498)