Draft: Improve Vulnerability Tracking: Backend implementation (!52722) · Merge requests · GitLab.org / GitLab

James Johnson requested to merge improve_vuln_tracking-backend_implementation into master Jan 27, 2021

This MR is the GitLab backend implementation of the proof-of-concept MR: !45339 (closed) and is part of the epic https://gitlab.com/groups/gitlab-org/-/epics/4690 - Improve Vulnerability Tracking: MVP: Scope+Offset

What does this MR do?

The changes in this merge request allow multiple fingerprinting algorithms to be used to compare vulnerabilities in a migration-less manner.

This lets us develop better fingerprinting algorithms while keeping backwards compatibility with existing vulnerability fingerprints.

High-Level Details

This is done by:

Adding a new vulnerability_tracking_fingerprints table that enables one-to-many relationships between findings and fingerprints
Using the tracking field in the security report
Depending on a post-analyzer to calculate and insert the actual fingerprint value in a pipeline (see this epic https://gitlab.com/groups/gitlab-org/-/epics/4690)

This merge request DOES NOT contain the implementation for the scope+offset calculation, it only uses the values inserted into the security report JSON during the pipeline (if they exist).

Technical Details

Security Report JSON Tracking Field (expand me)

The tracking field schema assumed in this MR has the following form:

"tracking": {
  "type": "source",
  "positions": [
    { "file": "path/to/file1.ext", "line_start": 10, "line_end": 20, "fingerprints": { "scope+offset": "path/to/file1.ext|scope1|scope2:offset" } },
    { "file": "path/to/file2.ext", "line_start": 10, "line_end": 20, "fingerprints": { "scope+offset": "path/to/file2.ext|scope1|scope2:offset" } },
    { "file": "path/to/file3.ext", "line_start": 10, "line_end": 20, "fingerprints": { "scope+offset": "path/to/file3.ext|scope1|scope2:offset" } }
  ]
}

The tracking field is originally (ideally) created by an analyzer for each vulnerability. The fingerprints subfield is added by a post-analyzer.

If the tracking field does not exist, the behavior is to fall back to hashing data from the location field in the same manner as before.

See #233167 (closed) and #293706 (closed) for details on the tracking field

Multiple Fingerprints Per Vulnerability (expand me)

A new table, vulnerability_tracking_fingerprints, is added that creates a one-to-many relationship between findings and vulnerability tracking fingerprints.

Each tracking fingerprint record associated with a finding contains:

Field	Notes
`finding_id`	A foreign key to the finding
`sha`	A sha of the tracking fingerprint
`track_type`	The type of tracking (`hash` or `source`) This is the declared tracking type from the analyzer
`track_method`	This is NOT set by the analyzer, but is the specific tracking algorithm used to fulfill `track_type`. `hash` and `scope+offset` are examples of this

Additionally, the code itself has priorities set for each tracking method. This is NOT set by the analyzer and is used to compare tracking fingerprints between vulnerabilities in descending priority. The priority is intended to be a relative accuracy rating. Higher priority tracking fingerprints should be more accurate than lower priority tracking fingerprints.

Comparing Vulnerabilities with Fingerprint Priorities (expand me)

Each tracking fingerprint method has a priority defined in the GitLab backend. Vulnerabilities with matching tracking fingerprints are compared in descending order.

One Matching Lower-Priority Fingerprint

For example, Vuln1's Source:Hash tracking fingerprint below is the only match with any of Vuln2's tracking fingerprints. Since this is the first (and only) match, they are considered the same vulnerability.

graph LR
  subgraph Vuln1
    subgraph fingerprint11[fingerprint1]
      type11["Type:Source"]
      method11["Method:Scope+Offset"]
      value11["Hash:AAAA"]
    end
    subgraph fingerprint12[fingerprint2]
      type12["Type:Source"]
      method12["Method:Hash"]
      value12["Hash:XXXX"]
    end
  end

  subgraph Vuln2
    subgraph fingerprint22[fingerprint2]
      type22["Type:Source"]
      method22["Method:Hash"]
      value22["Hash:XXXX"]
    end
  end

  type12 -->|matches| type22
  method12 -->|matches| method22
  value12 -->|matches| value22

  classDef green fill:#9f9,stroke-width:4px,font-weight:bold;
  classDef red fill:#f9f,stroke:#333,stroke-width:4px;

  class fingerprint22 green
  class fingerprint12 green

Two Matching Fingerprints

In this next example, Vuln1 and Vuln2 have several matching fingerprints. However, the first and highest priority (most accurate) match is with the Source:Scope+Offset tracking fingerprint:

graph LR
  subgraph Vuln1
    subgraph fingerprint11[fingerprint1]
      priority11["Priority:2"]
      type11["Type:Source"]
      method11["Method:Scope+Offset"]
      value11["Hash:AAAA"]
    end
    subgraph fingerprint12[fingerprint2]
      priority12["Priority:1"]
      type12["Type:Source"]
      method12["Method:Hash"]
      value12["Hash:XXXX"]
    end
  end

  subgraph Vuln2
    subgraph fingerprint21[fingerprint1]
      priority21["Priority:2"]
      type21["Type:Source"]
      method21["Method:Scope+Offset"]
      value21["Hash:AAAA"]
    end
    subgraph fingerprint22[fingerprint2]
      priority22["Priority:1"]
      type22["Type:Source"]
      method22["Method:Hash"]
      value22["Hash:XXXX"]
    end
  end

  priority11-->|matches| priority21
  type11 -->|matches| type21
  method11 -->|matches| method21
  value11 -->|matches| value21

  priority12-.->|matches| priority22
  type12 -.->|matches| type22
  method12 -.->|matches| method22
  value12 -.->|matches| value22

  classDef green fill:#9f9,stroke-width:4px,font-weight:bold;
  classDef red fill:#f9f,stroke:#333,stroke-width:4px;

  class fingerprint11 green
  class fingerprint21 green

Non-Matching High Priority Fingerprints

Prioritized tracking fingerprints are also used to disprove matches. In the example below, Vuln1 and Vuln2 both have matching low-priority Source:Hash fingerprints, but the high-priority Source:Scope+Offset tracking fingerprints do not match

graph LR
  subgraph Vuln1
    subgraph fingerprint11[fingerprint1]
      priority11["Priority:2"]
      type11["Type:Source"]
      method11["Method:Scope+Offset"]
      value11["Hash:AAAA"]
    end
    subgraph fingerprint12[fingerprint2]
      priority12["Priority:1"]
      type12["Type:Source"]
      method12["Method:Hash"]
      value12["Hash:XXXX"]
    end
  end

  subgraph Vuln2
    subgraph fingerprint21[fingerprint1]
      priority21["Priority:2"]
      type21["Type:Source"]
      method21["Method:Scope+Offset"]
      value21["Hash:BBBB"]
    end
    subgraph fingerprint22[fingerprint2]
      priority22["Priority:1"]
      type22["Type:Source"]
      method22["Method:Hash"]
      value22["Hash:XXXX"]
    end
  end

  priority11-->|"✗"| priority21
  type11 -->|"✗"| type21
  method11 -->|"✗"| method21
  value11 -->|"✗"| value21

  priority12-.->|matches| priority22
  type12 -.->|matches| type22
  method12 -.->|matches| method22
  value12 -.->|matches| value22

  classDef green fill:#9f9,stroke-width:4px,font-weight:bold;
  classDef red fill:#f99,stroke-width:4px,font-weight:bold;

  class fingerprint11 red
  class fingerprint21 red

Notes

The changes in this MR are intended to be gated by a feature flag to make testing/rolling out this new feature easier.

Additionally, this MR will likely be split into three MRs:

Database changes (no feature flag, need the table to be added)
Documentation changes
Backend changes (with feature-flag)

If you have questions/comments about a specific topic, please use discoto to keep the discussion organized:

Create new distinct topics (threads) if one doesn't already exist
Comment on existing topics if one already exists

Screenshots

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
Tested in all supported browsers
Informed Infrastructure department of a default or new setting change, if applicable per definition of done

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

Label as security and @ mention @gitlab-com/gl-security/appsec
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
Security reports checked/validated by a reviewer from the AppSec team

Auto-Summary 🤖

Discoto Usage

Points

Discussion points are declared by headings, list items, and single lines that start with the text (case-insensitive) point:. For example, the following are all valid points:

#### POINT: This is a point

* point: This is a point

+ Point: This is a point

- pOINT: This is a point

point: This is a **point**

Note that any markdown used in the point text will also be propagated into the topic summaries.

Outcomes

Outcomes define the decisions or resolutions of a discussion. Once outcomes are defined, sub-topics and points are collapsed underneath the outcomes.

Outcomes are declared in a similar manner as points:

#### OUTCOME: This is an outcome

* outcome: This is an outcome

+ Outcome: This is an outcome

- oUTCOME: This is an outcome

outcome: This is an outcome

Note that multiple outcomes may be declared for each topic.

Topics

Topics can be stand-alone and contained within an issuable (epic, issue, MR), or can be inline.

Inline topics are defined by creating a new thread (discussion) where the first line of the first comment is a heading that starts with (case-insensitive) topic:. For example, the following are all valid topics:

# Topic: Inline discussion topic 1

## TOPIC: **{+A Green, bolded topic+}**

### tOpIc: Another topic

Quick Actions

Action Description

/discuss sub-topic TITLE Create an issue for a sub-topic. Does not work in epics

/discuss link ISSUABLE-LINK Link an issuable as a child of this discussion

Discussion-Size Indicators

The relative size of the discussion occurring within a topic and its sub-topics is indicated via braille dots.

More dots means that more points or sub-topics exist within a given topic.

Examples:

TOPIC ⣿⣿⡆ A large discussion occurred here

TOPIC ⣇ A smaller discussion occurred here

Action	Description
`/discuss sub-topic TITLE`	Create an issue for a sub-topic. Does not work in epics
`/discuss link ISSUABLE-LINK`	Link an issuable as a child of this discussion

Last updated by this job

TOPIC ⡀ Todo: Unit tests !52722 (comment 495624327)
TOPIC ⣿ ⢀ Todo: `tracking` field fallback when not present !52722 (comment 495625550)
OUTCOME: Split into more MRs. This MR --> the store-only MR ~defined !52722 (comment 496530314) !52722 (comment 496555710)
- possible split lines for this MR !52722 (comment 496530314)
TOPIC ⡀ Todo: Feedback tracking !52722 (comment 495626765)
TOPIC ⡄ Feature Flag !52722 (comment 496523580)
OUTCOME: Yes, but in the next MR see !52722 (comment 496523975) !52722 (comment 496534498)

Edited Jan 28, 2021 by 🤖 GitLab Bot 🤖

Draft: Improve Vulnerability Tracking: Backend implementation