
Migration and model for audit event type streaming filter

What does this MR do and why?

This MR sets up a model that will help us filter audit event streaming by event type per external destination.

Queries to be used on this model

Fetch filters

SELECT *
FROM   audit_events_streaming_event_type_filters
WHERE  audit_events_streaming_event_type_filters.external_audit_event_destination_id = $id
AND    audit_events_streaming_event_type_filters.audit_event_type IN ($filter_array)

Delete filters

DELETE
FROM   audit_events_streaming_event_type_filters
WHERE  audit_events_streaming_event_type_filters.external_audit_event_destination_id = $id
AND    audit_events_streaming_event_type_filters.audit_event_type IN ($filter_array)
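For illustration, these queries map to ActiveRecord calls roughly as follows. This is a minimal sketch; the module, class, and association names are assumptions and may differ from the actual diff.

# Minimal model sketch backing the queries above; names are assumptions.
module AuditEvents
  module Streaming
    class EventTypeFilter < ApplicationRecord
      self.table_name = 'audit_events_streaming_event_type_filters'

      belongs_to :external_audit_event_destination

      validates :audit_event_type, presence: true
    end
  end
end

# Fetch filters
AuditEvents::Streaming::EventTypeFilter.where(
  external_audit_event_destination_id: id,
  audit_event_type: filter_array
)

# Delete filters
AuditEvents::Streaming::EventTypeFilter.where(
  external_audit_event_destination_id: id,
  audit_event_type: filter_array
).delete_all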

Local Migration logs

rake db:migrate:up:main VERSION=20221003151747
main: == 20221003151747 CreateAuditEventsStreamingEventTypeFilters: migrating =======
main: -- create_table(:audit_events_streaming_event_type_filters, {})
main: -- quote_column_name(:audit_event_type)
main:    -> 0.0000s
main:    -> 0.0128s
main: == 20221003151747 CreateAuditEventsStreamingEventTypeFilters: migrated (0.0134s)


rake db:migrate:down:main VERSION=20221003151747
main: == 20221003151747 CreateAuditEventsStreamingEventTypeFilters: reverting =======
main: -- drop_table(:audit_events_streaming_event_type_filters, {})
main:    -> 0.0028s
main: == 20221003151747 CreateAuditEventsStreamingEventTypeFilters: reverted (0.0042s)
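For context, a migration producing the log above would look roughly like the sketch below. The column types, limit, foreign key, and index are inferred from the queries in this MR rather than copied from the diff, so treat them as assumptions.

# Rough sketch only; exact columns, constraints, and index names are assumptions.
class CreateAuditEventsStreamingEventTypeFilters < Gitlab::Database::Migration[2.0]
  def change
    create_table :audit_events_streaming_event_type_filters do |t|
      t.timestamps_with_timezone null: false
      t.references :external_audit_event_destination,
                   null: false,
                   index: false,
                   foreign_key: { to_table: :audit_events_external_audit_event_destinations,
                                  on_delete: :cascade }
      # The quote_column_name line in the log suggests a text limit constraint.
      t.text :audit_event_type, null: false, limit: 255
      t.index [:external_audit_event_destination_id, :audit_event_type],
              unique: true,
              name: 'unique_streaming_event_type_filters_destination_id'
    end
  end
end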

Anticipated table growth

This table is going to be read-heavy. Writes will only be used to add filters, which will rarely be updated, so we do not need to plan for heavy write growth.

These numbers are based on the discussion in thread !100024 (comment 1132713401).

Reads per hour

We will query this table to check whether a filter exists when streaming to a particular destination.

   destination.event_type_filters.where(audit_event_type: event_type).exists?

AuditEventStreamingWorker currently streams events to every external destination of a group.
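For illustration, the check could sit in the streaming path roughly like this. The event_type_filters association and the stream call are assumed names for this sketch, not confirmed API.

# Illustrative sketch; association and method names are assumptions.
group.external_audit_event_destinations.find_each do |destination|
  filters = destination.event_type_filters
  # Stream when no filters are configured, or when this event type is allowed.
  if filters.none? || filters.where(audit_event_type: event_type).exists?
    destination.stream(audit_event)
  end
end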

Historical data for average jobs

Month        Avg jobs per second
Feb 2022     1
April 2022   10
July 2022    30
Oct 2022     40

In October 2022, the usage_activity_by_stage.manage.audit_event_destinations metric for SaaS was 6, so we can say 6 destinations were producing on average 40 jobs per second (sources: https://app.periscopedata.com/app/gitlab/663045/Manage:-Compliance-Dashboard and https://dashboards.gitlab.net/d/stage-groups-compliance/stage-groups-compliance-group-dashboard?orgId=1).

Jobs produced by 150 destinations = 40/6 * 150 = 1000 jobs per second, and 1 job = 1 read query.

So reads per hour = 1000 * 60 * 60 = 3,600,000 for 150 configured destinations.

Taking an upper bound of 150 destinations for the next year gives us the following table:

Month        Number of destinations   Jobs per second   ≈ Reads per hour
Oct 2022     6                        40                144,000
Jan 2023     40                       260               950,000
April 2023   75                       500               1,800,000
July 2023    110                      730               2,600,000
Oct 2023     150                      1000              3,600,000
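The rows above follow from a simple linear extrapolation, sketched below (a back-of-the-envelope calculation, not production code):

# Back-of-the-envelope projection: jobs per second scale linearly with
# destination count, and each job issues one read query.
JOBS_PER_SECOND_PER_DESTINATION = 40.0 / 6 # Oct 2022 baseline: 40 jobs/s from 6 destinations

def reads_per_hour(destinations)
  (JOBS_PER_SECOND_PER_DESTINATION * destinations * 60 * 60).round
end

reads_per_hour(150) # => 3600000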

Table Size

Table size = number of destinations * number of event types. Here we assume 100 event types and an upper bound of 150 destinations for the next year.

Month        Number of destinations   ≈ Number of rows
Oct 2022     6                        600
Jan 2023     40                       4,000
April 2023   75                       7,500
July 2023    110                      11,000
Oct 2023     150                      15,000
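The row counts above come from the same per-destination assumption (a trivial sketch):

# One row per (destination, event type) pair, assuming ~100 event types.
EVENT_TYPES = 100

def estimated_rows(destinations)
  destinations * EVENT_TYPES
end

estimated_rows(150) # => 15000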

Based on the anticipated data volume and access patterns, does the new table pose an availability risk to GitLab.com or self-managed instances? Does the proposed design scale to support the needs of GitLab.com and self-managed customers?

No, this new table does not pose an availability risk to GitLab.com or self-managed instances, and the anticipated data volumes and access patterns above show the design scales for both GitLab.com and self-managed customers.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #344845 (closed)
