
Migration and model for audit event type streaming filter

What does this MR do and why?

This MR sets up a model that will help us filter audit event streaming by event type per external destination.

Queries to be used on this model

Fetch filters

SELECT *
FROM   audit_events_streaming_event_type_filters
WHERE  audit_events_streaming_event_type_filters.external_audit_event_destination_id = $id
AND    audit_events_streaming_event_type_filters.audit_event_type IN ($filter_array)

Delete filters

DELETE
FROM   audit_events_streaming_event_type_filters
WHERE  audit_events_streaming_event_type_filters.external_audit_event_destination_id = $id
AND    audit_events_streaming_event_type_filters.audit_event_type IN ($filter_array)
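For illustration, these queries map to ActiveRecord calls roughly as follows. This is a minimal sketch; the module, class, and association names are assumptions and may differ from the actual diff.

# Minimal model sketch backing the queries above; names are assumptions.
module AuditEvents
  module Streaming
    class EventTypeFilter < ApplicationRecord
      self.table_name = 'audit_events_streaming_event_type_filters'

      belongs_to :external_audit_event_destination

      validates :audit_event_type, presence: true
    end
  end
end

# Fetch filters
AuditEvents::Streaming::EventTypeFilter.where(
  external_audit_event_destination_id: id,
  audit_event_type: filter_array
)

# Delete filters
AuditEvents::Streaming::EventTypeFilter.where(
  external_audit_event_destination_id: id,
  audit_event_type: filter_array
).delete_all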

Local Migration logs

rake db:migrate:up:main VERSION=20221003151747
main: == 20221003151747 CreateAuditEventsStreamingEventTypeFilters: migrating =======
main: -- create_table(:audit_events_streaming_event_type_filters, {})
main: -- quote_column_name(:audit_event_type)
main:    -> 0.0000s
main:    -> 0.0128s
main: == 20221003151747 CreateAuditEventsStreamingEventTypeFilters: migrated (0.0134s)


rake db:migrate:down:main VERSION=20221003151747
main: == 20221003151747 CreateAuditEventsStreamingEventTypeFilters: reverting =======
main: -- drop_table(:audit_events_streaming_event_type_filters, {})
main:    -> 0.0028s
main: == 20221003151747 CreateAuditEventsStreamingEventTypeFilters: reverted (0.0042s)
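For context, a migration producing the log above would look roughly like the sketch below. The column types, limit, foreign key, and index are inferred from the queries in this MR rather than copied from the diff, so treat them as assumptions.

# Rough sketch only; exact columns, constraints, and index names are assumptions.
class CreateAuditEventsStreamingEventTypeFilters < Gitlab::Database::Migration[2.0]
  def change
    create_table :audit_events_streaming_event_type_filters do |t|
      t.timestamps_with_timezone null: false
      t.references :external_audit_event_destination,
                   null: false,
                   index: false,
                   foreign_key: { to_table: :audit_events_external_audit_event_destinations,
                                  on_delete: :cascade }
      # The quote_column_name line in the log suggests a text limit constraint.
      t.text :audit_event_type, null: false, limit: 255
      t.index [:external_audit_event_destination_id, :audit_event_type],
              unique: true,
              name: 'unique_streaming_event_type_filters_destination_id'
    end
  end
end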

Anticipated table growth

This table is going to be read-heavy. Writes will only be used to add filters, which will rarely be updated, so we do not need to plan for heavy write growth.

These numbers are based on the discussion in thread !100024 (comment 1132713401).

Reads per hour

We will query this table to check whether a filter exists when streaming to a particular destination.

   destination.event_type_filters.where(audit_event_type: event_type).exists?

AuditEventStreamingWorker currently streams events to every external destination of a group.
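For illustration, the check could sit in the streaming path roughly like this. The event_type_filters association and the stream call are assumed names for this sketch, not confirmed API.

# Illustrative sketch; association and method names are assumptions.
group.external_audit_event_destinations.find_each do |destination|
  filters = destination.event_type_filters
  # Stream when no filters are configured, or when this event type is allowed.
  if filters.none? || filters.where(audit_event_type: event_type).exists?
    destination.stream(audit_event)
  end
end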

Historical data for average jobs

Month        Avg jobs per second
Feb 2022     1
April 2022   10
July 2022    30
Oct 2022     40

In October 2022, the usage_activity_by_stage.manage.audit_event_destinations metric for SaaS was 6, so we can say 6 destinations were producing on average 40 jobs per second (sources: https://app.periscopedata.com/app/gitlab/663045/Manage:-Compliance-Dashboard and https://dashboards.gitlab.net/d/stage-groups-compliance/stage-groups-compliance-group-dashboard?orgId=1).

Jobs produced by 150 destinations = 40/6 * 150 = 1000 jobs per second, and 1 job = 1 read query.

So reads per hour = 1000 * 60 * 60 = 3,600,000 for 150 configured destinations.

Taking an upper bound of 150 destinations for the next year gives us the following table:

Month        Number of destinations   Jobs per second   ≈ Reads per hour
Oct 2022     6                        40                144,000
Jan 2023     40                       260               950,000
April 2023   75                       500               1,800,000
July 2023    110                      730               2,600,000
Oct 2023     150                      1000              3,600,000
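The rows above follow from a simple linear extrapolation, sketched below (a back-of-the-envelope calculation, not production code):

# Back-of-the-envelope projection: jobs per second scale linearly with
# destination count, and each job issues one read query.
JOBS_PER_SECOND_PER_DESTINATION = 40.0 / 6 # Oct 2022 baseline: 40 jobs/s from 6 destinations

def reads_per_hour(destinations)
  (JOBS_PER_SECOND_PER_DESTINATION * destinations * 60 * 60).round
end

reads_per_hour(150) # => 3600000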

Table Size

Table size = number of destinations * number of event types. Here we assume 100 event types and an upper bound of 150 destinations for the next year.

Month        Number of destinations   ≈ Number of rows
Oct 2022     6                        600
Jan 2023     40                       4,000
April 2023   75                       7,500
July 2023    110                      11,000
Oct 2023     150                      15,000
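The row counts above come from the same per-destination assumption (a trivial sketch):

# One row per (destination, event type) pair, assuming ~100 event types.
EVENT_TYPES = 100

def estimated_rows(destinations)
  destinations * EVENT_TYPES
end

estimated_rows(150) # => 15000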

Based on the anticipated data volume and access patterns, does the new table pose an availability risk to GitLab.com or self-managed instances? Does the proposed design scale to support the needs of GitLab.com and self-managed customers?

No, this new table does not pose an availability risk to GitLab.com or self-managed instances, and the anticipated data volumes and access patterns above show the design scales for both GitLab.com and self-managed customers.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #344845 (closed)
