Migration and model for audit event type streaming filter
What does this MR do and why?
This MR setups model which will help us in filtering audit event type streaming as per external destinations.
Queries to be used on this model
Fetch filters
SELECT *
FROM audit_events_streaming_event_type_filters
WHERE
audit_events_streaming_event_type_filters.external_audit_event_destination_id =
$id
AND audit_events_streaming_event_type_filters.audit_event_type IN (
$filter_array )
Delete filters
DELETE
FROM audit_events_streaming_event_type_filters
WHERE audit_events_streaming_event_type_filters.external_audit_event_destination_id = $id
AND audit_events_streaming_event_type_filters.audit_event_type IN ($filter_array)
Local Migration logs
rake db:migrate:up:main VERSION=20221003151747
main: == 20221003151747 CreateAuditEventsStreamingEventTypeFilters: migrating =======
main: -- create_table(:audit_events_streaming_event_type_filters, {})
main: -- quote_column_name(:audit_event_type)
main: -> 0.0000s
main: -> 0.0128s
main: == 20221003151747 CreateAuditEventsStreamingEventTypeFilters: migrated (0.0134s)
rake db:migrate:down:main VERSION=20221003151747
main: == 20221003151747 CreateAuditEventsStreamingEventTypeFilters: reverting =======
main: -- drop_table(:audit_events_streaming_event_type_filters, {})
main: -> 0.0028s
main: == 20221003151747 CreateAuditEventsStreamingEventTypeFilters: reverted (0.0042s)
Table anticipated growth
This table is going to be a read heavy, write will be used to just add filter which will rarely updated, so we can avoid writes anticipation.
These numbers are based upon the discussion below in thread !100024 (comment 1132713401)
Reads per hour
We will be querying table to check if filter exists when streaming for a particular destination.
destination.where(audit_event_type: event_type).exists?
AuditEventStreamingWorker
currently streams event for every external destination of a group.
Historical data for avg jobs
months | avg jobs per second |
---|---|
Feb 2022 | 1 |
April 2022 | 10 |
july 2022 | 30 |
oct 2022 | 40 |
In October 2022 usage_activity_by_stage.manage.audit_event_destinations
For Saas is 6
So we can say 6 destinations is producing on a average 40 jobs (source: https://app.periscopedata.com/app/gitlab/663045/Manage:-Compliance-Dashboard and https://dashboards.gitlab.net/d/stage-groups-compliance/stage-groups-compliance-group-dashboard?orgId=1)
jobs produced by 150 destinations = 40/6 * 150 = 1000 jobs per second 1 jobs = 1 query for read.
so reads per hour = 1000 * 60 * 60 = 3,600,000 this is for 150 destinations configured.
Taking upper bound of 150 for next year gives us following table
Month | Number of Destinations | Jobs per second | ≈ Reads per hour |
---|---|---|---|
Oct 2022 | 6 | 40 | 144000 |
Jan 2023 | 40 | 260 | 950000 |
April 2023 | 75 | 500 | 1800000 |
July 2023 | 110 | 730 | 2600000 |
Oct 2023 | 150 | 1000 | 3600000 |
Table Size
Table size = number of destinations * event types here we are assuming event_types to be 100 and upper bound for number of destinations to be 150 for next 1 year.
Month | Number of Destinations | ≈ Number of Rows |
---|---|---|
Oct 2022 | 6 | 600 |
Jan 2023 | 40 | 4000 |
April 2023 | 75 | 7500 |
July 2023 | 110 | 11000 |
Oct 2023 | 150 | 15000 |
Based on the anticipated data volume and access patterns, does the new table pose an availability risk to GitLab.com or self-managed instances? Does the proposed design scale to support the needs of GitLab.com and self-managed customers?
No this new table does not pose an availability risk to GitLab.com or self-managed instances.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Related to #344845 (closed)