Avoid collisions of ids while streaming audit events
Problem Statement
Currently, when database-saved audit events are streamed, the streamed JSON has `id` as `nil`. This happens because we use bulk insert to write events to the database: the bulk insert persists the events, but the in-memory event objects are never updated with the generated ids. https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/audit/auditor.rb#L148

`AuditEvent.bulk_insert!(events)`
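A minimal plain-Ruby simulation of the problem (the `Event` struct and `FakeTable` are illustrative stand-ins, not the actual `AuditEvent` model): the bulk insert assigns ids only to the stored rows, while the objects the caller passed in keep `id` as `nil`.

```ruby
Event = Struct.new(:id, :payload)

class FakeTable
  def initialize
    @rows = []
    @next_id = 0
  end

  # Mimics a bulk insert: persists attribute hashes and generates ids on
  # the stored rows, but returns nothing useful to the caller and never
  # writes the ids back onto the in-memory objects.
  def bulk_insert!(events)
    events.each do |event|
      @next_id += 1
      @rows << { id: @next_id, payload: event.payload }
    end
    nil
  end

  attr_reader :rows
end

events = [Event.new(nil, 'download'), Event.new(nil, 'login')]
table = FakeTable.new
table.bulk_insert!(events)

table.rows.map { |r| r[:id] }  # => [1, 2] in the "database"
events.map(&:id)               # => [nil, nil] in memory
```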
When `AuditEventStreamingWorker` receives an event without an id, it replaces the blank id with `created_at` (https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/workers/audit_events/audit_event_streaming_worker.rb#L75). That fallback is still usable for deduplicating events, but `created_at.to_i` gives the time as an integer number of seconds since the Unix epoch, so with the high traffic volume of audit events these fallback ids collide frequently.
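A quick illustration (with hypothetical events, not GitLab code) of why second-granularity timestamps make a poor fallback id: every event created within the same second collapses to the same integer, whereas UUIDs stay distinct.

```ruby
require 'securerandom'

created_at = Time.now

# Three events "created" in the same second all get the same fallback id.
fallback_ids = Array.new(3) { created_at.to_i }

# UUIDs generated for the same three events never collide in practice.
uuid_ids = Array.new(3) { SecureRandom.uuid }

fallback_ids.uniq.length # => 1 (all three collide)
uuid_ids.uniq.length     # => 3 (no collisions)
```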
Proposed Solution
- Make `bulk_insert` return the ids of the inserted events, and fetch those events again from the database using these ids.
- For single events, use a simple ActiveRecord insert: `event.save!`.
- To avoid id collisions while streaming, use `SecureRandom.uuid` when the id in the streamed JSON is blank.
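The last point can be sketched as a small helper (the name `streamed_event_id` and the payload shape are illustrative assumptions, not the actual worker code): keep the database id when present, otherwise generate a UUID instead of reusing `created_at.to_i`.

```ruby
require 'securerandom'

# Returns the id to put in the streamed JSON payload: the persisted
# database id when one exists, or a fresh UUID as a collision-safe
# fallback when the id is blank.
def streamed_event_id(payload)
  id = payload[:id]
  return id unless id.nil? || id.to_s.empty?

  SecureRandom.uuid
end

streamed_event_id(id: 42, details: 'zip download')   # => 42
streamed_event_id(id: nil, details: 'zip download')  # => a fresh UUID
```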
Draft MR !102972 (merged) handles collisions for streaming-only events.
How to reproduce?
- Add a streaming destination https://docs.gitlab.com/ee/administration/audit_event_streaming.html#add-a-new-streaming-destination
- Trigger a database-saved audit event that is also streamed, for example a zip download from a project page.
- Compare the id in the streamed audit event payload with the id saved in the database.
- Check that audit_json.log contains `id: null`.