Clean up serialized objects in audit_events
What does this MR do?
Clean up serialized objects on details
field in audit_events
table.
Relates to #216577 (closed)
Database benchmarks
Number of rows effected
6,638,795 (via #database-lab
on 2020/06/03)
Index Only Scan using index_audit_events_on_ruby_object_in_details_2 on public.audit_events (cost=0.43..175497.02 rows=8788533 width=4) (actual time=1.215..11927.867 rows=9068244 loops=1)
Heap Fetches: 0
Buffers: shared hit=15746 read=24805
I/O Timings: read=10497.578
Query plans
Select without index
EXPLAIN SELECT id FROM audit_events WHERE details ~~ '%ruby/object%'
Time: 25.316 min
- planning: 5.909 ms
- execution: 25.316 min
- I/O read: 5.354 min
- I/O write: 7.269 s
Shared buffers:
- hits: 1019 (~8.00 MiB) from the buffer pool
- reads: 9975795 (~76.10 GiB) from the OS file cache, including disk I/O
- dirtied: 111574 (~871.70 MiB)
- writes: 111542 (~871.40 MiB)
Create index
CREATE INDEX CONCURRENTLY index_audit_events_on_ruby_object_in_details
ON audit_events
USING btree (id)
WHERE (details ~~ '%ruby/object%')
The query has been executed. Duration: 52.729 min
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+----------------------------------------------+-------+--------+--------------+--------+-------------
public | index_audit_events_on_ruby_object_in_details | index | gitlab | audit_events | 194 MB |
(1 row)
Schedule with index
Retrieve IDs in batch of 1000
EXPLAIN SELECT "audit_events"."id"
FROM "audit_events"
WHERE "audit_events"."id" >= 1
AND (details ~~ '%ruby/object%')
ORDER BY "audit_events"."id" ASC
LIMIT 1 OFFSET 1000
Time: 9.992 ms
- planning: 4.078 ms
- execution: 5.914 ms
- I/O read: 5.484 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 3 (~24.00 KiB) from the buffer pool
- reads: 5 (~40.00 KiB) from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
Get Min/Max IDs
EXPLAIN SELECT MIN(id), MAX(id)
FROM "audit_events"
WHERE "audit_events"."id" >= 274591324 AND "audit_events"."id" < 274592324
AND (details ~~ '%ruby/object%')
Time: 15.392 ms
- planning: 15.025 ms
- execution: 0.367 ms
- I/O read: 0.221 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 3 (~24.00 KiB) from the buffer pool
- reads: 3 (~24.00 KiB) from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
Update (in batches of 1,000)
EXPLAIN UPDATE audit_events
SET details = regexp_replace(details, '!ruby/object.*name: ', '')
WHERE id BETWEEN 274591324 AND 274592324
AND (details ~~ '%ruby/object%')
Time: 17.545 ms
- planning: 5.525 ms
- execution: 12.020 ms
- I/O read: 11.882 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 0 from the buffer pool
- reads: 3 (~24.00 KiB) from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
Migrations runtime
- Index: 60 mins
- Schedule: 2 secs
- Update: <10 days with 1K batches
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry -
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
- [-] Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
- [-] Tested in all supported browsers
- [-] Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- [-] Label as security and @ mention
@gitlab-com/gl-security/appsec
- [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- [-] Security reports checked/validated by a reviewer from the AppSec team
Edited by Tan Le