Support sending traffic to redis-cache read replica
What does this MR do?
This is a proof of concept for improving the scalability of redis-cache by sending traffic to read replicas.
Some Redis keys may not be safe for stale reads. The idea is therefore to perform stale reads selectively, so that calls deemed safe can be migrated to replicas incrementally. A call opts in by passing stale_ok: true to its Rails.cache read call.
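The opt-in routing can be illustrated with a minimal, self-contained sketch. It is not the code in this MR: FakeRedis and ReplicaAwareStore are hypothetical names, and the real change hooks into Rails.cache rather than a standalone class. Only the stale_ok: true option comes from the description above.

```ruby
# Hypothetical stand-in for a Redis connection; records which keys were read.
class FakeRedis
  attr_reader :name, :reads

  def initialize(name, data = {})
    @name = name
    @data = data
    @reads = []
  end

  def get(key)
    @reads << key
    @data[key]
  end

  def set(key, value)
    @data[key] = value
  end
end

# Hypothetical store: reads default to the primary and only go to the
# replica when the caller explicitly accepts stale data.
class ReplicaAwareStore
  def initialize(primary:, replica:)
    @primary = primary
    @replica = replica
  end

  def read(key, stale_ok: false)
    client = stale_ok ? @replica : @primary
    client.get(key)
  end

  # Writes always go to the primary.
  def write(key, value)
    @primary.set(key, value)
  end
end

primary = FakeRedis.new("primary", { "appearance" => "v2" })
replica = FakeRedis.new("replica", { "appearance" => "v1" }) # lagging copy
store = ReplicaAwareStore.new(primary: primary, replica: replica)

store.read("appearance")                 # served by the primary
store.read("appearance", stale_ok: true) # served by the replica, may be stale
```

Defaulting stale_ok to false keeps every existing call on the primary, so a call site only changes behaviour once someone has decided its key tolerates replication lag.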
We perform a relatively crude load balancing across replicas: the sentinel support in redis-rb will select a random healthy replica at connection time.
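As a rough model of what "random healthy replica" means, the sketch below filters a replica list and samples one entry. It is a simplification, not redis-rb's actual code: pick_replica is a hypothetical helper, the health check (master-link-status equal to "ok", a field from the Sentinel replicas reply) is an assumption about what counts as healthy, and ports 6382 and 6384 are made up for illustration.

```ruby
# Hypothetical helper: choose one replica at random among those whose
# replication link to the master is reported as up.
def pick_replica(replicas, rng: Random.new)
  healthy = replicas.select { |r| r["master-link-status"] == "ok" }
  raise "no healthy replica available" if healthy.empty?

  healthy.sample(random: rng)
end

# Illustrative replica list in the shape of a parsed Sentinel reply.
replicas = [
  { "ip" => "127.0.0.1", "port" => "6382", "master-link-status" => "err" },
  { "ip" => "127.0.0.1", "port" => "6383", "master-link-status" => "ok" },
  { "ip" => "127.0.0.1", "port" => "6384", "master-link-status" => "ok" }
]

chosen = pick_replica(replicas)
puts "connecting to #{chosen['ip']}:#{chosen['port']}"
```

Because the choice is made once per connection rather than once per command, load evens out across replicas only as the number of client connections grows, which is why the balancing is described as crude.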
The motivation behind this proposal is documented in https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/9414. This patch implements some ideas suggested there by @stanhu.
Screenshots
n/a
Does this MR meet the acceptance criteria?
Conformity
- Changelog entry
- Documentation (if required)
- Code review guidelines
- Merge request performance guidelines
- Style guides
- Database guides
- Separation of EE specific content
Availability and Testing
To test this change, I added rudimentary Redis Sentinel support to GDK: gdk@igor-sentinel.
This setup is quite close to what we run in production on gitlab.com.
Currently only a single key pattern is marked as safe for stale reads: cache:gitlab:Appearance:*.
In a Jaeger trace, almost all Redis calls go to the master (port 6381), but the reads for Appearance go to a replica (port 6383).
- Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
- Tested in all supported browsers
- Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- Label as security and @ mention @gitlab-com/gl-security/appsec
- The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- Security reports checked/validated by a reviewer from the AppSec team