Enabling / disabling features does not propagate correctly to Geo secondaries
Summary
We use the flipper gem to provide feature gates in GitLab. This seems to store a cache of enabled features in Redis.
In a Geo setup, the primary and secondary do not share Redis state. This means that changes to a feature on the primary do not invalidate the cache on the secondary, leading to inconsistent behaviour.
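The failure mode can be illustrated with a toy model. This is only a sketch: `ToyNode` is a hypothetical class, not GitLab or flipper code. It assumes a read-through cache in front of a replicated database and ignores replication lag.

```ruby
# Toy model of the problem: one replicated database, two independent caches.
# ToyNode is hypothetical and exists only for illustration.
class ToyNode
  def initialize(database)
    @database = database # stands in for the replicated database
    @cache = {}          # stands in for this node's own Redis
  end

  def enabled?(key)
    # Read through the local cache; only hit the database on a miss.
    @cache.fetch(key) { @cache[key] = @database.fetch(key, false) }
  end

  def enable(key)
    @database[key] = true
    @cache.delete(key) # invalidates the local cache only
    true
  end

  def clear_cache!
    @cache.clear
  end
end

database  = {}
primary   = ToyNode.new(database)
secondary = ToyNode.new(database)

secondary.enabled?(:gitaly_ref_exists) # caches the disabled state on the secondary
primary.enable(:gitaly_ref_exists)     # updates the database, clears the primary's cache
secondary.enabled?(:gitaly_ref_exists) # stale: the secondary's cache was never invalidated
secondary.clear_cache!
secondary.enabled?(:gitaly_ref_exists) # fresh: read from the database again
```

In this model the secondary keeps returning the stale value until its own cache is cleared, which matches the behaviour described in this issue.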
Steps to reproduce
On the secondary:
# Looks at the database
Feature.enabled?(:gitaly_ref_exists) # false
On the primary:
# Looks at the database
Feature.enabled?(:gitaly_ref_exists) # false
Feature.enable(:gitaly_ref_exists) # true
On the secondary:
# Does not look at the database
Feature.enabled?(:gitaly_ref_exists) # false
The secondary reports changes to feature flags only after its Redis cache is cleared manually.
Possible fixes
- Disable the feature cache on the secondary
- Set a very short expiry on the secondary
- Send a cache invalidation event via the log cursor whenever features are changed on the primary
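As a rough illustration of the second option, a short expiry bounds how long the secondary can serve a stale value. This is a minimal sketch under stated assumptions: `ExpiringFeatureCache`, its `ttl_seconds` parameter, and the injectable `clock` are all hypothetical names, and none of this reflects how flipper or GitLab actually implement caching.

```ruby
# Hypothetical sketch of a short-lived feature cache; not flipper or GitLab code.
class ExpiringFeatureCache
  def initialize(ttl_seconds:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl_seconds
    @clock = clock   # injectable so expiry can be exercised without sleeping
    @entries = {}    # key => [value, expires_at]
  end

  # Returns the cached value, or calls the block (standing in for the
  # database lookup) when the entry is missing or has expired.
  def fetch(key)
    value, expires_at = @entries[key]
    return value if expires_at && @clock.call < expires_at

    fresh = yield
    @entries[key] = [fresh, @clock.call + @ttl]
    fresh
  end
end

database = { gitaly_ref_exists: false }
now = 0.0
cache = ExpiringFeatureCache.new(ttl_seconds: 5, clock: -> { now })

cache.fetch(:gitaly_ref_exists) { database[:gitaly_ref_exists] } # miss: reads the database
database[:gitaly_ref_exists] = true                              # change made on the primary
cache.fetch(:gitaly_ref_exists) { database[:gitaly_ref_exists] } # within the TTL: cached value
now = 6.0
cache.fetch(:gitaly_ref_exists) { database[:gitaly_ref_exists] } # TTL elapsed: reads the database
```

The trade-off is more database reads on the secondary in exchange for a bounded staleness window. An invalidation event sent via the log cursor would avoid that window, at the cost of more moving parts.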
Note
This issue is referenced in the documentation at https://docs.gitlab.com/ee/administration/geo/disaster_recovery/background_verification.html. Please update the documentation once this issue is fixed.
/cc @jramsay @stanhu this one has surprised me twice in recent days. It could lead to some very unexpected outcomes on gprd.