Adds WAL receiver saturation indicator
What does this MR do and why?
It adds a new health indicator for batched background migrations (BBM), based on the WAL receiver saturation metric.
Query
# main
max(1 - quantile_over_time(0.50, postgres_replication_process_state_ratio{env="gprd", type="patroni", process_type="walreceiver", process_state="S"}[5m]))
=> [{"metric"=>{}, "value"=>[1714755522.823, "0.7833"]}]
# ci
max(1 - quantile_over_time(0.50, postgres_replication_process_state_ratio{env="gprd", type="patroni-ci", process_type="walreceiver", process_state="S"}[5m]))
=> [{"metric"=>{}, "value"=>[1714676243.81, "0.7541625"]}]
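For readers less familiar with PromQL, the following is a minimal Ruby sketch of what the expression appears to compute, run on made-up sample data. It assumes that `process_state="S"` marks the sleeping (idle) state, so that `1 - ratio` is the busy fraction; the helper names `median` and `wal_receiver_saturation` are hypothetical and are not part of this MR.

```ruby
# Median equivalent to PromQL's quantile_over_time(0.50, ...), which
# interpolates linearly between the two middle samples for an even count.
def median(samples)
  sorted = samples.sort
  mid = sorted.length / 2
  sorted.length.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Each inner array holds one replica's sleeping-state ratios over the 5m
# window (illustrative numbers only). Saturation is 1 minus the median
# sleeping ratio, and max(...) keeps the most saturated replica.
def wal_receiver_saturation(sleep_ratios_per_replica)
  sleep_ratios_per_replica.map { |samples| 1 - median(samples) }.max
end

replicas = [
  [0.30, 0.20, 0.25],       # median 0.25  -> saturation 0.75
  [0.40, 0.50, 0.45, 0.35]  # median 0.425 -> saturation about 0.575
]
wal_receiver_saturation(replicas) # => 0.75
```

Under that reading, the production values above (about 0.78 for main and 0.75 for ci) mean the busiest WAL receiver spent roughly three quarters of the window outside the sleeping state.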
CR issues
- gstg: gitlab-com/gl-infra/production#17949 (closed)
- gprd: gitlab-com/gl-infra/production#17950 (closed)
How to set up and validate locally
Prerequisite: Thanos cannot be reached from a local machine, so the PromQL result has to be mocked locally.
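One possible way to mock the result without editing source files is to override the fetching method on a single instance from the console. The sketch below shows the technique on a stand-in class: `StandInIndicator` and `current_sli` are hypothetical, and the real `fetch_sli` may take arguments or differ in visibility, so treat this as an illustration rather than the procedure used in the scenarios.

```ruby
# Stand-in for the real indicator; its shape is an assumption.
class StandInIndicator
  # Public entry point that relies on the private fetcher.
  def current_sli
    fetch_sli
  end

  private

  # In the real class this would query Prometheus/Thanos.
  def fetch_sli
    raise "Thanos is not reachable from a local machine"
  end
end

indicator = StandInIndicator.new

# Override fetch_sli on this one instance only. Other instances and the
# class itself are untouched, so no source edit or reload! is needed.
indicator.define_singleton_method(:fetch_sli) { 0.7814833333333333 }

indicator.current_sli # => 0.7814833333333333
```

The override disappears when the instance is discarded, which avoids leaving modified source files behind after validation.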
Scenario 1: Signals::NotAvailable
without the required feature flag
Feature.enabled?(:db_health_check_wal_receiver_saturation, type: :ops)
=> false
context = OpenStruct.new(gitlab_schema: :gitlab_main)
indicator = Gitlab::Database::HealthStatus::Indicators::WalReceiverSaturation.new(context)
indicator.evaluate
#<Gitlab::Database::HealthStatus::Signals::NotAvailable:0x000000017907f150 @indicator_class=Gitlab::Database::HealthStatus::Indicators::WalReceiverSaturation, @reason="indicator disabled">
Scenario 2: Signals::Unknown
on empty Prometheus alert settings
Feature.enable(:db_health_check_wal_receiver_saturation)
application_setting = ApplicationSetting.last
application_setting.update(prometheus_alert_db_indicators_settings: nil)
indicator.evaluate
#<Gitlab::Database::HealthStatus::Signals::Unknown:0x0000000179e90cd0 @indicator_class=Gitlab::Database::HealthStatus::Indicators::WalReceiverSaturation, @reason="Prometheus Settings not configured">
Scenario 3: Signals::Stop
on WAL receiver saturation condition not being met
ApplicationSetting.last.update(
  prometheus_alert_db_indicators_settings: {
    prometheus_api_url: '',
    wal_receiver_saturation_sli_query: {
      main_cell: 'max(1 - quantile_over_time(0.50, postgres_replication_process_state_ratio{env="gprd", type="patroni", process_type="walreceiver", process_state="S"}[5m]))',
      main: 'max(1 - quantile_over_time(0.50, postgres_replication_process_state_ratio{env="gprd", type="patroni", process_type="walreceiver", process_state="S"}[5m]))',
      ci: 'max(1 - quantile_over_time(0.50, postgres_replication_process_state_ratio{env="gprd", type="patroni-ci", process_type="walreceiver", process_state="S"}[5m]))'
    },
    wal_receiver_saturation_slo: {
      main: 0.7,
      ci: 0.7
    }
  }
)
# Manually change Gitlab::PrometheusClient.ready? to `return true`
# Manually change Indicators::PrometheusAlertIndicator.fetch_sli to return a value above the 0.7 SLO, e.g. 0.7814833333333333
reload!
indicator = Gitlab::Database::HealthStatus::Indicators::WalReceiverSaturation.new(context)
indicator.evaluate
#<Gitlab::Database::HealthStatus::Signals::Stop:0x0000000178fff108 @indicator_class=Gitlab::Database::HealthStatus::Indicators::WalReceiverSaturation, @reason="WalReceiverSaturation SLI condition not met">
Scenario 4: Signals::Normal
on WAL receiver saturation condition being met
# Manually change Indicators::PrometheusAlertIndicator.fetch_sli to return a value below the 0.7 SLO, e.g. 0.68148
reload!
indicator = Gitlab::Database::HealthStatus::Indicators::WalReceiverSaturation.new(context)
indicator.evaluate
#<Gitlab::Database::HealthStatus::Signals::Normal:0x000000017997d7c0 @indicator_class=Gitlab::Database::HealthStatus::Indicators::WalReceiverSaturation, @reason="WalReceiverSaturation SLI condition met">
Scenario 5: Signals::Unknown
when the WAL receiver saturation cannot be calculated
# Manually change Indicators::PrometheusAlertIndicator.fetch_sli to return nil
reload!
indicator = Gitlab::Database::HealthStatus::Indicators::WalReceiverSaturation.new(context)
indicator.evaluate
#<Gitlab::Database::HealthStatus::Signals::Unknown @indicator_class=Gitlab::Database::HealthStatus::Indicators::WalReceiverSaturation, @reason=...>
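The five scenarios can be summarized as a small decision function. This is a hedged sketch of the behaviour described by the scenario headings, not the GitLab implementation: `signal_for` and its keyword arguments are hypothetical, symbols stand in for the `Signals::*` classes, and treating an SLI exactly equal to the SLO as normal is an assumption the scenarios do not cover.

```ruby
# Maps indicator inputs to the signal each scenario expects.
def signal_for(sli:, slo:, enabled: true, settings_configured: true)
  return :not_available unless enabled        # Scenario 1: feature flag off
  return :unknown unless settings_configured  # Scenario 2: no Prometheus settings
  return :unknown if sli.nil?                 # Scenario 5: SLI cannot be calculated

  # Scenarios 3 and 4: saturation above the SLO stops migrations,
  # anything else is treated as normal (equality is an assumption).
  sli > slo ? :stop : :normal
end

signal_for(sli: 0.7814833333333333, slo: 0.7) # => :stop
signal_for(sli: 0.68148, slo: 0.7)            # => :normal
signal_for(sli: nil, slo: 0.7)                # => :unknown
```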
Related to #421694 (closed)
Edited by Leonardo da Rosa