chore: tweak liveness probe

Shinya Maeda requested to merge tweak-liveness-probe into main

What does this merge request do and why?

This MR tweaks the Cloud Run liveness probe, similar to chore: enable startup probe for production (!904 - merged).

In https://gitlab.com/groups/gitlab-org/-/epics/15402+ we are observing that liveness probes occasionally fail and trigger restarts of multiple instances, which increases the error rate.

For example, this brief incident occurred at 2024-10-03 07:00.

(Screenshot: 2024-10-03_18-56)

https://log.gprd.gitlab.net/app/r/s/dZypi

We also see health check request failures in https://dashboards.gitlab.net/d/runway-service/runway3a-runway-service-metrics?orgId=1.

FYI, the current settings are:

        livenessProbe:
          timeoutSeconds: 1
          periodSeconds: 10
          failureThreshold: 3
          httpGet:
            path: /monitoring/healthz
            port: 8080
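
To get a feel for how little slack these settings leave, here is a rough sketch of how quickly a run of failing probes leads to a restart. It assumes a simplified model: probes are sent every `periodSeconds`, a probe that gets no response counts as failed after `timeoutSeconds`, and the instance is restarted on the `failureThreshold`-th consecutive failure. The `LivenessProbe` class and its method are hypothetical names used only for illustration, not part of Cloud Run or Runway.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LivenessProbe:
    """Hypothetical holder for the three timing fields of a liveness probe."""

    timeout_seconds: int
    period_seconds: int
    failure_threshold: int

    def seconds_until_restart(self) -> int:
        """Approximate time from the first failing probe to the restart decision.

        Assumption (simplified model): probes fire every period_seconds, each
        unanswered probe is marked failed after timeout_seconds, and the restart
        happens on the failure_threshold-th consecutive failure.
        """
        return (self.failure_threshold - 1) * self.period_seconds + self.timeout_seconds


# Values taken from the current settings quoted above.
current = LivenessProbe(timeout_seconds=1, period_seconds=10, failure_threshold=3)
print(current.seconds_until_restart())  # 21
```

Under this model, about 21 seconds of slow health check responses (each taking longer than 1 second) is enough to restart an instance. Raising `timeoutSeconds` or `failureThreshold` widens that window.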

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

Merge request checklist

  • Tests added for new functionality. If not, please raise an issue to follow up.
  • Documentation added/updated, if needed.
Edited by Shinya Maeda
