GitLab Pages depends on the availability of the GitLab API
If the GitLab API is unavailable for some reason (e.g. gitlab-com/gl-infra/production#1936 (closed)), GitLab Pages currently becomes unavailable too, because we clear the cache on any API lookup error.
If we did not clear the cache in these cases, GitLab Pages would be less dependent on the API and would not produce error spikes like the one in https://dashboards.gitlab.net/d/web-pages-main/web-pages-overview?orgId=1&from=1586889900000&to=1586891700000&var-PROMETHEUS_DS=Global&var-environment=gprd&var-stage=main&var-sigma=2
The following discussion from !194 (merged) should be addressed:
- @ayufan started a discussion: (+1 comment) Should we consider replacing the entry only if the newly received one does not have entry.lookup.Error, to handle intermittent errors when processing lookups? I think that today it can happen that we store the lookup from a successful request, but after a refresh we receive an error, like a 500 from GitLab. It would help if this mechanism could catch short-lived GitLab API errors and keep re-using successful responses for a longer interval. This reduces Pages' dependence on the API being super stable, since we anticipate that the upstream API is flaky to some extent.
Or would it be better, in such a case, to re-use the current lookup and extend its lease to allow another refresh?
if entry.response != nil && entry.response.Error != nil {
	entry.response = e.response
}
Note: I'm fine following up on that in the next MR as a stability improvement.