Include non-Ruby processes in Topology usage data
What does this MR do?
This is a follow-up to:
- Prometheus as a Usage Ping data source: !32315 (merged)
- Send topology in Usage Ping: !33191 (merged)
In that last MR we scoped service-level data to just Ruby services, since at the time it wasn't yet clear how to get this information for non-Ruby services.
This MR does two things:
- It adds most, but not all, non-Ruby components customers can run to the `topology` usage ping. Similar data is exported (`process_count` and `process_memory_rss`) but it's not as complete because we have less data available for those services, and some services don't export any metrics at all (so with those we're flying blind).
- It maps `job` names that were previously just symbolized and underscored to a set of well-defined service names. This will ensure that we can maintain a stable schema in the face of changing job names at the source. Unmapped services will be ignored, so that we do not accidentally include non-GitLab services once we extend this feature to external Prometheus servers, which could be scraping who knows what.
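The job-name mapping described above could look roughly like the following. This is a minimal sketch, not the code from this MR: the constant `JOB_TO_SERVICE_NAME`, the helpers `service_name_for` and `map_services`, and the job names on the left-hand side are all assumptions made for illustration. Only the service names on the right-hand side are taken from this description.

```ruby
# Hypothetical allowlist mapping Prometheus job names to stable service names.
# The job names (keys) are assumed for illustration; the real ones may differ.
JOB_TO_SERVICE_NAME = {
  'gitlab-rails'     => 'web',
  'gitlab-sidekiq'   => 'sidekiq',
  'gitlab-workhorse' => 'workhorse',
  'gitaly'           => 'gitaly',
  'redis'            => 'redis',
  'postgres'         => 'postgres',
  'prometheus'       => 'prometheus',
  'node'             => 'node-exporter'
}.freeze

# Returns the stable service name for a job, or nil if the job is unmapped.
def service_name_for(job)
  JOB_TO_SERVICE_NAME[job.to_s]
end

# Drops samples from unmapped jobs and replaces :job with a stable :name.
def map_services(samples)
  samples.each_with_object([]) do |sample, mapped|
    name = service_name_for(sample[:job])
    next if name.nil? # unmapped jobs are ignored rather than passed through

    mapped << sample.reject { |key, _| key == :job }.merge(name: name)
  end
end
```

Using an explicit allowlist rather than transforming whatever job name arrives means a renamed job at the source only requires a new key, and an unknown job scraped by an external Prometheus server never reaches the payload.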
New services that should now be captured:
- Gitaly
- Redis
- Postgres
- Prometheus
- node-exporter
Services that are not included because they do not currently export metrics to Prometheus, or because they are difficult to include:
- Consul
- PGBouncer (but support is on the way!)
- NFS servers
- Load balancers
- Nginx
- Grafana
- alertmanager
- logrotate
- redis-exporter
- postgres-exporter
- gitlab-exporter
- sshd
NOTE that, as with the original MR, all of this only applies to single-node installations for now, since we cannot yet locate an external Prometheus node. This will change at some point in the future though, so it can never hurt to look at this through the "future looking glass".
Refs #218546 (closed)
Example
Pulled from the Usage Ping preview payload generated by `registry.gitlab.com/gitlab-org/build/omnibus-gitlab-mirror/gitlab-ee:c0c45395c73eb5b595db389a7a0137cd0a043d24`:
"topology": {
  "nodes": [
    {
      "node_memory_total_bytes": 33269903360,
      "node_cpus": 16,
      "node_services": [
        {
          "name": "web",
          "process_count": 16,
          "process_memory_pss": 195114368,
          "process_memory_rss": 780203776,
          "process_memory_uss": 155836416
        },
        {
          "name": "node-exporter",
          "process_count": 1,
          "process_memory_rss": 18259968
        },
        {
          "name": "postgres",
          "process_count": 1,
          "process_memory_rss": 18976768
        },
        {
          "name": "workhorse",
          "process_count": 1,
          "process_memory_rss": 36425728
        },
        {
          "name": "gitaly",
          "process_count": 1,
          "process_memory_rss": 37654528
        },
        {
          "name": "redis",
          "process_count": 1,
          "process_memory_rss": 19324928
        },
        {
          "name": "sidekiq",
          "process_count": 1,
          "process_memory_pss": 705674240,
          "process_memory_uss": 702689280,
          "process_memory_rss": 720261120
        }
      ]
    }
  ],
  "duration_s": 0.021399167999334168
}
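As the payload shows, non-Ruby services only carry `process_count` and `process_memory_rss`, so anything reading this data has to tolerate missing `process_memory_pss` and `process_memory_uss` keys. The sketch below illustrates that on a trimmed copy of the node above. The helper names `total_rss_bytes` and `services_missing` are hypothetical and not part of this MR.

```ruby
# Trimmed copy of one node from the example payload (two services only).
SAMPLE_NODE = {
  'node_services' => [
    {
      'name' => 'web',
      'process_count' => 16,
      'process_memory_pss' => 195114368,
      'process_memory_rss' => 780203776,
      'process_memory_uss' => 155836416
    },
    {
      'name' => 'node-exporter',
      'process_count' => 1,
      'process_memory_rss' => 18259968
    }
  ]
}.freeze

# Sums RSS across all services on a node; a missing value counts as 0.
def total_rss_bytes(node)
  node.fetch('node_services', []).sum { |service| service.fetch('process_memory_rss', 0) }
end

# Lists the names of services that do not report the given metric key.
def services_missing(node, key)
  node.fetch('node_services', [])
      .reject { |service| service.key?(key) }
      .map { |service| service['name'] }
end
```

For example, `services_missing(SAMPLE_NODE, 'process_memory_pss')` picks out the services for which only RSS is available.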
Does this MR meet the acceptance criteria?
Conformity
- [-] Changelog entry. Not necessary. We already created a changelog when first adding the `topology` ping and the data carried within is already covered by our privacy policy.
- [-] Documentation (if required). We have a separate issue for that: #220143 (closed)
- Code review guidelines
- Merge request performance guidelines
- Style guides
- [-] Database guides
- [-] Separation of EE specific content
Availability and Testing
- Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
- [-] Tested in all supported browsers
- [-] Informed Infrastructure department of a default or new setting change, if applicable per definition of done
- Test in Omnibus container