Query usage data from bundled Prometheus
What does this MR do?
See #217666 (closed)
In order to collect memory and topology statistics from on-prem customers, we need the ability to query their Prometheus server for the metrics we produce (e.g. from `node_exporter`) and transmit them back via usage pings. This will allow us to map out our customer base better in terms of which reference architecture they fall under, how much memory their setups consume, etc.
For an MVC, this first MR adds the following:
- A single metric as a proof-of-concept that is queried via Prometheus and submitted as a Usage Ping. I decided to report only a single metric for now, `node_memory_total_bytes`. This metric requires a `node_exporter` to run.
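As a rough illustration of what querying Prometheus for a single metric involves, the sketch below builds an instant-query URL for the Prometheus HTTP API (`/api/v1/query`) and extracts the sample values from the JSON response. It is not the code in this MR: the method names (`prometheus_query_uri`, `parse_vector_values`, `query_prometheus`) and the base URL passed in are hypothetical, and the metric name is simply the one quoted above.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: builds the instant-query URL for a Prometheus server.
def prometheus_query_uri(base_url, promql)
  uri = URI.join(base_url, '/api/v1/query')
  uri.query = URI.encode_www_form(query: promql)
  uri
end

# Hypothetical helper: returns one integer per series in an instant-query
# response, or an empty array if the query did not succeed.
def parse_vector_values(body)
  payload = JSON.parse(body)
  return [] unless payload['status'] == 'success'

  # Each series carries "value": [timestamp, "sample as a string"].
  payload.dig('data', 'result').to_a.map { |series| series['value'][1].to_f.to_i }
end

# Hypothetical helper: performs the HTTP request and parses the result.
# Assumes an unauthenticated Prometheus reachable at base_url.
def query_prometheus(base_url, promql)
  response = Net::HTTP.get_response(prometheus_query_uri(base_url, promql))
  parse_vector_values(response.body)
end
```

With a local Prometheus, `query_prometheus('http://localhost:9090', 'node_memory_total_bytes')` would return one value per node that reports the metric.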
Questions:
- Should this be behind a feature toggle? Decided it's probably not useful, since the change will only impact single-node self-managed deployments, over which we have no control anyway.
Notes
Data structure
The data structure has been added to the top-level and looks as follows:
{
  topology: {
    nodes: [
      {
        node_memory_total_bytes: 1024
      }
    ]
  }
}
The final structure is still TBD.
Error handling
We expect the `topology` structure to get a lot more complex, so this is still TBD. Currently I simply fall back to an empty Hash `{}` whenever we fail to connect to Prometheus, Prometheus wasn't enabled by the customer, or any error is raised.
If no error was raised but we simply didn't find any results, we fall back to default values. For now that is only the empty array `[]` if we fail to collect any node data.
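The two fallback levels described here can be sketched roughly as follows. This is an illustration under assumptions, not the MR's implementation: `topology_usage_data` and its `prometheus_enabled:` argument are hypothetical, the block stands in for the actual Prometheus query, and the description does not say whether the empty Hash replaces the whole payload or only the `topology` value, so this sketch returns `{}` for the whole payload.

```ruby
# Hypothetical collector. The block is expected to return an array with one
# memory value (in bytes) per node, or to raise if Prometheus is unreachable.
def topology_usage_data(prometheus_enabled:)
  # Prometheus not enabled by the customer: fall back to the empty Hash.
  return {} unless prometheus_enabled

  node_values = yield

  # No error but no results: `nodes` ends up as the default empty array.
  { topology: { nodes: node_values.map { |bytes| { node_memory_total_bytes: bytes } } } }
rescue StandardError
  # Connection failure or any other error: fall back to the empty Hash.
  {}
end
```

Called as `topology_usage_data(prometheus_enabled: true) { [1024] }`, this yields the structure shown in the Data structure section.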
Reach
The change will only work for (and affect) single-node Omnibus deployments. That's because we cannot currently locate a Prometheus instance that is not running on the same node as the application that submits the `Usage Ping`. This will change in the future. Furthermore, those customers will have to have Prometheus enabled, of course, and a `node_exporter` running for any data to come through to us.
Screenshots
From Admin Area > Metrics and profiling > Usage statistics
| No Prometheus configured / Prometheus down | Prometheus available |
|---|---|
Does this MR meet the acceptance criteria?
Conformity
- Changelog entry
- [-] Documentation (if required) see #220143 (closed)
- Code review guidelines
- Merge request performance guidelines
- Style guides
- [-] Database guides
- [-] Separation of EE specific content
Availability and Testing
- Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
- [-] Tested in all supported browsers
- Informed Infrastructure department of a default or new setting change, if applicable per definition of done
I've tested this against a local Prometheus and it worked.
I also tried to test this on a `review-app`, but `Usage Ping` is disabled and greyed out: https://gitlab-review-217666-pro-tg17m7.gitlab-review.app/admin/application_settings/metrics_and_profiling
It sounds like this would have to be enabled via `gitlab.rb`? I also tried instrumenting it from a Rails console, but the `task-runner` constantly fails with what looks like an OOM when trying to launch a `gitlab-rails console`:
$ kubectl exec review-217666-pro-tg17m7-task-runner-675949d75f-qxd55 --namespace review-apps-ee -c task-runner -it -- gitlab-rails console
Fetching cluster endpoint and auth data.
kubeconfig entry generated for review-apps-ee.
--------------------------------------------------------------------------------
GitLab: 13.1.0-pre () EE
GitLab Shell: 13.2.0
PostgreSQL: 10.9
--------------------------------------------------------------------------------
/usr/local/bin/gitlab-rails: line 5: 14 Killed $rails_dir/bin/bundle exec rails "$@"