Add error information to GraphQL logs and metrics
chore: refactor graphql instrumentation tracer
The GraphQL Ruby tracers don't work as we expected them to. An
#execute_query trace does not necessarily execute the query. It
only prepares the query for execution.
The execution itself happens in the context of a multiplex, even if there is only one query. This ensures that data can be shared between the queries, which avoids hitting the same resources multiple times.
This changes the logging and instrumentation of those queries to share the duration of the total execution. This ensures that we have relevant duration information in the logs.
In practice, I don't think our frontends multiplex queries.
The end of the multiplex is also the point at which the query is fully executed, which means we can inspect the result to gather any errors that happened during the execution of the query and expose that information in metrics and logs.
This merges the 3 tracers that are supposed to provide information into a single one that collects all of the information. This ensures that we're always comparing apples to apples when we talk about durations: the duration in the logs is also the duration we've used for the apdex metric.
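A minimal sketch of the idea, not the actual implementation: a single tracer object that only measures the multiplex step and reports that one duration for every query in it. The trace(key, data) signature mirrors the legacy graphql-ruby tracer hook; SketchTracer, FakeQuery, FakeMultiplex and the event hash are hypothetical stand-ins so the example runs without the gem.

```ruby
# Hypothetical sketch: one tracer that times the whole multiplex and
# reports the same duration for each query that ran inside it.
class SketchTracer
  attr_reader :events

  def initialize
    @events = []
  end

  # Same shape as a graphql-ruby legacy tracer: trace(key, data) { ... }
  def trace(key, data)
    # Every other step (including "execute_query") passes straight through.
    return yield unless key == "execute_multiplex"

    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = yield
    duration_s = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start

    # Assumption: data[:multiplex] responds to #queries.
    data.fetch(:multiplex).queries.each do |query|
      @events << { operation_name: query.operation_name, duration_s: duration_s }
    end

    result
  end
end

# Stand-ins for the real query and multiplex objects (assumptions).
FakeQuery = Struct.new(:operation_name)
FakeMultiplex = Struct.new(:queries)

tracer = SketchTracer.new
multiplex = FakeMultiplex.new([FakeQuery.new("getIssue"), FakeQuery.new("getUser")])
returned = tracer.trace("execute_multiplex", { multiplex: multiplex }) { :executed }
```

Because the log entry and the apdex observation would both read from the same recorded duration, they cannot drift apart.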
For #345263 (closed)
feat: take GQL-query success into account for instrumentation
This includes the GraphQL error messages in the logs if there were any.
It also prevents recording an apdex for failed queries, as those are not very valuable to calculate performance.
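Roughly, the decision could look like the sketch below. The helper names, the log keys and the 1 second target are all hypothetical; the only thing taken as given is the GraphQL response format, where failures appear under a top-level "errors" key.

```ruby
# Hypothetical helpers, not the real instrumentation code.

# Collects the messages from a GraphQL response hash.
def graphql_error_messages(result_hash)
  Array(result_hash["errors"]).map { |error| error["message"] }
end

# Builds a log entry and only records an apdex outcome for successful queries.
def build_log_entry(result_hash, duration_s, target_s: 1.0)
  messages = graphql_error_messages(result_hash)
  entry = { duration_s: duration_s }

  if messages.empty?
    # Successful query: it counts towards apdex.
    entry[:apdex_success] = duration_s <= target_s
  else
    # Failed query: log the messages, skip apdex entirely.
    entry[:graphql_errors] = messages
  end

  entry
end

ok_entry = build_log_entry({ "data" => { "issue" => nil } }, 0.2)
failed_entry = build_log_entry({ "errors" => [{ "message" => "boom" }] }, 0.2)
```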
For #345263 (closed)
feat: add an error rate SLI for graphql queries
This adds a counter for all GraphQL queries. It increments the ops
rate for all queries executed and it increments the error rate in
case there was an exception or if the result contained errors.
This means that invalid queries sent to us will also result in an
error. For this reason, we need to make sure that we only include
queries that we know of in our SLIs. We can distinguish these in
metrics using the endpoint_id label. We'll only populate that label
for queries from our own application. All other queries will have
graphql:unknown as the endpoint_id.
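As an illustration only, the counting could be sketched like this. SketchErrorRateSli, the list of known operations and the "graphql:<operation name>" format for known queries are assumptions made for the example; only the graphql:unknown fallback comes from the description above.

```ruby
# Hypothetical sketch of an error rate SLI keyed by endpoint_id.
class SketchErrorRateSli
  UNKNOWN_ENDPOINT = "graphql:unknown"

  attr_reader :totals, :errors

  # known_operations: names of queries shipped by our own application (assumption).
  def initialize(known_operations)
    @known_operations = known_operations
    @totals = Hash.new(0)
    @errors = Hash.new(0)
  end

  # error: true when an exception was raised or the result contained errors.
  def record(operation_name, error:)
    endpoint_id = endpoint_id_for(operation_name)
    @totals[endpoint_id] += 1
    @errors[endpoint_id] += 1 if error
  end

  private

  # Unknown (possibly invalid) queries all share one label value, so they
  # can be excluded from the SLI.
  def endpoint_id_for(operation_name)
    if @known_operations.include?(operation_name)
      "graphql:#{operation_name}"
    else
      UNKNOWN_ENDPOINT
    end
  end
end

sli = SketchErrorRateSli.new(["getIssue"])
sli.record("getIssue", error: false)
sli.record("getIssue", error: true)
sli.record("somethingElse", error: true)
```

An SLI built on these counters would then filter out the graphql:unknown series rather than count third-party mistakes against us.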
For #345263 (closed)