Sending operation metrics to GraphOS
If you have a cloud supergraph, your router automatically sends operation metrics to GraphOS! This article is for other types of graphs, such as enterprise supergraphs that run their own instance of the Apollo Router.
From the Apollo Router or Apollo Server
Both the Apollo Router and Apollo Server use the same mechanism to enable operation metrics reporting to GraphOS:
Obtain a graph API key from Studio.
Obtain the graph ref for the graph and variant you want to report metrics for.
Your variant's graph ref is shown at the top of its README page in Apollo Studio. It has the format graph-id@variant-name (such as my-graph@staging).
Use the obtained values to set the following environment variables in your environment before starting up your router/server:
export APOLLO_KEY=<YOUR_GRAPH_API_KEY>
export APOLLO_GRAPH_REF=<YOUR_GRAPH_REF>
Consult your production environment's documentation to learn how to set its environment variables.
Now when your router/server starts up, it automatically begins reporting operation metrics to GraphOS.
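If you launch the router/server from your own startup script, a small check can catch a missing or malformed value before the process starts. The following is only a sketch: readReportingConfig is a hypothetical helper (not part of any Apollo library), and it assumes the graph ref always includes a variant, in the graph-id@variant-name format described above.

```typescript
// Hypothetical helper, not part of any Apollo library.
interface ReportingConfig {
  apiKey: string;
  graphId: string;
  variant: string;
}

function readReportingConfig(
  env: Record<string, string | undefined>,
): ReportingConfig {
  const apiKey = env.APOLLO_KEY;
  const graphRef = env.APOLLO_GRAPH_REF;
  if (!apiKey) throw new Error("APOLLO_KEY is not set");
  if (!graphRef) throw new Error("APOLLO_GRAPH_REF is not set");

  // Assumption: the ref has the form graph-id@variant-name.
  const at = graphRef.indexOf("@");
  if (at <= 0 || at === graphRef.length - 1) {
    throw new Error(`Expected graph-id@variant-name, got "${graphRef}"`);
  }
  return {
    apiKey,
    graphId: graphRef.slice(0, at),
    variant: graphRef.slice(at + 1),
  };
}
```

In a Node.js script, this could be called as readReportingConfig(process.env) before the server is started.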
Adding support to a third-party server (advanced)
You can set up a reporting agent in your GraphQL server to push metrics to GraphOS. The agent is responsible for:
- Translating operation details into the correct reporting format
- Implementing a default signature function to identify each executed operation
- Emitting batches of traces and statistics to the GraphOS reporting endpoint
- Optionally defining plugins to enable advanced reporting features
Apollo Server defines its agent for performing these tasks in the usage reporting plugin.
If you're interested in collaborating with Apollo on creating a dedicated integration for your GraphQL server, please get in touch with us at support@apollographql.com.
Reporting format
The GraphOS reporting endpoint accepts batches of traces and statistics that are encoded in protocol buffer format. Each trace corresponds to the execution of a single GraphQL operation, including a breakdown of the timing and error information for each field that's resolved as part of the operation.
The schema for this protocol buffer is defined as the Report message in the protobuf schema.
This document describes how to create a report whose tracesPerQuery objects consist solely of a list of detailed execution traces in the trace array. GraphOS now allows your server to describe usage as a mix of detailed execution traces and pre-aggregated statistics (released in Apollo Server 2.24), which leads to much more efficient reports. This document does not yet describe how to generate these statistics. It also does not describe how to support the "referencing operations" feature of the Fields page. We strongly encourage developers to contact Apollo support at support@apollographql.com to discuss their use case prior to building their own reporting agent using this module.
As a starting point, we recommend implementing an extension to the GraphQL execution that creates a report with a single trace, as defined in the Trace message of the protobuf schema. Then, you can batch multiple traces into a single report. We recommend sending batches approximately every 20 seconds, and limiting each batch to a reasonable size (~4MB).
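The batching recommendation above can be sketched as a small buffer that flushes either on a timer or when adding another item would exceed the size limit. This is a minimal illustration, not the reference agent's code: TraceBatcher, its method names, and the sizeOf callback are all hypothetical, and building the actual Report message from a batch is left to the flush callback.

```typescript
// Hypothetical batching helper; names and defaults are illustrative.
class TraceBatcher<T> {
  private pending: T[] = [];
  private pendingBytes = 0;
  private timer: ReturnType<typeof setInterval> | undefined = undefined;
  private flushFn: (batch: T[]) => void;
  private sizeOf: (item: T) => number;
  private maxBytes: number;
  private intervalMs: number;

  constructor(
    flushFn: (batch: T[]) => void,
    sizeOf: (item: T) => number,
    maxBytes: number = 4 * 1024 * 1024, // ~4MB, per the guidance above
    intervalMs: number = 20_000, // ~20 seconds, per the guidance above
  ) {
    this.flushFn = flushFn;
    this.sizeOf = sizeOf;
    this.maxBytes = maxBytes;
    this.intervalMs = intervalMs;
  }

  add(item: T): void {
    const size = this.sizeOf(item);
    // Flush first if this item would push the batch past the size limit.
    if (this.pending.length > 0 && this.pendingBytes + size > this.maxBytes) {
      this.flush();
    }
    this.pending.push(item);
    this.pendingBytes += size;
  }

  flush(): void {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.pendingBytes = 0;
    this.flushFn(batch);
  }

  start(): void {
    if (this.timer === undefined) {
      this.timer = setInterval(() => this.flush(), this.intervalMs);
    }
  }

  stop(): void {
    if (this.timer !== undefined) {
      clearInterval(this.timer);
      this.timer = undefined;
    }
    this.flush(); // hand off whatever is still pending
  }
}
```

Calling stop() from the server's shutdown path hands any remaining traces to the flush callback before the process exits.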
Many server runtimes already support emitting tracing information as a GraphQL extension. Such extensions are available for Node, Ruby, Scala, Java, Elixir, and .NET. If you're working on adding metrics reporting functionality for one of these languages, reading through that tracing instrumentation is a good place to start. For other languages, we recommend consulting the Apollo Server usage reporting plugin.
Operation signing
For Apollo Studio to correctly group GraphQL queries, your reporting agent should define a function to generate a query signature for each distinct query. This can be challenging, because two structurally different queries can be functionally equivalent. For instance, all of the following queries request the same information:
query AuthorForPost($foo: String!) { post(id: $foo) { author } }
query AuthorForPost($bar: String!) { post(id: $bar) { author } }
query AuthorForPost { post(id: "my-post-id") { author } }
query AuthorForPost { post(id: "my-post-id") { writer: author } }
It's important to decide how to group such queries when tracking metrics. The TypeScript reference implementation does the following to every query before generating its signature to better group functionally equivalent operations:
- Drop unused fragments and/or operations
- Hide string literals
- Ignore aliases
- Sort the tree deterministically
- Ignore differences in whitespace
We recommend using the same default signature method for consistency across different server runtimes.
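To give a feel for what such normalization does, here is a deliberately partial sketch that covers only two of the steps listed above: hiding string literals and ignoring whitespace. It is not the reference algorithm. normalizeForSignature is a hypothetical name, the approach is plain text substitution, and it does not handle block strings or comments. Dropping unused fragments, ignoring aliases, and sorting the tree require a real GraphQL parser and are left out.

```typescript
// Hypothetical helper: a partial normalization, not the reference algorithm.
function normalizeForSignature(query: string): string {
  return query
    // Hide string literals: "my-post-id" becomes "".
    .replace(/"(?:[^"\\]|\\.)*"/g, '""')
    // Whitespace and commas are insignificant in GraphQL; collapse runs of them.
    .replace(/[\s,]+/g, " ")
    // Remove the remaining spaces around punctuation.
    .replace(/\s*([{}()\[\]:=!@|])\s*/g, "$1")
    .trim();
}

const a = normalizeForSignature(`
  query AuthorForPost {
    post(id: "my-post-id") {
      author
    }
  }
`);
const b = normalizeForSignature(
  'query AuthorForPost { post(id: "another-id") { author } }',
);
// Both normalize to: query AuthorForPost{post(id:""){author}}
console.log(a === b); // prints true
```

A real agent would typically hash the normalized text or use it directly as the grouping key.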
Sending metrics to the reporting endpoint
After your GraphQL server prepares a batch of traces, it should send them to the Studio reporting endpoint at the following URL:
https://usage-reporting.api.apollographql.com/api/ingress/traces
Each batch should be sent as an HTTP POST request. The body of the request can be one of the following:
- A binary serialization of a Report message
- A gzipped binary serialization of a Report message
To authenticate with Studio, each request must include either:
- An X-Api-Key header with a valid API key for your graph
- An authtoken cookie with a valid API key for your graph
Only graph-level API keys (starting with the prefix service:) are supported.
The request can also optionally include a Content-Type header with value application/protobuf, but this is not required.
⚠️ The reporting endpoint rejects reports that are older than 50 minutes. If you see an error like "Rejecting report from service <your service> with skewed timestamp", make sure your traces are current and that your timestamp calculations are accurate.
For a reference implementation, see the sendReport() function in the TypeScript reference agent.
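The request described in this section can be assembled roughly as follows. This is a sketch under stated assumptions, not the reference agent's code: buildReportRequest is a hypothetical helper, the input is assumed to be an already-serialized Report message, and the Content-Encoding: gzip header is an assumption (this document only says that a gzipped body is accepted).

```typescript
import { gzipSync } from "node:zlib";

const REPORTING_URL =
  "https://usage-reporting.api.apollographql.com/api/ingress/traces";

interface ReportRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: Uint8Array;
}

// Hypothetical helper: builds the HTTP request for an already-serialized Report.
function buildReportRequest(
  serializedReport: Uint8Array,
  apiKey: string,
): ReportRequest {
  if (!apiKey.startsWith("service:")) {
    throw new Error("Only graph-level API keys (service:...) are supported");
  }
  return {
    url: REPORTING_URL,
    method: "POST",
    headers: {
      "X-Api-Key": apiKey,
      "Content-Type": "application/protobuf", // optional
      "Content-Encoding": "gzip", // assumption: marks the body as gzipped
    },
    body: gzipSync(serializedReport),
  };
}
```

The resulting url, method, headers, and body can then be passed to whichever HTTP client your server already uses.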
Tuning reporting behavior
We recommend implementing retries with backoff when you encounter 5xx responses or networking errors when communicating with the reporting endpoint. Additionally, implement a shutdown hook to make sure you push all pending reports before your server initiates a healthy shutdown.
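One way to structure the retry logic is exponential backoff with a cap, retrying only on 5xx responses and networking errors. The sketch below is illustrative: the function names, the attempt limit, and the delay values are assumptions, not values prescribed by GraphOS.

```typescript
// Hypothetical retry helpers; the limits are illustrative defaults.
function isRetryableStatus(status: number): boolean {
  return status >= 500 && status <= 599;
}

function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30_000): number {
  // attempt 0 waits baseMs, attempt 1 waits 2 * baseMs, and so on, capped at maxMs.
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

async function sendWithRetry(
  send: () => Promise<number>, // resolves to an HTTP status, rejects on a networking error
  maxAttempts = 5,
  baseMs = 500,
): Promise<number> {
  let lastError: unknown = new Error("no attempts made");
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const status = await send();
      // Anything other than a 5xx is returned to the caller without retrying.
      if (!isRetryableStatus(status)) return status;
      lastError = new Error(`server responded with ${status}`);
    } catch (err) {
      lastError = err; // networking error: retry
    }
    if (attempt < maxAttempts - 1) {
      const delay = backoffDelayMs(attempt, baseMs);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

Because reports older than 50 minutes are rejected, the total retry window should stay well below that limit.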
Implementing additional reporting features
The reference TypeScript implementation includes several features that you might want to include in your implementation. All of these features are implemented in the usage reporting plugin itself, and are documented in the plugin's API reference.
For example, you can restrict which information is sent to GraphOS, particularly to avoid reporting personal data. Because personal data most commonly appears in variables and headers, the TypeScript agent offers the sendVariableValues and sendHeaders options.
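A third-party agent could offer a comparable control with an allow-list applied before variables are attached to a trace. The helper below is a hypothetical sketch of that idea, not the plugin's actual implementation. It keeps every variable name but blanks the value unless the name is explicitly allowed.

```typescript
// Hypothetical allow-list filter; not the plugin's actual implementation.
function redactVariables(
  variables: Record<string, unknown>,
  allowedNames: ReadonlySet<string>,
): Record<string, unknown> {
  const result: Record<string, unknown> = {};
  for (const [name, value] of Object.entries(variables)) {
    // Keep the name so the operation's shape stays visible,
    // but drop the value unless it is explicitly allowed.
    result[name] = allowedNames.has(name) ? value : "";
  }
  return result;
}

const safe = redactVariables(
  { postId: "42", email: "user@example.com" },
  new Set(["postId"]),
);
// safe is { postId: "42", email: "" }
```

The same pattern applies to request headers, with header names compared case-insensitively.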