Observability
Why Observability?
Observability provides insight into the status of a reindex and any errors encountered during it. This lets you identify and remedy the root cause of issues within your composable commerce stack without needing to escalate to customer service.
This guide covers common use cases, potential errors, and details on how to use this system to improve confidence during a reindex.
At A Glance
- The primary interface is a GraphQL API. You call this API to explore Telemetry data.
- You can make requests to get high-level information about a reindex.
- You can explore error events for greater insight into failures.
- You can watch reindexes progress through summary queries.
- You can check whether a reindex has started through digest queries.
Definitions
Concept | Description |
---|---|
Summaries | A summary is a high-level object describing the status of an asynchronous process. It contains counts of the different statuses that have occurred during the process. Summaries should be used to determine whether further investigation is needed. |
Group Id | An id used to correlate all events of an asynchronous process. Group Ids are unique per asynchronous process. |
Event | An event is a discrete action that occurred. Events have 4 different statuses: Error, Complete, Warning, and Progress. Error events alert users to issues encountered during the asynchronous process; by default, error events are always persisted. Complete events signify that some action has reached a final step in the asynchronous process; by default, complete events are not persisted. Warning events alert users to potential issues that could soon arise (e.g., a piece of content getting close to 1 MB in size); by default, warning events are not persisted. Progress events signify work being processed in the asynchronous process; by default, progress events are not persisted. |
Digests | A digest is a high-level object describing the overall status of related summaries. It contains a status indicating whether the process has started or is running, metadata relating to the entire process, and a category such as index or build. Digests should be used to give a high-level overview of an entire category of work performed. |
Asynchronous Process | A process that runs across the Nacelle system, such as a reindex. |
Schema
For additional details about the GraphQL Schema view: Schema
Authentication
The Telemetry API has 2 required headers to authenticate and authorize your request.
Please ensure the below headers are set before making your first request.
x-nacelle-space-id: <your-space-id>
x-nacelle-space-token: <your-space-token>
Sending Queries
The Nacelle Telemetry feature is currently only accessible via our GraphQL API.
Queries are sent via a POST request to https://event-visibility.api.nacelle.com/graphql
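As a minimal sketch of the request described above, assuming Python's standard library, the POST could be assembled as follows. The endpoint and header names come from this guide; the helper names `build_request` and `send` are hypothetical, and the space id and token are placeholders you supply.

```python
import json
import urllib.request

TELEMETRY_URL = "https://event-visibility.api.nacelle.com/graphql"


def build_request(query, space_id, space_token, variables=None):
    """Build (but do not send) a POST request for the Telemetry GraphQL API."""
    payload = {"query": query}
    if variables:
        payload["variables"] = variables
    return urllib.request.Request(
        TELEMETRY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-nacelle-space-id": space_id,
            "x-nacelle-space-token": space_token,
        },
        method="POST",
    )


def send(request):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to inspect the headers and body before any network call is made.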
Insomnia Collection
If you are an Insomnia user, you can ‘Import’ the Telemetry collection linked below to get up and running even quicker: Insomnia Collection Export
Queries
Events
Events can be thought of as individual actions within the Nacelle platform. As an asynchronous request is processed in Nacelle’s distributed system, the groupId will serve as a tracer to link multiple events together.
events(filter: EventFilter): EventConnection!
The events query will return an EventConnection containing the requested data along with pagination information to enable pagination through the events.
This query can be very helpful to get more contextual information about a specific failure during a reindex.
{
  events(filter: { groupId: "exampleId" }) {
    edges {
      node {
        groupId
        timestamp
        status
        action
        message
        meta
        nacelleEntryId
        requestOrigin
      }
    }
  }
}
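Once a response to the events query has been decoded from JSON, the nodes can be pulled out of the connection. The sketch below assumes the standard GraphQL response envelope (a top-level `data` key); the helper names `event_nodes` and `by_status` are hypothetical, and status casing is compared case-insensitively because the exact strings returned by the API are not stated here.

```python
def event_nodes(response):
    """Return the list of event nodes from a decoded `events` query response."""
    connection = (response.get("data") or {}).get("events") or {}
    return [edge["node"] for edge in connection.get("edges") or []]


def by_status(nodes, status):
    """Keep only events whose status matches, ignoring case."""
    wanted = status.lower()
    return [n for n in nodes if str(n.get("status", "")).lower() == wanted]
```

For example, `by_status(event_nodes(response), "error")` would leave only the error events for a group.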
event(id: ID!): Event
The event query will return a single event given its id.
{
  event(id: "exampleId") {
    groupId
    timestamp
    status
    action
    message
    meta
    nacelleEntryId
    requestOrigin
  }
}
This query is mainly used to quickly dig into the details of a specific event of interest that was retrieved by the events query.
groups(filter: EventFilter): [Group!]!
The groups query returns a list of Groups, each containing the number of events for one groupId.
{
  groups(filter: {}) {
    name
    count
  }
}
This query is used to quickly determine all the groupIds in a space, along with the total number of events associated with each groupId.
statuses(groupId: String!, filter: EventFilter): [Group!]!
The statuses query returns a list of Groups containing the count of each status within the given groupId.
{
  statuses(groupId: "exampleId", filter: {}) {
    name
    count
  }
}
This query is used to quickly determine the counts of all statuses for a specific group id.
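Because both groups and statuses return a list of name/count pairs, a small lookup table is often easier to work with. This is a sketch only: `counts_by_name` is a hypothetical helper, and the status names in the usage note are illustrative rather than confirmed API output.

```python
def counts_by_name(groups):
    """Turn [{'name': ..., 'count': ...}, ...] into a {name: count} dict."""
    counts = {}
    for group in groups:
        # Sum counts defensively in case a name appears more than once.
        counts[group["name"]] = counts.get(group["name"], 0) + group["count"]
    return counts
```

With this, a check such as `counts_by_name(statuses).get("Error", 0) > 0` indicates whether a group recorded any error events, assuming the status is reported under that name.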
Summaries
Summaries contain aggregated information about a reindex within the Nacelle platform and can be queried for high-level information. The groupId is used to group events and create summaries.
For example, if you watch a reindex live, the summary node for that reindex is updated over time with counts of the different event types as the system processes the reindex request. If the reindex encounters an error event, the summary's countError field is incremented by one for each error encountered.
summary(groupId: String!, sources: [String!]): Summary!
If you already have a groupId and only want to view the summary for that reindex, you can query for a single summary using the groupId.
The example query below returns the single summary that matches the given group id.
{
  summary(groupId: "exampleId") {
    groupID
    countError
    countWarn
    countProgress
    countComplete
    firstEventTimestamp
  }
}
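Since summaries are meant to show whether further investigation is needed, that decision can be reduced to a small check on the count fields. The sketch below assumes a summary object already decoded from JSON into a dict; `needs_investigation` and its `include_warnings` flag are hypothetical names.

```python
def needs_investigation(summary, include_warnings=False):
    """Return True when the summary reports errors (and optionally warnings)."""
    errors = summary.get("countError") or 0
    warnings = summary.get("countWarn") or 0
    return errors > 0 or (include_warnings and warnings > 0)
```

Treating missing or null counts as zero keeps the check usable while a reindex has only just started.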
summaries(filter:SummaryFilter): SummaryConnection
The summaries query returns a SummaryConnection containing the requested data along with pagination information to enable pagination through the summaries.
The example query below returns a list of all summaries that match a given filter. In this case, the results are ordered by first event timestamp, so it returns summaries for all reindexes, from oldest to newest.
{
  summaries(filter: { orderBy: "first_event_timestamp" }) {
    edges {
      node {
        groupID
        firstEventTimestamp
        countComplete
        countError
      }
    }
  }
}
Digests
digest(groupId: String!, categories: [String!]): Digest!
If you already have a groupId and only want to view the digest for that reindex, you can query for a single digest using the groupId.
The example query below returns the single digest that matches the given group id.
{
  digest(groupId: "exampleid") {
    status
    Category
  }
}
digests(filter: DigestFilter): DigestConnection!
The digests query returns a DigestConnection containing the requested data along with pagination information to enable pagination through the digests.
The example query below returns a list of all digests that match a given filter. In this case, the results are ordered from oldest to newest, so it returns digests for all reindexes, from oldest to newest.
{
  digests(filter: {}) {
    edges {
      node {
        Category
        groupID
        status
      }
    }
  }
}
Obtain a GroupId
Before going further, we should quickly discuss the groupId and its importance. The Nacelle Telemetry system groups events using a unique identifier. This allows the Telemetry system to remain flexible in the type of events it captures and how they are grouped. Today, there is a 1:1 relationship between a groupId and a reindex.
groupIds can be obtained in one of two ways.
GroupId from header response
When you start a reindex in the dashboard you will get a response header called x-nacelle-group-id. This can be used to query a summary directly.
GroupId from summaries query
Additionally, you can query all summaries for a space to retrieve the newest groupId.
{
  summaries(filter: { orderBy: "first_event_timestamp", direction: DESC }) {
    edges {
      node {
        groupID
        firstEventTimestamp
      }
    }
  }
}
The first node returned would be the most recent group id which can then be used to directly query for information about that asynchronous process.
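Picking the most recent group id out of that response can be sketched as follows, assuming the response has been decoded from JSON and uses the standard GraphQL `data` envelope. The helper name `latest_group_id` is hypothetical; it relies on the query having been sent with descending order, as in the example above.

```python
def latest_group_id(response):
    """Return the groupID of the first summary node, or None if there are none."""
    connection = (response.get("data") or {}).get("summaries") or {}
    edges = connection.get("edges") or []
    return edges[0]["node"]["groupID"] if edges else None
```

Returning None rather than raising lets the caller decide what to do when a space has no summaries yet.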
Making Your First Request
To get started, navigate to the API at https://event-visibility.api.nacelle.com
Set Headers
Next, fill out the two required headers described in the Authentication section: x-nacelle-space-id and x-nacelle-space-token.
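If you are scripting requests rather than using an API client, one way to avoid hard-coding credentials is to read them from the environment. This is a sketch under assumptions: the environment variable names `NACELLE_SPACE_ID` and `NACELLE_SPACE_TOKEN` and the helper `telemetry_headers` are hypothetical; only the header names come from this guide.

```python
import os


def telemetry_headers(env=None):
    """Build the two required Telemetry headers from environment variables."""
    env = os.environ if env is None else env
    values = {
        "NACELLE_SPACE_ID": (env.get("NACELLE_SPACE_ID") or "").strip(),
        "NACELLE_SPACE_TOKEN": (env.get("NACELLE_SPACE_TOKEN") or "").strip(),
    }
    missing = [name for name, value in values.items() if not value]
    if missing:
        # Fail early instead of letting the API reject the request.
        raise ValueError("missing environment variables: " + ", ".join(missing))
    return {
        "x-nacelle-space-id": values["NACELLE_SPACE_ID"],
        "x-nacelle-space-token": values["NACELLE_SPACE_TOKEN"],
    }
```

Failing before the request is sent surfaces a missing credential locally, instead of as a header error from the API.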
Obtain GroupId
Before we can begin querying, we need to obtain our group id. The two methods for obtaining a group id are described above in the Obtain a GroupId section. Use whichever method works for your use case.
Write Query
Once we have a group id we can write a simple query to view summary data for that group id.
This query returns the counts of error, warning, progress, and complete events that have occurred in Nacelle’s systems. It returns updated data as additional work is done, so it can be used to check whether a reindex is still processing.
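A summary query for this step, plus one way to watch it, might look like the sketch below. The query uses the summary fields shown earlier in this guide. The polling loop is an assumption, not documented behavior: it treats a reindex as settled once the counts stop changing for a few consecutive polls, and `fetch_summary` is a hypothetical callable you provide that sends `SUMMARY_QUERY` and returns the decoded summary object.

```python
import time

SUMMARY_QUERY = """
query Summary($groupId: String!) {
  summary(groupId: $groupId) {
    groupID
    countError
    countWarn
    countProgress
    countComplete
    firstEventTimestamp
  }
}
"""

COUNT_FIELDS = ("countError", "countWarn", "countProgress", "countComplete")


def watch_summary(fetch_summary, interval=5.0, stable_polls=3, sleep=time.sleep):
    """Poll until the counts are unchanged for `stable_polls` consecutive polls."""
    previous, unchanged = None, 0
    while True:
        summary = fetch_summary()
        counts = tuple(summary.get(field) for field in COUNT_FIELDS)
        # Reset the streak whenever any count differs from the last poll.
        unchanged = unchanged + 1 if counts == previous else 0
        if unchanged >= stable_polls:
            return summary
        previous = counts
        sleep(interval)
```

Stable counts are only a heuristic for "no more work is arriving"; a longer interval or a higher `stable_polls` value makes a premature stop less likely.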
View Results
This is example data; the data returned from your query will be different.
This shows our counts as well as the timestamps of the first and last events.
Additionally, we see there is an error count of 1, which means our reindex has encountered an issue.
Determine Next Steps
In the example data above, we found that an error occurred during our reindex, but there is not much information to go on yet. We can use Telemetry further to figure out what went wrong. Our next step should be to query for event data so we can dig into the error we encountered.
Query Event Data
Event data describes discrete chunks of work that have occurred within Nacelle's systems. When an error happens during one of these chunks, Telemetry records the issue to provide insight. We can query for this data with the events query, filtering by our group id.
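One way to express that events query from a script is to pass the filter as a GraphQL variable rather than interpolating the group id into the query string. In this sketch, the `EventFilter` type name and the selected fields come from the events signature and example earlier in this guide; the operation name `Events` and the helper `events_payload` are hypothetical.

```python
import json

EVENTS_QUERY = """
query Events($filter: EventFilter) {
  events(filter: $filter) {
    edges {
      node {
        groupId
        timestamp
        status
        action
        message
        nacelleEntryId
      }
    }
  }
}
"""


def events_payload(group_id):
    """Return the JSON request body for an events query scoped to one group id."""
    return json.dumps({
        "query": EVENTS_QUERY,
        "variables": {"filter": {"groupId": group_id}},
    })
```

Using variables means the group id is JSON-encoded for you, so unusual characters in an id cannot break the query text.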
View Event Data
Now that we have run our events query, we should see all errors encountered during the reindex. In this example we will only have 1 issue. The data we get can be seen below.
This is example data; the data returned from your query will be different.
This tells us a few useful pieces of information:
- The nacelleEntryId.
- The timestamp of when the issue occurred within the system.
- The error message detailing the problem.
- The action that was occurring in Nacelle's systems when we encountered the error.
Identifying Issues
These pieces of information can be used to determine what has gone wrong. In this example, during a reindex, we encountered a 404 error when querying Shopify for details of a product. A 404 error simply means that the resource was not found on the server.
Resolving Issues
Users can verify that Shopify correctly has the product in the catalog and perform a single reindex for that product in the dashboard. That reindex will have its own unique groupId, which allows you to go through these steps again to view its work and verify that the issue did not occur again. If the issue persists, this information can be shared with Customer Support to help expedite investigation.
Troubleshooting
A request to the GraphQL API may respond with an error. Each error will provide a code and a message containing more details about the error.
Message | Reason | How to Fix |
---|---|---|
illegal base64 data at input byte 44 | The supplied cursor was not a base64 string. | Verify cursor is a base64 string. |
parsing time "..." as "...": cannot parse | The supplied timestamp is in the wrong format. | Times should be in the format yyyy-MM-dd'T'hh:mm:ss'Z'. Example: 2016-06-23T09:07:21.205Z |
invalid date range | The supplied timestamps are in the wrong order. | This occurs when the filter has both a start and an end timestamp and the end timestamp is before the start timestamp. Example of an invalid filter: { start:"2016-06-23T09:07:21.205Z" end:"2016-05-23T09:07:21.205Z" } |
unexpected end of JSON input | The supplied cursor was empty | Pass in cursor properly instead of using an empty string. |
failed to find any nodes with the given filter and spaceID | The query could not find any results | Adjust query, and verify filters. |
cursor does not match filter | The supplied cursor differs from the passed in filter. | Adjust filter to match cursor properly. |
please pass a valid x-nacelle-space-token header for the request. | Missing required header x-nacelle-space-token | Add the header with the correct space token |
please pass a valid x-nacelle-space-id header for the request | Missing required header x-nacelle-space-id | Add the header with the correct space-id |
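The two timestamp errors in the table can be caught before a request is sent. The sketch below is a client-side check only, under assumptions: it accepts UTC timestamps with or without fractional seconds (matching the example in the table), and the helper names `parse_timestamp` and `check_date_range` are hypothetical.

```python
from datetime import datetime, timezone

# Formats matching e.g. 2016-06-23T09:07:21.205Z and 2016-06-23T09:07:21Z.
_FORMATS = ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ")


def parse_timestamp(value):
    """Parse a UTC timestamp string, raising ValueError if it is malformed."""
    for fmt in _FORMATS:
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"timestamp not in expected format: {value!r}")


def check_date_range(start, end):
    """Raise ValueError when the end timestamp is before the start timestamp."""
    if parse_timestamp(end) < parse_timestamp(start):
        raise ValueError("invalid date range: end is before start")
```

Running `check_date_range` on the start and end values of a filter before sending it avoids a round trip that would only return one of the errors above.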