Observability

Why Observability?

Observability provides insight into the status of a reindex and any errors encountered along the way. This lets you identify and remedy the root cause of issues within your composable commerce stack without needing to escalate to customer service.

This guide covers common use cases, potential errors, and additional details on how to use this system to improve confidence during a reindex.

At A Glance

  • The primary interface is a GraphQL API, which you call to explore Telemetry data.
  • You can request high-level information about a reindex.
  • You can explore error events for greater insight into failures.
  • You can watch a reindex progress through summary queries.
  • You can check whether a reindex has started through digest queries.

Definitions

| Concept | Description |
| --- | --- |
| Summaries | A summary is a high-level object describing the status of an asynchronous process. It contains counts of the different statuses that have occurred during the process. Use summaries to determine whether further investigation is needed. |
| Group Id | An id used to correlate the events of an asynchronous process. Group Ids are unique per asynchronous process. |
| Event | An event is a discrete action that occurred. Events have four statuses: Error, Complete, Warning, and Progress. Error events alert users to issues encountered during the asynchronous process; by default, error events are always persisted. Complete events signify that some action has reached a final step in the asynchronous process; by default, complete events are not persisted. Warning events alert users to potential issues that could soon arise (for example, a piece of content getting close to 1 MB in size); by default, warning events are not persisted. Progress events signify work being processed in the asynchronous process; by default, progress events are not persisted. |
| Digests | A digest is a high-level object describing the overall status of related summaries. It contains a status indicating whether the process has started or is running, metadata relating to the entire process, and a category such as index or build. Use digests to get a high-level overview of an entire category of work. |
| Asynchronous Process | A process that runs across the Nacelle system, such as a reindex. |

Schema

For additional details about the GraphQL Schema view: Schema

Authentication

The Telemetry API requires two headers to authenticate and authorize your request.

Make sure both headers below are set before making your first request.

  • x-nacelle-space-id: <your-space-id>
  • x-nacelle-space-token: <your-space-token>

Sending Queries

The Nacelle Telemetry feature is currently only accessible via our GraphQL API.

Queries are sent via a POST request to https://event-visibility.api.nacelle.com/graphql
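
As a minimal sketch of the request described above, the following Python builds a POST request carrying both authentication headers, using only the standard library. The helper name `build_telemetry_request` and the placeholder credentials are illustrative assumptions, not part of any Nacelle SDK. The network call itself is left commented out.

```python
import json
import urllib.request

TELEMETRY_URL = "https://event-visibility.api.nacelle.com/graphql"


def build_telemetry_request(query, space_id, space_token, variables=None):
    """Build (but do not send) a GraphQL POST request for the Telemetry API.

    The helper name and signature are illustrative assumptions.
    """
    payload = {"query": query}
    if variables is not None:
        payload["variables"] = variables
    return urllib.request.Request(
        TELEMETRY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-nacelle-space-id": space_id,
            "x-nacelle-space-token": space_token,
        },
        method="POST",
    )


request = build_telemetry_request(
    "{ groups(filter: {}) { name count } }",
    space_id="<your-space-id>",
    space_token="<your-space-token>",
)

# To actually send the request (requires valid credentials):
# with urllib.request.urlopen(request) as response:
#     result = json.loads(response.read())
```

Building the request separately from sending it makes it easy to inspect the headers and body before any credentials leave your machine.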

Insomnia Collection

If you are an Insomnia user, you can ‘Import’ the Telemetry collection linked below to get up and running even quicker: Insomnia Collection Export

Queries

Events

Events can be thought of as individual actions within the Nacelle platform. As an asynchronous request is processed in Nacelle’s distributed system, the groupId will serve as a tracer to link multiple events together.

events(filter: EventFilter): EventConnection!

The events query will return an EventConnection containing the requested data along with pagination information to enable pagination through the events.

This query can be very helpful to get more contextual information about a specific failure during a reindex.

events(filter: { groupId: "exampleId" }) {
	edges {
		node {
			groupId
			timestamp
			status
			action
			message
			meta
			nacelleEntryId
			requestOrigin
		}
	}
}

event(id: ID!): Event

The event query will return a single event given its id.

event(id: "exampleId") {
	groupId
	timestamp
	status
	action
	message
	meta
	nacelleEntryId
	requestOrigin
}

This query is mainly used to quickly dig into the details of a specific event of interest that was retrieved by the events query.

groups(filter: EventFilter): [Group!]!

The groups query returns a list of Groups, each containing the number of events for a groupId.

{
  groups (filter:{}){
    name
    count
  }
}

This query is used to quickly determine all the groupIds in a space, along with the total number of events associated with each groupId.

statuses(groupId: String!, filter: EventFilter): [Group!]!

The statuses query returns a list of Groups, each containing the count of one status within the given groupId.

{
  statuses(groupId: "exampleId" filter:{}){
    name
    count
  }
}

This query is used to quickly determine the counts of all statuses for a specific groupId.
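
The list returned by the statuses query is often easier to work with as a lookup table. The sketch below assumes the response follows the usual GraphQL shape (`data.statuses` holding objects with `name` and `count`). The status names in the example payload are assumptions based on the four statuses defined earlier, and the exact strings returned by the API may differ.

```python
def status_counts(response):
    """Map each status name to its count from a `statuses` query response."""
    groups = response["data"]["statuses"]
    return {group["name"]: group["count"] for group in groups}


# Hypothetical response body; real names and counts will differ.
example_response = {
    "data": {
        "statuses": [
            {"name": "Progress", "count": 40},
            {"name": "Error", "count": 1},
        ]
    }
}

counts = status_counts(example_response)
# A status that never occurred is simply absent, so default to 0.
has_errors = counts.get("Error", 0) > 0
```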

Summaries

Summaries contain aggregated information about a reindex within the Nacelle platform and can be queried for high-level information. The groupId is used to group events and create each summary.

For example, if you watch a reindex live, the summary node for that reindex is updated over time with counts of the different event types as the system processes the reindex request. If the reindex encounters error events, the summary's countError field is incremented by one for each error encountered.

summary(groupId: String!, sources: [String!]): Summary!

If you already have a groupId and only want to view the summary for that reindex, you can query for a single summary using the groupId.

The below example query will return a single summary that matches the given group id.

summary(groupId: "exampleId") {
	groupID
	countError
	countWarn
	countProgress
	countComplete
	firstEventTimestamp
}

summaries(filter:SummaryFilter): SummaryConnection

The summaries query returns a SummaryConnection containing the requested data along with pagination information to enable pagination through the summaries.

The below example query will return a list of all summaries that match a given filter. In this case, the results will be ordered by oldest to newest. In other words, this will return summaries for all reindexes, from oldest to newest.

summaries(filter: { orderBy: "first_event_timestamp" }) {
	edges {
		node {
			groupID
			firstEventTimestamp
			countComplete
			countError
		}
	}
}

Digests

digest(groupId: String!, categories: [String!]): Digest!

If you already have a groupId and only want to view the digest for that reindex, you can query for a single digest using the groupId.

The below example query will return a single digest that matches the given group id.

{
  digest(groupId:"exampleid"){
    status
    Category
  }
}

digests(filter: DigestFilter): DigestConnection!

The digests query returns a DigestConnection containing the requested data along with pagination information to enable pagination through the digests.

The below example query will return a list of all digests that match a given filter. In this case, the results will be ordered by oldest to newest. In other words, this will return digests for all reindexes, from oldest to newest.

{
  digests(filter:{}){
    edges{
      node{
        Category
        groupID
        status
      }
    }
  }
}

Obtain a GroupId

Before going further, we should quickly discuss the groupId and its importance. The Nacelle Telemetry system groups events using a unique identifier. This allows the Telemetry system to remain flexible in the type of events it captures and how they are grouped. Today, there is a 1:1 relationship between a groupId and a reindex.

A groupId can be obtained in one of two ways.

GroupId from header response

When you start a reindex in the dashboard you will get a response header called x-nacelle-group-id. This can be used to query a summary directly.

GroupId from summaries query

Alternatively, you can query all summaries for a space and take the newest groupId.

{
  summaries(filter:{orderBy:"first_event_timestamp" direction:DESC}){
    edges{
      node{
        groupID
        firstEventTimestamp
      }
    }
  }
}

The first node returned would be the most recent group id which can then be used to directly query for information about that asynchronous process.
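
Picking the most recent groupId out of that response can be sketched as below. The function name and the example payload are hypothetical. The sketch assumes the standard GraphQL response envelope and that the query was ordered with `direction:DESC` as shown above.

```python
def newest_group_id(response):
    """Return the groupID of the first summary node, or None if there are no summaries."""
    edges = response["data"]["summaries"]["edges"]
    if not edges:
        return None
    return edges[0]["node"]["groupID"]


# Hypothetical response body, newest summary first; real ids will differ.
example_response = {
    "data": {
        "summaries": {
            "edges": [
                {"node": {"groupID": "group-b", "firstEventTimestamp": "2016-06-23T09:07:21.205Z"}},
                {"node": {"groupID": "group-a", "firstEventTimestamp": "2016-05-23T09:07:21.205Z"}},
            ]
        }
    }
}

latest = newest_group_id(example_response)
```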

Making Your First Request

To get started, navigate to the API at https://event-visibility.api.nacelle.com

Set Headers

Next, fill out the required headers:

  • x-nacelle-space-id: <your-space-id>
  • x-nacelle-space-token: <your-space-token>

Obtain GroupId

Before we can begin querying, we need to obtain a group id. Either of the two methods described in Obtain a GroupId above will work; use whichever suits your use case.

Write Query

Once we have a group id we can write a simple query to view summary data for that group id.

This query returns the counts of any error, warning, progress, or complete events that have occurred in Nacelle's systems. It returns updated data as additional work is done, so it can be used to check whether a reindex is still processing.
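
The summary check described above can be sketched in Python as follows. The query mirrors the summary example earlier in this guide. The helper names and the "counts changed between two polls" heuristic are assumptions for illustration, not documented Nacelle behavior.

```python
# GraphQL document to send as the "query" field of the POST body,
# with {"groupId": "<your-group-id>"} as the variables.
SUMMARY_QUERY = """
query ($groupId: String!) {
  summary(groupId: $groupId) {
    groupID
    countError
    countWarn
    countProgress
    countComplete
    firstEventTimestamp
  }
}
"""

COUNT_FIELDS = ("countError", "countWarn", "countProgress", "countComplete")


def needs_investigation(summary):
    """True when the summary reports at least one error event."""
    return summary["countError"] > 0


def still_processing(previous, current):
    """Heuristic (an assumption): any count changing between polls means work is ongoing."""
    return any(previous[field] != current[field] for field in COUNT_FIELDS)


# Hypothetical summaries from two consecutive polls; real values will differ.
first_poll = {"countError": 0, "countWarn": 0, "countProgress": 12, "countComplete": 3}
second_poll = {"countError": 1, "countWarn": 0, "countProgress": 20, "countComplete": 9}
```

Comparing two polls avoids guessing at a "finished" flag on the summary: if the counts stop changing for a while, the reindex has most likely settled.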

View Results

Note: This is example data. The data returned from your query will be different.

This shows our counts, along with the timestamps of the first and last events.

Additionally we see there is an error count of 1 which means our reindex has encountered an issue.

Determine Next Steps

In the example data above, we found that an error occurred during our reindex, but there is not much information to go on yet. We can use Telemetry further to figure out what went wrong. Our next step is to query for event data so we can dig into the error we encountered.

Query Event Data

Events represent discrete chunks of work that have occurred within Nacelle's systems. When an error happens during one of these chunks, Telemetry records the issue to provide insight. We can retrieve this data with the events query, filtering by the groupId obtained earlier.
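
One way to sketch this step in Python is to build the events query for a groupId and then keep only the error events on the client side. The helper names and the example payload are hypothetical. The status string `"Error"` is an assumption based on the statuses listed in Definitions, so check the schema for the exact value.

```python
import json


def events_query(group_id):
    """Build an events query for one groupId (json.dumps quotes and escapes the id)."""
    return (
        "{ events(filter: { groupId: %s }) { edges { node { "
        "timestamp status action message nacelleEntryId } } } }"
        % json.dumps(group_id)
    )


def error_events(response, error_status="Error"):
    """Return the event nodes whose status matches error_status (assumed spelling)."""
    nodes = [edge["node"] for edge in response["data"]["events"]["edges"]]
    return [node for node in nodes if node["status"] == error_status]


# Hypothetical response body; real events will differ.
example_response = {
    "data": {
        "events": {
            "edges": [
                {"node": {"status": "Progress", "action": "fetch", "message": "ok"}},
                {"node": {"status": "Error", "action": "fetch", "message": "404 not found"}},
            ]
        }
    }
}

errors = error_events(example_response)
```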

View Event Data

Now that we have run our events query, we should see all errors encountered during the reindex. In this example there is only one issue.

Note: This is example data. The data returned from your query will be different.

This tells us a few useful pieces of information:

  • The nacelleEntryId.
  • The timestamp of when the issue occurred within the system.
  • The error message detailing the problem.
  • The action that was occurring in Nacelle's systems when the error was encountered.

Identifying Issues

These pieces of information can be used to determine what has gone wrong. In this example during a reindex we encountered a 404 error when querying Shopify for details of a product. A 404 error simply means that the resource was not found on the server.

Resolving Issues

Users can verify that Shopify correctly has the product in the catalog and perform a single reindex for that product in the dashboard. That reindex will have its own unique groupId, which allows you to go through these steps again to view its work and verify that the issue did not recur. If the issue persists, this information can be shared with Customer Support to help expedite investigation.

Troubleshooting

A request to the GraphQL API may respond with an error. Each error will provide a code and a message containing more details about the error.

| Message | Reason | How to Fix |
| --- | --- | --- |
| illegal base64 data at input byte 44 | The supplied cursor was not a base64 string. | Verify that the cursor is a base64 string. |
| parsing time "..." as "...": cannot parse | The supplied timestamp is in the wrong format. | Times should be in the format yyyy-MM-dd'T'hh:mm:ss'Z'. Example: 2016-06-23T09:07:21.205Z |
| invalid date range | The supplied timestamps are in the wrong order. | This happens when the filter has both a start and an end timestamp and the end is before the start. Example: filter: { start:"2016-06-23T09:07:21.205Z" end:"2016-05-23T09:07:21.205Z" } |
| unexpected end of JSON input | The supplied cursor was empty. | Pass in a valid cursor instead of an empty string. |
| failed to find any nodes with the given filter and spaceID | The query could not find any results. | Adjust the query and verify the filters. |
| cursor does not match filter | The supplied cursor differs from the passed-in filter. | Adjust the filter to match the cursor. |
| please pass a valid x-nacelle-space-token header for the request. | Missing required header x-nacelle-space-token. | Add the header with the correct space token. |
| please pass a valid x-nacelle-space-id header for the request | Missing required header x-nacelle-space-id. | Add the header with the correct space id. |
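
Several of the errors above can be caught before a request is sent. The sketch below checks timestamps, their order, and the cursor on the client side. The function names and the `cursor` parameter name are hypothetical, and the timestamp pattern assumes fractional seconds are always present, as in the example 2016-06-23T09:07:21.205Z.

```python
import base64
from datetime import datetime


def parse_timestamp(value):
    """Parse a timestamp such as 2016-06-23T09:07:21.205Z; raises ValueError otherwise."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")


def check_filter(start=None, end=None, cursor=None):
    """Return a list of problems found in the filter values (empty if none)."""
    problems = []
    parsed = {}
    for name, value in (("start", start), ("end", end)):
        if value is None:
            continue
        try:
            parsed[name] = parse_timestamp(value)
        except ValueError:
            problems.append(f"{name} is not a valid timestamp: {value!r}")
    # Mirrors the "invalid date range" error: end must not precede start.
    if "start" in parsed and "end" in parsed and parsed["end"] < parsed["start"]:
        problems.append("end is before start")
    if cursor is not None:
        try:
            if not cursor:
                raise ValueError("empty cursor")
            # validate=True rejects characters outside the base64 alphabet.
            base64.b64decode(cursor, validate=True)
        except ValueError:  # binascii.Error is a subclass of ValueError
            problems.append("cursor is empty or not valid base64")
    return problems
```

For example, `check_filter(start="2016-06-23T09:07:21.205Z", end="2016-05-23T09:07:21.205Z")` reports that the end is before the start, matching the invalid date range row in the table.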