Data Streams
Introduction
A data stream is a continuous flow of data generated from various sources such as websites, mobile apps, email providers, social media, and other digital platforms. Data streams typically carry high volumes of data generated in real time or near real time, so the data must be processed and analyzed as it arrives.
Lytics allows for creating any number of data streams to provide logical distinctions among data sources in either type or scope. For example, data describing email subscriber attributes should exist on a different data stream than one describing email subscriber activity. This ensures maximum flexibility when defining the relationships between sources and how they ultimately are unified into a materialized user profile.
What is an "event"?
In Lytics, an "event" is an action or activity performed by a user or customer, such as visiting a website, making a purchase, or subscribing to a newsletter. Events are represented as independent key/value pairs on a single data stream. These keys are then translated into Fields & Mappings so that they materialize on the user profile for segmentation.
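As a rough illustration of events as flat key/value pairs, the sketch below models two hypothetical events and keeps subscriber attributes and subscriber activity on separate streams. The key names, stream names, and the `raw_keys` helper are assumptions made for this example, not names defined by Lytics.

```python
# Hypothetical events; key names and values are illustrative only.
activity_event = {
    "email": "pat@example.com",
    "event": "newsletter_signup",
    "url": "https://example.com/subscribe",
}

attribute_event = {
    "email": "pat@example.com",
    "first_name": "Pat",
    "plan": "premium",
}

# Attributes and activity live on separate (hypothetically named) streams.
streams = {
    "email_activity": [activity_event],
    "email_users": [attribute_event],
}


def raw_keys(events):
    """Return the sorted set of raw keys seen across a list of events."""
    return sorted({key for event in events for key in event})
```

Calling `raw_keys(streams["email_activity"])` lists the keys that would later need to be mapped to user fields for that stream.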
Exploring Data Streams
Lytics will ingest data that is sent to one of Lytics' Collection APIs: the collect API or the bulk API. The bulk API is generally intended for larger imports of offline data, while the collect API is generally intended for more real-time sources and usage.
Viewing Your Data Streams
You can view information about your data streams in your Lytics dashboard by navigating to Conductor > Pipeline > Streams. The primary purpose of this section is to verify that data is successfully being received by Lytics. If your account has multiple data streams, you can view a different stream from the dropdown menu above the graph.
Many integrations have multiple streams. For instance, it is common for email integrations to have an activity stream and a user stream. Integration streams should be prefixed to help identify the source. You can find the streams for an integration in the documentation for that integration.
The number of keys in a data stream can be extremely large. This is normal and does not impact performance. Any keys that comprise less than 0.1% of the data volume OR have not been seen in 7 days will be hidden from display to reduce clutter.
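The display rule above can be sketched as a simple predicate. This is only an illustration: it assumes a key's share of the data volume is its times-seen count divided by the total number of events on the stream, and the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta

MIN_VOLUME_SHARE = 0.001        # 0.1% of the data volume
MAX_AGE = timedelta(days=7)     # not seen in 7 days


def is_hidden_from_display(times_seen, total_events, last_seen, now):
    """Return True if a key would be hidden under the rule described above.

    Assumption: "data volume" is measured as times_seen / total_events.
    """
    share = times_seen / total_events if total_events else 0.0
    too_rare = share < MIN_VOLUME_SHARE
    too_old = (now - last_seen) > MAX_AGE
    return too_rare or too_old
```

Either condition alone is enough to hide a key, so a frequently seen key still disappears from the display once it has been absent for more than 7 days.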
Event Ingress Graph
The event ingress graph shows the number of events collected on a stream for the selected time period (past day, week, month, 3 months, and year) and interval (hourly, daily, weekly, and monthly). Above the graph, you will find the time the last message from this stream was received, the source of the data stream, and the number of fields in the stream.
The last message received time strives for real-time reporting but can lag under a number of conditions, including during bulk imports. If a data stream is not updating as expected, please contact Lytics Support for assistance.
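Conceptually, the event ingress graph is a count of events per time bucket. The sketch below shows that idea for hourly and daily intervals only; `ingress_counts` is a hypothetical helper written for this example and says nothing about how Lytics computes the graph internally.

```python
from collections import Counter
from datetime import datetime


def ingress_counts(timestamps, interval="hourly"):
    """Count events per bucket by truncating each timestamp to the interval."""
    def bucket(ts):
        if interval == "hourly":
            return ts.replace(minute=0, second=0, microsecond=0)
        if interval == "daily":
            return ts.replace(hour=0, minute=0, second=0, microsecond=0)
        raise ValueError(f"unsupported interval: {interval}")

    return Counter(bucket(ts) for ts in timestamps)


# Three hypothetical event timestamps on one stream.
received = [
    datetime(2024, 1, 1, 10, 5),
    datetime(2024, 1, 1, 10, 55),
    datetime(2024, 1, 1, 11, 1),
]
hourly = ingress_counts(received, "hourly")
daily = ingress_counts(received, "daily")
```

A bucket with no events simply has no entry, which corresponds to a gap or zero on the graph.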
Raw Keys Table
Below the event ingress graph is the raw keys table. An event may contain any number of key-value pairs. Each record in this table represents a unique raw key seen on the stream in at least one event.
The table has the following information on keys:
Column | Description |
---|---|
Name | The name of the key. |
Predicted Type | The assumed data type of the value, determined by sampling the values received. |
First Seen | The date the key was first seen, according to the date of the event. |
Last Seen | The last time a key was seen, according to the date of the event. |
Times Seen | The number of events that contained the key. |
Unique Values | The number of different values seen. |
Times Used | The number of user fields that use the key. |
In addition to these seven columns, each record in the table can be clicked to open up a set of sample values. This can be used to verify that values are being collected and they match the expected data.
If a key has many different values, the modal may not display all the values for the chosen key.
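To make the table columns concrete, the sketch below derives Name, First Seen, Last Seen, Times Seen, and Unique Values from a list of events. It is a simplified model under stated assumptions: each event is a `(event_time, dict)` pair, values are compared as strings, and Predicted Type and Times Used are left out because they depend on sampling and on field mappings. `KeyStats` and `summarize` are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class KeyStats:
    name: str
    first_seen: datetime
    last_seen: datetime
    times_seen: int = 0
    values: set = field(default_factory=set)

    @property
    def unique_values(self):
        return len(self.values)


def summarize(events):
    """Build per-key statistics from an iterable of (event_time, event) pairs."""
    stats = {}
    for event_time, event in events:
        for key, value in event.items():
            row = stats.get(key)
            if row is None:
                row = stats[key] = KeyStats(key, event_time, event_time)
            # First/last seen follow the date of the event, not arrival order.
            row.first_seen = min(row.first_seen, event_time)
            row.last_seen = max(row.last_seen, event_time)
            row.times_seen += 1
            row.values.add(str(value))
    return stats
```

Because first and last seen are taken from the event date, events that arrive out of order still produce the correct range.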
The table can be filtered in three ways: used vs. unused, common vs. uncommon, and text search.
Filter | Description |
---|---|
Used | A raw key that is mapped to a user field. |
Unused | A raw key that is collected and stored but not mapped to any user field. |
Common | Raw keys that have been seen more often on events relative to other raw keys based on the times seen value. |
Uncommon | Raw keys that are seldom seen on events relative to other raw keys, based on the times seen value. |
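The three filters can be sketched as predicates over table rows. The source does not say where the line between common and uncommon falls, so this example assumes the median times seen value as the cutoff purely for illustration; `filter_keys` and the row field names are hypothetical.

```python
from statistics import median


def filter_keys(rows, used=None, common=None, search=""):
    """Filter raw-key rows; each row is a dict with name, times_seen, times_used.

    used / common: True, False, or None (None means "do not filter").
    Assumption: a key is "common" if times_seen is at or above the median.
    """
    cutoff = median(r["times_seen"] for r in rows) if rows else 0
    matches = []
    for r in rows:
        if used is not None and (r["times_used"] > 0) != used:
            continue
        if common is not None and (r["times_seen"] >= cutoff) != common:
            continue
        if search.lower() not in r["name"].lower():
            continue
        matches.append(r["name"])
    return matches
```

The filters combine, so `filter_keys(rows, used=False, common=True)` would surface frequently seen keys that are not yet mapped to any user field.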
Hiding Keys
Raw event keys can be hidden, but note that hidden keys cannot be made visible again through the user interface, only through the API. Keep a list of the keys you hide in case one needs to be resurfaced later. If you need assistance, please contact Lytics Support with your key name and account ID.
To hide a key:
- Select the checkbox next to the name of the key or keys.
- Click Remove selected key.
Final Thoughts
Keeping logically distinct data streams for different data sources also makes it more straightforward to map data from those sources to user profiles. However, some data providers make this distinction difficult. For example, data routers like Segment or Rudderstack can route data from multiple sources into a single destination, like Lytics.