AI Schema Suggestions
Overview
When ingesting data from new sources, defining field types, merge operations, and stream mappings can be time-consuming. Lytics provides an AI-powered schema suggestion feature that analyzes sample data and recommends appropriate field definitions and mappings.
The schema suggestion system can:
- Analyze JSON or CSV sample data from your event streams
- Recommend field types (for example string, int, float, boolean, or date)
- Identify which fields are likely identifiers or PII
- Suggest appropriate merge operations for each field
- Generate stream mapping expressions
How It Works
- Data Analysis: You provide sample records from a stream (or the system pulls them automatically from the event catalog).
- AI Processing: The sample data is sent to a large language model, hosted on Google Vertex AI or OpenAI depending on the engine parameter, which analyzes field names, values, and patterns.
- Suggestion Generation: The AI returns structured suggestions for each field, including type, merge operation, PII classification, and mapping expressions.
- Review & Apply: You review the suggestions and apply the ones that fit your schema design.
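The review step can be partly automated on the client side. The sketch below is a hypothetical policy, not Lytics behavior: it takes the suggestion map returned by the API (field name to its "fields" and "mappings"), auto-accepts entries with no PII or identifier flags, and holds the rest for manual review. The function name and the policy itself are assumptions for illustration.

```python
from typing import Any


def partition_suggestions(
    suggestions: dict[str, dict[str, Any]],
) -> tuple[dict[str, dict[str, Any]], dict[str, dict[str, Any]]]:
    """Split a suggestion map into entries to auto-apply and entries to review by hand.

    Hypothetical policy: any entry whose field suggestions are flagged as PII
    or as an identifier goes to manual review.
    """
    auto_apply: dict[str, dict[str, Any]] = {}
    needs_review: dict[str, dict[str, Any]] = {}
    for key, suggestion in suggestions.items():
        fields = suggestion.get("fields", [])
        sensitive = any(f.get("is_pii") or f.get("is_identifier") for f in fields)
        target = needs_review if sensitive else auto_apply
        target[key] = suggestion
    return auto_apply, needs_review


# Abbreviated suggestion map in the shape returned by the API.
response = {
    "email": {"fields": [{"field": "email", "is_identifier": True, "is_pii": True}], "mappings": []},
    "purchase_count": {"fields": [{"field": "purchase_count", "is_identifier": False, "is_pii": False}], "mappings": []},
}
auto, review = partition_suggestions(response)
print(sorted(auto))    # ['purchase_count']
print(sorted(review))  # ['email']
```

Routing identifiers to review as well as PII is a deliberate choice here, since identity keys affect how profiles are merged.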
API Reference
Generate Suggestions from Sample Data
POST /v2/ai/schema/suggest
Analyzes sample records and returns field and mapping suggestions.
Query Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| engine | string | vertex | AI engine to use: vertex or openai |
| stream | string | default | Stream name used in the suggested mappings |
| table | string | user | Target table name |
| format | string | json | Format of the sample data: json or csv |
| prompts | string[] | | Additional custom instructions for the AI |
| temperature | float | 1.05 | Sampling temperature; higher values make the AI output more random |
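The query parameters above can be assembled with the Python standard library. This is a minimal sketch: the helper name is hypothetical, the host https://api.lytics.io is an assumption, and sending each custom prompt as a repeated prompts parameter is also an assumption about how the string[] type is encoded.

```python
from typing import Optional
from urllib.parse import urlencode


def build_suggest_url(
    base_url: str,
    engine: str = "vertex",
    stream: str = "default",
    table: str = "user",
    fmt: str = "json",
    prompts: Optional[list[str]] = None,
    temperature: Optional[float] = None,
) -> str:
    """Build the URL for POST /v2/ai/schema/suggest (hypothetical helper)."""
    params = [
        ("engine", engine),
        ("stream", stream),
        ("table", table),
        ("format", fmt),
    ]
    # Assumption: each custom prompt is sent as a repeated "prompts" parameter.
    params.extend(("prompts", p) for p in prompts or [])
    # Omit temperature to keep the server-side default (documented as 1.05).
    if temperature is not None:
        params.append(("temperature", str(temperature)))
    return f"{base_url}/v2/ai/schema/suggest?{urlencode(params)}"


url = build_suggest_url(
    "https://api.lytics.io",  # assumed API host
    engine="openai",
    prompts=["Treat ids as strings"],
    temperature=0.2,
)
# → https://api.lytics.io/v2/ai/schema/suggest?engine=openai&stream=default&table=user&format=json&prompts=Treat+ids+as+strings&temperature=0.2
print(url)
```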
Request Body
Provide sample records as a raw JSON array or as CSV data, matching the format parameter.
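If the sample records are held as Python dictionaries, a sketch like the following serializes them for either format value. The helper name is hypothetical. For CSV, it assumes a header row built from the union of record keys in first-seen order, with missing values left empty. The service's exact CSV expectations are not documented here.

```python
import csv
import io
import json
from typing import Any


def encode_sample_body(records: list[dict[str, Any]], fmt: str = "json") -> bytes:
    """Serialize sample records for the request body, matching the format parameter."""
    if fmt == "json":
        # A raw JSON array of record objects.
        return json.dumps(records).encode("utf-8")
    if fmt == "csv":
        # Header row from the union of keys, in first-seen order (an assumption).
        columns: list[str] = []
        for record in records:
            for key in record:
                if key not in columns:
                    columns.append(key)
        buffer = io.StringIO()
        # Missing keys are written as empty cells (DictWriter's default restval).
        writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
        writer.writeheader()
        writer.writerows(records)
        return buffer.getvalue().encode("utf-8")
    raise ValueError(f"unsupported format: {fmt!r}")


records = [
    {"email": "jane.doe@example.com", "first_name": "Jane", "purchase_count": 5},
    {"email": "john.roe@example.com", "total_spend": 249.99},
]
print(encode_sample_body(records, "csv").decode())
```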
```json
[
  {
    "email": "jane.doe@example.com",
    "first_name": "Jane",
    "last_name": "Doe",
    "signup_date": "2024-01-15T10:30:00Z",
    "purchase_count": 5,
    "total_spend": 249.99
  }
]
```
Response
Returns a map of field names to their suggested definitions and mappings. The example below shows two of the returned entries:
```json
{
  "email": {
    "fields": [
      {
        "field": "email",
        "is_identifier": true,
        "is_pii": true,
        "shortdesc": "User email address",
        "type": "string",
        "mergeop": "setadd",
        "managed_by": "ai"
      }
    ],
    "mappings": [
      {
        "field": "email",
        "stream": "default",
        "expr": "email",
        "guard_expr": "",
        "managed_by": "ai"
      }
    ]
  },
  "purchase_count": {
    "fields": [
      {
        "field": "purchase_count",
        "is_identifier": false,
        "is_pii": false,
        "shortdesc": "Total number of purchases",
        "type": "int",
        "mergeop": "valuect",
        "managed_by": "ai"
      }
    ],
    "mappings": [
      {
        "field": "purchase_count",
        "stream": "default",
        "expr": "purchase_count",
        "managed_by": "ai"
      }
    ]
  }
}
```
Retrieve Pre-computed Suggestions for a Stream
GET /v2/ai/stream/{stream}
Returns previously generated field suggestions for a specific stream, if available.
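A minimal sketch of preparing this request with urllib. It builds the request without sending it, and percent-encodes the stream name so it stays within one path segment. The host and the practice of passing the API token in the Authorization header are assumptions, and the helper name is hypothetical. To send the request, you would pass it to urllib.request.urlopen.

```python
from urllib.parse import quote
from urllib.request import Request


def build_stream_suggestions_request(base_url: str, token: str, stream: str) -> Request:
    """Build (but do not send) a GET request for a stream's pre-computed suggestions."""
    # Percent-encode the stream name so spaces or "/" cannot break the path.
    url = f"{base_url}/v2/ai/stream/{quote(stream, safe='')}"
    # Assumption: the API token is passed in the Authorization header.
    return Request(url, headers={"Authorization": token}, method="GET")


req = build_stream_suggestions_request(
    "https://api.lytics.io",  # assumed API host
    "YOUR_API_TOKEN",         # placeholder token
    "web events",
)
print(req.full_url)  # https://api.lytics.io/v2/ai/stream/web%20events
```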
Generate Suggestions for a Specific Field
GET /v2/ai/stream/{stream}/{key}
Generates suggestions for a single field, identified by {key}, within the given stream.
Query Parameters
| Parameter | Type | Description |
|---|---|---|
| values | string[] | Sample values for the field. If omitted, values are fetched from the event catalog. |
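The single-field URL can be built the same way. In this sketch, the helper name and host are hypothetical. Sending each sample as a repeated values parameter is an assumption about how string[] is encoded. When no values are given, the parameter is omitted so that the service falls back to the event catalog, as documented above.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def build_field_suggest_url(
    base_url: str, stream: str, key: str, values: Optional[list[str]] = None
) -> str:
    """Build the URL for GET /v2/ai/stream/{stream}/{key} (hypothetical helper)."""
    path = f"{base_url}/v2/ai/stream/{quote(stream, safe='')}/{quote(key, safe='')}"
    if not values:
        # No samples supplied: the service fetches values from the event catalog.
        return path
    # Assumption: each sample value is sent as a repeated "values" parameter.
    return f"{path}?{urlencode([('values', v) for v in values])}"


print(build_field_suggest_url("https://api.lytics.io", "default", "signup_date", ["2024-01-15T10:30:00Z"]))
# https://api.lytics.io/v2/ai/stream/default/signup_date?values=2024-01-15T10%3A30%3A00Z
```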
Response
Returns field and mapping suggestions for the specified key, using the same structure as the full suggestion response.
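Because both endpoints share the same response structure, a single helper can flatten either response into rows for a review table. This sketch is hypothetical. It pairs each suggested field with the expression from the mapping that targets the same field, and uses empty strings where a property is absent.

```python
from typing import Any


def flatten_suggestions(response: dict[str, dict[str, Any]]) -> list[tuple[str, str, str, str]]:
    """Turn a suggestion map into (field, type, mergeop, expr) rows."""
    rows = []
    for entry in response.values():
        # Index mapping expressions by target field so each field row can show its expr.
        exprs = {m["field"]: m.get("expr", "") for m in entry.get("mappings", [])}
        for f in entry.get("fields", []):
            rows.append((f["field"], f.get("type", ""), f.get("mergeop", ""), exprs.get(f["field"], "")))
    return rows


response = {
    "purchase_count": {
        "fields": [{"field": "purchase_count", "type": "int", "mergeop": "valuect"}],
        "mappings": [{"field": "purchase_count", "stream": "default", "expr": "purchase_count"}],
    }
}
for row in flatten_suggestions(response):
    print(row)  # ('purchase_count', 'int', 'valuect', 'purchase_count')
```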
Suggestion Fields
Each field suggestion includes:
| Property | Description |
|---|---|
| field | Recommended field name in the schema |
| is_identifier | Whether the field should be used as an identity key |
| is_pii | Whether the field contains personally identifiable information |
| shortdesc | Human-readable description of the field |
| type | Recommended data type (string, int, float, boolean, date, etc.) |
| mergeop | Recommended merge operation (setadd, valuect, max, min, etc.) |
| managed_by | Set to "ai", indicating that the suggestion was AI-generated |
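For client code that processes suggestions, a typed record that mirrors the field properties in the table above can help catch missing keys early. This is a hypothetical sketch. Treating field, type, and mergeop as required, while letting the boolean flags and text properties fall back to defaults, is an assumption.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FieldSuggestion:
    """Typed view of one entry in a suggestion's "fields" list (hypothetical)."""
    field: str
    is_identifier: bool
    is_pii: bool
    shortdesc: str
    type: str
    mergeop: str
    managed_by: str

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "FieldSuggestion":
        # Assumption: field, type, and mergeop are required; other keys may be absent.
        return cls(
            field=data["field"],
            is_identifier=bool(data.get("is_identifier", False)),
            is_pii=bool(data.get("is_pii", False)),
            shortdesc=data.get("shortdesc", ""),
            type=data["type"],
            mergeop=data["mergeop"],
            managed_by=data.get("managed_by", ""),
        )


s = FieldSuggestion.from_dict({
    "field": "email", "is_identifier": True, "is_pii": True,
    "shortdesc": "User email address", "type": "string",
    "mergeop": "setadd", "managed_by": "ai",
})
print(s.is_pii, s.mergeop)  # True setadd
```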
Each mapping suggestion includes:
| Property | Description |
|---|---|
| field | Target field in the schema |
| stream | Source stream name |
| expr | Mapping expression to extract the value |
| guard_expr | Optional conditional expression for the mapping |
| managed_by | Set to "ai", indicating that the mapping was AI-generated |
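Since guard_expr is optional (the purchase_count mapping in the response example omits it), client code should not assume the key is present. This hypothetical sketch defaults a missing guard to an empty string and exposes whether the mapping is conditional.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class MappingSuggestion:
    """Typed view of one entry in a suggestion's "mappings" list (hypothetical)."""
    field: str
    stream: str
    expr: str
    guard_expr: str = ""
    managed_by: str = ""

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "MappingSuggestion":
        # guard_expr may be omitted entirely, so fall back to an empty guard.
        return cls(
            field=data["field"],
            stream=data["stream"],
            expr=data["expr"],
            guard_expr=data.get("guard_expr", ""),
            managed_by=data.get("managed_by", ""),
        )

    @property
    def is_guarded(self) -> bool:
        """True when the mapping only applies if guard_expr matches."""
        return bool(self.guard_expr)


m = MappingSuggestion.from_dict({
    "field": "purchase_count", "stream": "default",
    "expr": "purchase_count", "managed_by": "ai",
})
print(m.is_guarded)  # False
```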