AI Schema Suggestions

Use AI to automatically generate field definitions and mappings from your incoming data.

Overview

When ingesting data from new sources, defining field types, merge operations, and stream mappings can be time-consuming. Lytics provides an AI-powered schema suggestion feature that analyzes sample data and recommends appropriate field definitions and mappings.

The schema suggestion system can:

  • Analyze JSON or CSV sample data from your event streams
  • Recommend field types (string, int, float, boolean, date, etc.)
  • Identify which fields are likely identifiers or PII
  • Suggest appropriate merge operations for each field
  • Generate stream mapping expressions

How It Works

  1. Data Analysis: You provide sample records from a stream (or the system pulls them automatically from the event catalog).
  2. AI Processing: The sample data is sent to an LLM (Google Vertex AI or OpenAI) that analyzes field names, values, and patterns.
  3. Suggestion Generation: The AI returns structured suggestions for each field, including type, merge operation, PII classification, and mapping expressions.
  4. Review & Apply: You review the suggestions and apply the ones that fit your schema design.

API Reference

Generate Suggestions from Sample Data

POST /v2/ai/schema/suggest

Analyzes sample records and returns field and mapping suggestions.

Query Parameters

Parameter    Type      Default  Description
engine       string    vertex   AI engine to use: vertex or openai
stream       string    default  Stream name for the suggestions
table        string    user     Target table name
format       string    json     Format of the sample data: json or csv
prompts      string[]  -        Additional custom instructions for the AI
temperature  float     1.05     Controls randomness of AI output

Request Body

Provide sample records as a raw JSON array or as CSV data, matching the format query parameter.

[
  {
    "email": "jane.doe@example.com",
    "first_name": "Jane",
    "last_name": "Doe",
    "signup_date": "2024-01-15T10:30:00Z",
    "purchase_count": 5,
    "total_spend": 249.99
  }
]
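
As a rough illustration of how the query parameters and request body fit together, the sketch below builds (but does not send) a request for this endpoint in either JSON or CSV form. The base URL, the Authorization header, the Content-Type values, and the use of repeated query keys for string[] parameters are all assumptions made for the example; they are not documented on this page.

```python
import csv
import io
import json
import urllib.parse
import urllib.request

# Assumption: placeholder host; substitute the API host for your account.
BASE_URL = "https://api.example.com"


def build_suggest_request(records, *, fmt="json", stream="default", table="user",
                          engine="vertex", prompts=(), token="YOUR_API_TOKEN"):
    """Build (without sending) a POST request for /v2/ai/schema/suggest."""
    if not records:
        raise ValueError("at least one sample record is required")

    params = [("engine", engine), ("stream", stream),
              ("table", table), ("format", fmt)]
    # Assumption: string[] parameters are passed as repeated query keys.
    params += [("prompts", p) for p in prompts]
    url = f"{BASE_URL}/v2/ai/schema/suggest?{urllib.parse.urlencode(params)}"

    if fmt == "json":
        body = json.dumps(records).encode("utf-8")
        content_type = "application/json"  # assumed header value
    elif fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)
        body = buf.getvalue().encode("utf-8")
        content_type = "text/csv"  # assumed header value
    else:
        raise ValueError(f"unsupported format: {fmt}")

    # Assumption: the token is sent in an Authorization header.
    headers = {"Content-Type": content_type, "Authorization": token}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


sample = [{"first_name": "Jane", "last_name": "Doe", "purchase_count": 5}]
req = build_suggest_request(sample, fmt="csv", stream="web")
print(req.full_url)
print(req.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen would send it. Building the request separately makes it easy to inspect the URL and body before any sample data leaves your environment.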

Response

Returns a map of field names to their suggested definitions and mappings:

{
  "email": {
    "fields": [
      {
        "field": "email",
        "is_identifier": true,
        "is_pii": true,
        "shortdesc": "User email address",
        "type": "string",
        "mergeop": "setadd",
        "managed_by": "ai"
      }
    ],
    "mappings": [
      {
        "field": "email",
        "stream": "default",
        "expr": "email",
        "guard_expr": "",
        "managed_by": "ai"
      }
    ]
  },
  "purchase_count": {
    "fields": [
      {
        "field": "purchase_count",
        "is_identifier": false,
        "is_pii": false,
        "shortdesc": "Total number of purchases",
        "type": "int",
        "mergeop": "valuect",
        "managed_by": "ai"
      }
    ],
    "mappings": [
      {
        "field": "purchase_count",
        "stream": "default",
        "expr": "purchase_count",
        "managed_by": "ai"
      }
    ]
  }
}
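
To make the response shape concrete, here is a minimal sketch that walks a trimmed copy of the sample response above and collects the suggested identifiers, PII fields, and merge operations. The summarize helper is hypothetical and exists only for this illustration.

```python
import json

# Trimmed copy of the sample response shown above.
RESPONSE = json.loads("""
{
  "email": {
    "fields": [{"field": "email", "is_identifier": true, "is_pii": true,
                "type": "string", "mergeop": "setadd", "managed_by": "ai"}],
    "mappings": [{"field": "email", "stream": "default", "expr": "email",
                  "guard_expr": "", "managed_by": "ai"}]
  },
  "purchase_count": {
    "fields": [{"field": "purchase_count", "is_identifier": false,
                "is_pii": false, "type": "int", "mergeop": "valuect",
                "managed_by": "ai"}],
    "mappings": [{"field": "purchase_count", "stream": "default",
                  "expr": "purchase_count", "managed_by": "ai"}]
  }
}
""")


def summarize(suggestions):
    """Collect identifiers, PII fields, and merge operations per field."""
    identifiers, pii, mergeops = [], [], {}
    for entry in suggestions.values():
        for field in entry.get("fields", []):
            name = field["field"]
            mergeops[name] = field["mergeop"]
            if field.get("is_identifier"):
                identifiers.append(name)
            if field.get("is_pii"):
                pii.append(name)
    return {"identifiers": sorted(identifiers), "pii": sorted(pii),
            "mergeops": mergeops}


summary = summarize(RESPONSE)
print(summary)
```

A summary like this is a convenient starting point for the review step, since identifier and PII classifications are the suggestions most worth checking by hand.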

Retrieve Pre-computed Suggestions for a Stream

GET /v2/ai/stream/{stream}

Returns previously generated field suggestions for a specific stream, if available.

Generate Suggestions for a Specific Field

GET /v2/ai/stream/{stream}/{key}

Generates suggestions for a single field within a stream.

Query Parameters

Parameter  Type      Description
values     string[]  Sample values for the field. If omitted, values are fetched from the event catalog.

Response

Returns field and mapping suggestions for the specified key, using the same structure as the full suggestion response.
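
The following sketch shows one way to construct the URL for a single-field request, with and without explicit sample values. The base URL is a placeholder, and encoding the values array as repeated query keys is an assumption; this page does not specify how string[] parameters are serialized.

```python
from urllib.parse import quote, urlencode

# Assumption: placeholder host; substitute the API host for your account.
BASE_URL = "https://api.example.com"


def field_suggestion_url(stream, key, values=None):
    """Build the URL for GET /v2/ai/stream/{stream}/{key}."""
    # Percent-encode path segments so names containing "/" or spaces stay intact.
    path = f"/v2/ai/stream/{quote(stream, safe='')}/{quote(key, safe='')}"
    if not values:
        # With no values, the service fetches samples from the event catalog.
        return BASE_URL + path
    # Assumption: string[] parameters are passed as repeated query keys.
    query = urlencode([("values", v) for v in values])
    return f"{BASE_URL}{path}?{query}"


print(field_suggestion_url("default", "signup_date"))
print(field_suggestion_url("default", "plan", ["free", "pro plus"]))
```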

Suggestion Fields

Each field suggestion includes:

Property       Description
field          Recommended field name in the schema
is_identifier  Whether the field should be used as an identity key
is_pii         Whether the field contains personally identifiable information
shortdesc      Human-readable description of the field
type           Recommended data type (string, int, float, boolean, date, etc.)
mergeop        Recommended merge operation (setadd, valuect, max, min, etc.)
managed_by     Set to ai, indicating that the suggestion was AI-generated

Each mapping suggestion includes:

Property    Description
field       Target field in the schema
stream      Source stream name
expr        Mapping expression to extract the value
guard_expr  Optional conditional expression for the mapping
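
The two property lists above can be modeled as typed records, which helps when reviewing suggestions in code. This is a minimal sketch: the class names and the parse_entry helper are hypothetical, the defaults for shortdesc, guard_expr, and managed_by are assumptions, and unknown keys are dropped so that properties not listed here do not break parsing.

```python
from dataclasses import dataclass, fields


@dataclass
class FieldSuggestion:
    field: str
    is_identifier: bool
    is_pii: bool
    type: str
    mergeop: str
    shortdesc: str = ""      # assumed default when the description is absent
    managed_by: str = "ai"


@dataclass
class MappingSuggestion:
    field: str
    stream: str
    expr: str
    guard_expr: str = ""     # optional, so default to an empty expression
    managed_by: str = "ai"


def _build(cls, raw):
    """Instantiate cls from a dict, ignoring keys the dataclass does not declare."""
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in raw.items() if k in known})


def parse_entry(entry):
    """Convert one per-field response entry into typed suggestion objects."""
    field_suggestions = [_build(FieldSuggestion, f) for f in entry.get("fields", [])]
    mapping_suggestions = [_build(MappingSuggestion, m) for m in entry.get("mappings", [])]
    return field_suggestions, mapping_suggestions


entry = {
    "fields": [{"field": "purchase_count", "is_identifier": False, "is_pii": False,
                "shortdesc": "Total number of purchases", "type": "int",
                "mergeop": "valuect", "managed_by": "ai"}],
    "mappings": [{"field": "purchase_count", "stream": "default",
                  "expr": "purchase_count", "managed_by": "ai"}],
}
field_suggestions, mapping_suggestions = parse_entry(entry)
print(field_suggestions[0])
print(mapping_suggestions[0])
```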