Overview

The Schema Simulation endpoint lets you test a schema definition against sample data without persisting anything. It shows, step by step, how raw data is processed through your field mappings, identity resolution, and merge operations to produce entity records. Use it to validate a schema configuration before applying it to production.

API Reference

Simulate Entity Analysis

POST /v2/simulate-entity-analysis

Processes sample data through a provided schema definition and returns step-wise analysis results.

Request Body

{
  "schema": {
    "fields": [
      {
        "field": "email",
        "type": "string",
        "isIdentifier": true
      },
      {
        "field": "name",
        "type": "string"
      },
      {
        "field": "total_spent",
        "type": "number",
        "mergeOp": "sum"
      }
    ],
    "mappings": [
      {
        "field": "email",
        "stream": "purchases",
        "expr": "email"
      },
      {
        "field": "name",
        "stream": "purchases",
        "expr": "customer_name"
      },
      {
        "field": "total_spent",
        "stream": "purchases",
        "expr": "amount",
        "guard": "amount > 0"
      }
    ],
    "rank": {
      "fields": ["email"]
    }
  },
  "data": [
    {
      "email": "[email protected]",
      "customer_name": "John Doe",
      "amount": 49.99
    },
    {
      "email": "[email protected]",
      "customer_name": "John Doe",
      "amount": 25.00
    }
  ]
}
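
A request like the one above could be assembled from Python roughly as follows. This is a minimal sketch using only the standard library. The host `https://api.example.com`, the bearer-token `Authorization` header, the helper name `build_simulation_request`, and the placeholder email address are assumptions for illustration; this reference does not specify the host or the authentication scheme.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # assumption: replace with your real API host


def build_simulation_request(schema: dict, data: list, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the simulation endpoint."""
    body = json.dumps({"schema": schema, "data": data}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v2/simulate-entity-analysis",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # assumption: the auth scheme is not documented in this reference
            "Authorization": f"Bearer {token}",
        },
    )


schema = {
    "fields": [
        {"field": "email", "type": "string", "isIdentifier": True},
        {"field": "name", "type": "string"},
        {"field": "total_spent", "type": "number", "mergeOp": "sum"},
    ],
    "mappings": [
        {"field": "email", "stream": "purchases", "expr": "email"},
        {"field": "name", "stream": "purchases", "expr": "customer_name"},
        {"field": "total_spent", "stream": "purchases", "expr": "amount", "guard": "amount > 0"},
    ],
    "rank": {"fields": ["email"]},
}
data = [
    # placeholder email address, used here only as sample input
    {"email": "user@example.com", "customer_name": "John Doe", "amount": 49.99},
    {"email": "user@example.com", "customer_name": "John Doe", "amount": 25.00},
]

request = build_simulation_request(schema, data, token="YOUR_TOKEN")

# To actually send it (requires a real host and credentials):
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.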

Schema Object

Field	Type	Description
`fields`	array	Field definitions (at least one required)
`fields[].field`	string	Field name
`fields[].type`	string	Data type (string, number, etc.)
`fields[].isIdentifier`	boolean	Whether this field is used for identity resolution
`fields[].mergeOp`	string	How to merge duplicate values (sum, max, min, etc.)
`fields[].capacity`	int	Maximum number of values to store
`fields[].keepDays`	int	Retention period in days
`mappings`	array	Mapping definitions (at least one required)
`mappings[].field`	string	Target schema field name
`mappings[].stream`	string	Source stream name
`mappings[].expr`	string	LQL expression for value extraction
`mappings[].guard`	string	Optional LQL guard condition
`rank`	object	Identity ranking configuration
`rank.fields`	string[]	Identity fields in priority order

Data Array

An array of data objects to process. Each object is a key-value map representing a raw event. Data items are processed sequentially, building up entity records across iterations.

Response

{
  "step_analyses": [
    [
      {
        "refs": {
          "refs": [
            {"key": "email", "value": "[email protected]"}
          ]
        },
        "keys": [
          {"key": "email", "value": "[email protected]"}
        ],
        "ent": {
          "email": "[email protected]",
          "name": "John Doe",
          "total_spent": 49.99
        },
        "entWithTS": {
          "email": "[email protected]",
          "name": "John Doe",
          "total_spent": 49.99
        },
        "details": []
      }
    ],
    [
      {
        "refs": {
          "refs": [
            {"key": "email", "value": "[email protected]"}
          ]
        },
        "keys": [
          {"key": "email", "value": "[email protected]"}
        ],
        "ent": {
          "email": "[email protected]",
          "name": "John Doe",
          "total_spent": 74.99
        },
        "entWithTS": {
          "email": "[email protected]",
          "name": "John Doe",
          "total_spent": 74.99
        },
        "details": []
      }
    ]
  ]
}

Field	Type	Description
`step_analyses`	array	One entry per input data item
`step_analyses[]`	array	One analysis object per unique entity identified in that step
`refs`	object	All identity aliases/references for the entity
`keys`	array	Identity key fragments (field name + value pairs)
`ent`	object	The entity as key-value pairs after processing
`entWithTS`	object	The entity with timestamp metadata
`details`	array	Additional analysis details
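
The nested shape of `step_analyses` (an array of steps, each an array of analysis objects) can be walked with two small helpers. This is a sketch, not part of the API: the function names `field_history` and `latest_entities` are hypothetical, the sample response is trimmed to the `keys` and `ent` fields, and the email address is a placeholder. It also assumes that the `keys` pairs identify the same entity consistently across steps.

```python
def field_history(response: dict, field: str) -> list:
    """Collect one field's value from every analysis object, in step order."""
    return [
        analysis["ent"].get(field)
        for step in response.get("step_analyses", [])
        for analysis in step
    ]


def latest_entities(response: dict) -> dict:
    """Map each identity (tuple of key/value pairs) to its most recent `ent`."""
    latest = {}
    for step in response.get("step_analyses", []):
        for analysis in step:
            identity = tuple((k["key"], k["value"]) for k in analysis.get("keys", []))
            latest[identity] = analysis["ent"]  # later steps overwrite earlier ones
    return latest


# Trimmed sample response with a placeholder email address.
response = {
    "step_analyses": [
        [{"keys": [{"key": "email", "value": "user@example.com"}],
          "ent": {"email": "user@example.com", "name": "John Doe", "total_spent": 49.99}}],
        [{"keys": [{"key": "email", "value": "user@example.com"}],
          "ent": {"email": "user@example.com", "name": "John Doe", "total_spent": 74.99}}],
    ]
}

history = field_history(response, "total_spent")  # [49.99, 74.99]
entities = latest_entities(response)
```

Comparing `history` against the values you expect after each input item is a quick way to confirm that a merge operation behaves as intended.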

Error Responses

Status	Error	Cause
400	Schema must contain at least one field	Empty fields array
400	Schema must contain at least one mapping	Empty mappings array
400	Data array cannot be empty	No data objects provided
400	failed building query	Invalid schema configuration (bad LQL, missing fields)
400	failed processing data	Data incompatible with schema definition
400	No analysis results generated	Processing produced no entities
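
The first three errors in the table can be caught on the client before a request is sent. The sketch below mirrors those checks locally; the function names are hypothetical. The extra check for mappings that target undeclared fields is an assumption about what "missing fields" may cover, and the server's actual validation may differ.

```python
def preflight_errors(payload: dict) -> list:
    """Return the documented 400 messages this payload would likely trigger."""
    errors = []
    schema = payload.get("schema") or {}
    if not schema.get("fields"):
        errors.append("Schema must contain at least one field")
    if not schema.get("mappings"):
        errors.append("Schema must contain at least one mapping")
    if not payload.get("data"):
        errors.append("Data array cannot be empty")
    return errors


def undeclared_mapping_targets(schema: dict) -> list:
    """List mapping targets that are not declared in `fields`.

    Assumption: this is a local sanity check only; it is not a documented
    server-side rule.
    """
    declared = {f.get("field") for f in schema.get("fields", [])}
    return [
        m.get("field")
        for m in schema.get("mappings", [])
        if m.get("field") not in declared
    ]


empty_payload = {"schema": {"fields": [], "mappings": []}, "data": []}
problems = preflight_errors(empty_payload)

partial_schema = {
    "fields": [{"field": "email", "type": "string"}],
    "mappings": [
        {"field": "email", "stream": "purchases", "expr": "email"},
        {"field": "name", "stream": "purchases", "expr": "customer_name"},
    ],
}
missing = undeclared_mapping_targets(partial_schema)  # ["name"]
```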

Key Behaviors

In-memory processing: All data is processed in temporary stores. Nothing is persisted to your account.
Sequential processing: Data items are processed one at a time in order, so later items can merge with entities created by earlier items.
Identity resolution: The simulation applies the same identity resolution logic as production, using the provided rank configuration.
Merge operations: Field merge operations (sum, max, min, etc.) are applied when multiple data items resolve to the same entity.
Guard conditions: Each mapping guard is evaluated, and the mapping is applied only when its guard expression evaluates to true.
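
The behaviors above can be illustrated with a toy, purely local model. This is not the service's engine: LQL expressions are replaced by plain source keys, guards by Python callables, identity resolution by a single identifier field, and fields without a merge operation are assumed to take the latest value. The function name `simulate_locally`, the third event, and the email address are hypothetical.

```python
from typing import Any, Callable, Optional

Event = dict[str, Any]
# (target field, source key, optional guard): a simplified stand-in for an LQL mapping
Mapping = tuple[str, str, Optional[Callable[[Event], bool]]]


def simulate_locally(events: list, mappings: list, merge_ops: dict, id_field: str) -> list:
    """Process events in order and return a snapshot of the entity after each one."""
    entities = {}   # identifier value -> accumulated entity
    snapshots = []
    for event in events:
        mapped = {}
        for field, source_key, guard in mappings:
            if guard is not None and not guard(event):
                continue  # guard is false: skip this mapping for this event
            if source_key in event:
                mapped[field] = event[source_key]
        identity = mapped.get(id_field)
        if identity is None:
            continue  # no identifier, so no entity can be resolved
        entity = entities.setdefault(identity, {})
        for field, value in mapped.items():
            if merge_ops.get(field) == "sum" and field in entity:
                entity[field] += value
            else:
                entity[field] = value  # assumption: latest value wins by default
        snapshots.append(dict(entity))
    return snapshots


mappings = [
    ("email", "email", None),
    ("name", "customer_name", None),
    ("total_spent", "amount", lambda e: e.get("amount", 0) > 0),
]
events = [
    {"email": "user@example.com", "customer_name": "John Doe", "amount": 49.99},
    {"email": "user@example.com", "customer_name": "John Doe", "amount": 25.00},
    # hypothetical extra event: the guard rejects it, so total_spent is unchanged
    {"email": "user@example.com", "customer_name": "John Doe", "amount": 0},
]
snapshots = simulate_locally(events, mappings, {"total_spent": "sum"}, "email")
```

In this model `total_spent` is about 49.99 after the first event and about 74.99 after the second, in line with the example response, and the third event leaves it unchanged because its guard is false.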

Use Cases

Validate field mappings: Confirm that LQL expressions extract the correct values from your data.
Test identity resolution: Verify that records are correctly linked across data items.
Preview merge behavior: See how merge operations combine values from multiple events.
Debug guard conditions: Ensure mapping guards filter data as expected.
Schema prototyping: Experiment with schema designs before committing changes.