Schema Simulation

Test schema definitions against sample data to preview how entities will be built.

Overview

The Schema Simulation endpoint lets you test a schema definition against sample data without persisting anything. It shows, step by step, how raw data would be processed through your field mappings, identity resolution, and merge operations to produce entity records. This is useful for validating schema configurations before applying them to production.

API Reference

Simulate Entity Analysis

POST /v2/simulate-entity-analysis

Processes sample data through a provided schema definition and returns step-wise analysis results.

Request Body

{
  "schema": {
    "fields": [
      {
        "field": "email",
        "type": "string",
        "isIdentifier": true
      },
      {
        "field": "name",
        "type": "string"
      },
      {
        "field": "total_spent",
        "type": "number",
        "mergeOp": "sum"
      }
    ],
    "mappings": [
      {
        "field": "email",
        "stream": "purchases",
        "expr": "email"
      },
      {
        "field": "name",
        "stream": "purchases",
        "expr": "customer_name"
      },
      {
        "field": "total_spent",
        "stream": "purchases",
        "expr": "amount",
        "guard": "amount > 0"
      }
    ],
    "rank": {
      "fields": ["email"]
    }
  },
  "data": [
    {
      "email": "[email protected]",
      "customer_name": "John Doe",
      "amount": 49.99
    },
    {
      "email": "[email protected]",
      "customer_name": "John Doe",
      "amount": 25.00
    }
  ]
}
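As a minimal sketch of how a client might assemble this request, the Python below builds the POST without sending it. The base URL, the bearer-token header, and the placeholder email address are assumptions made for illustration; this page does not specify the host or the authentication scheme.

```python
import json
import urllib.request

# Hypothetical values: the host and bearer-token auth are assumptions,
# not taken from this page. Substitute whatever your account uses.
BASE_URL = "https://api.example.com"
API_TOKEN = "YOUR_API_TOKEN"


def build_simulation_request(schema, data, base_url=BASE_URL, token=API_TOKEN):
    """Build (but do not send) a POST request for the simulation endpoint."""
    body = json.dumps({"schema": schema, "data": data}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v2/simulate-entity-analysis",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,  # assumed auth scheme
        },
    )


# Same shape as the request body above, with a placeholder email address.
schema = {
    "fields": [
        {"field": "email", "type": "string", "isIdentifier": True},
        {"field": "name", "type": "string"},
        {"field": "total_spent", "type": "number", "mergeOp": "sum"},
    ],
    "mappings": [
        {"field": "email", "stream": "purchases", "expr": "email"},
        {"field": "name", "stream": "purchases", "expr": "customer_name"},
        {"field": "total_spent", "stream": "purchases", "expr": "amount",
         "guard": "amount > 0"},
    ],
    "rank": {"fields": ["email"]},
}
data = [
    {"email": "user@example.com", "customer_name": "John Doe", "amount": 49.99},
    {"email": "user@example.com", "customer_name": "John Doe", "amount": 25.00},
]

request = build_simulation_request(schema, data)
print(request.get_method(), request.full_url)
```

To send the request, pass it to urllib.request.urlopen and decode the JSON body of the response.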

Schema Object

Field                  Type      Description
fields                 array     Field definitions (at least one required)
fields[].field         string    Field name
fields[].type          string    Data type (string, number, etc.)
fields[].isIdentifier  boolean   Whether this field is used for identity resolution
fields[].mergeOp       string    How to merge duplicate values (sum, max, min, etc.)
fields[].capacity      int       Maximum number of values to store
fields[].keepDays      int       Retention period in days
mappings               array     Mapping definitions (at least one required)
mappings[].field       string    Target schema field name
mappings[].stream      string    Source stream name
mappings[].expr        string    LQL expression for value extraction
mappings[].guard       string    Optional LQL guard condition
rank                   object    Identity ranking configuration
rank.fields            string[]  Identity fields in priority order
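The relationships between these properties can be checked locally before calling the endpoint. The sketch below is a hypothetical helper, not part of the API. It assumes that every mapping should carry field, stream, and expr, that a mapping's field must name a defined schema field, and that rank.fields should list only fields marked isIdentifier. The service may enforce different or additional rules.

```python
def lint_schema(schema):
    """Return a list of human-readable problems found in a schema dict.

    The rules here are assumptions inferred from the property table,
    not rules confirmed by the service.
    """
    problems = []
    fields = schema.get("fields") or []
    mappings = schema.get("mappings") or []

    names = {f.get("field") for f in fields}
    identifiers = {f.get("field") for f in fields if f.get("isIdentifier")}

    for i, mapping in enumerate(mappings):
        # Assumed required keys on every mapping.
        for key in ("field", "stream", "expr"):
            if not mapping.get(key):
                problems.append("mappings[%d] is missing '%s'" % (i, key))
        target = mapping.get("field")
        if target and target not in names:
            problems.append(
                "mappings[%d] targets undefined field '%s'" % (i, target)
            )

    for name in (schema.get("rank") or {}).get("fields", []):
        if name not in identifiers:
            problems.append(
                "rank.fields entry '%s' is not an identifier field" % name
            )
    return problems


good = {
    "fields": [{"field": "email", "type": "string", "isIdentifier": True}],
    "mappings": [{"field": "email", "stream": "purchases", "expr": "email"}],
    "rank": {"fields": ["email"]},
}
bad = {
    "fields": [{"field": "email", "type": "string"}],
    "mappings": [{"field": "phone", "stream": "purchases", "expr": "phone"}],
    "rank": {"fields": ["email"]},
}

print(lint_schema(good))
for problem in lint_schema(bad):
    print(problem)
```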

Data Array

An array of data objects to process. Each object is a key-value map representing a raw event. Data items are processed sequentially, building up entity records across iterations.

Response

{
  "step_analyses": [
    [
      {
        "refs": {
          "refs": [
            {"key": "email", "value": "[email protected]"}
          ]
        },
        "keys": [
          {"key": "email", "value": "[email protected]"}
        ],
        "ent": {
          "email": "[email protected]",
          "name": "John Doe",
          "total_spent": 49.99
        },
        "entWithTS": {
          "email": "[email protected]",
          "name": "John Doe",
          "total_spent": 49.99
        },
        "details": []
      }
    ],
    [
      {
        "refs": {
          "refs": [
            {"key": "email", "value": "[email protected]"}
          ]
        },
        "keys": [
          {"key": "email", "value": "[email protected]"}
        ],
        "ent": {
          "email": "[email protected]",
          "name": "John Doe",
          "total_spent": 74.99
        },
        "entWithTS": {
          "email": "[email protected]",
          "name": "John Doe",
          "total_spent": 74.99
        },
        "details": []
      }
    ]
  ]
}

Field              Type    Description
step_analyses      array   One entry per input data item
step_analyses[][]  array   One analysis object per unique entity identified in that step
refs               object  All identity aliases/references for the entity
keys               array   Identity key fragments (field name + value pairs)
ent                object  The entity as key-value pairs after processing
entWithTS          object  The entity with timestamp metadata
details            array   Additional analysis details
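Because step_analyses is an array of arrays, reading a response takes two nested loops: one over the input steps and one over the entities touched in each step. The sketch below walks a trimmed copy of the example response. The helper name summarize_steps is hypothetical, and the email value is a placeholder.

```python
def summarize_steps(response):
    """Yield (step_index, identity_keys, entity) for every analysis object."""
    for step_index, analyses in enumerate(response.get("step_analyses", [])):
        for analysis in analyses:
            keys = [(k["key"], k["value"]) for k in analysis.get("keys", [])]
            yield step_index, keys, analysis.get("ent", {})


# Trimmed version of the example response, keeping only "keys" and "ent".
sample_response = {
    "step_analyses": [
        [
            {
                "keys": [{"key": "email", "value": "user@example.com"}],
                "ent": {"email": "user@example.com", "name": "John Doe",
                        "total_spent": 49.99},
            }
        ],
        [
            {
                "keys": [{"key": "email", "value": "user@example.com"}],
                "ent": {"email": "user@example.com", "name": "John Doe",
                        "total_spent": 74.99},
            }
        ],
    ]
}

for step_index, keys, entity in summarize_steps(sample_response):
    print(step_index, keys, entity["total_spent"])
```

Comparing the ent objects of consecutive steps is a simple way to see what each input item changed.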

Error Responses

Status  Error                                     Cause
400     Schema must contain at least one field    Empty fields array
400     Schema must contain at least one mapping  Empty mappings array
400     Data array cannot be empty                No data objects provided
400     failed building query                     Invalid schema configuration (bad LQL, missing fields)
400     failed processing data                    Data incompatible with schema definition
400     No analysis results generated             Processing produced no entities
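The first three errors can be caught on the client before a request is made. The following is a minimal sketch of such a pre-check, using a hypothetical precheck helper. The order in which the service evaluates these conditions is an assumption, and the remaining errors depend on LQL parsing and data processing that only the service can perform.

```python
def precheck(payload):
    """Return the documented error message a payload would likely trigger,
    or None if none of the three emptiness checks fail.

    The check order is an assumption; the service may validate differently.
    """
    schema = payload.get("schema") or {}
    if not schema.get("fields"):
        return "Schema must contain at least one field"
    if not schema.get("mappings"):
        return "Schema must contain at least one mapping"
    if not payload.get("data"):
        return "Data array cannot be empty"
    return None


field = {"field": "email", "type": "string", "isIdentifier": True}
mapping = {"field": "email", "stream": "purchases", "expr": "email"}

print(precheck({"schema": {"fields": [], "mappings": [mapping]}, "data": [{}]}))
print(precheck({"schema": {"fields": [field], "mappings": [mapping]}, "data": []}))
```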

Key Behaviors

  • In-memory processing: All data is processed in temporary stores. Nothing is persisted to your account.
  • Sequential processing: Data items are processed one at a time in order, so later items can merge with entities created by earlier items.
  • Identity resolution: The simulation applies the same identity resolution logic as production, using the provided rank configuration.
  • Merge operations: Field merge operations (sum, max, min, etc.) are applied when multiple data items resolve to the same entity.
  • Guard conditions: Mapping guards are evaluated for each data item, and a value is mapped only when its guard expression evaluates to true.
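The interaction between sequential processing, a sum merge, and a guard can be illustrated with a small local model. This is not the service's implementation: the guard is modeled as a Python callable rather than an LQL expression, identity resolution is reduced to a single key lookup, and the behavior when a guard fails (the running total is left unchanged) is an assumption.

```python
def simulate_sum(events, id_key, value_key, guard):
    """Return the running entity state after each event, one snapshot per step."""
    totals = {}      # identity value -> accumulated total
    snapshots = []
    for event in events:
        identity = event[id_key]
        total = totals.get(identity, 0.0)
        if guard(event):
            # The value is mapped only when the guard passes.
            total += event[value_key]
        totals[identity] = total
        # Round for display so float noise does not obscure the result.
        snapshots.append({id_key: identity, "total_spent": round(total, 2)})
    return snapshots


events = [
    {"email": "user@example.com", "amount": 49.99},
    {"email": "user@example.com", "amount": 25.00},
    {"email": "user@example.com", "amount": -5.00},  # fails the guard, not summed
]

steps = simulate_sum(events, "email", "amount", guard=lambda e: e["amount"] > 0)
for step in steps:
    print(step)
```

The first two snapshots mirror the total_spent values in the example response (49.99, then 74.99). The third event is an extra illustration of a guard rejecting a value.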

Use Cases

  • Validate field mappings: Confirm that LQL expressions extract the correct values from your data
  • Test identity resolution: Verify that records are correctly linked across data items
  • Preview merge behavior: See how merge operations combine values from multiple events
  • Debug guard conditions: Ensure mapping guards filter data as expected
  • Schema prototyping: Experiment with schema designs before committing changes