AI SQL Generation

Generate SQL queries from natural language descriptions for your connected data sources.

Overview

Lytics can generate valid SQL queries from natural language descriptions for any of your connected database sources. This feature is useful when you need to query connected data warehouses but want to avoid writing SQL by hand.

The system uses embeddings to understand your database schema and selects the most relevant tables for your query, then generates syntactically correct SQL tailored to the specific database connection.

How It Works

Schema Embedding: When you enable SQL generation for a connection, Lytics computes vector embeddings of your table schemas. These embeddings capture the semantic meaning of table and column names.
Query Understanding: When you submit a natural language query, it is converted into an embedding and compared against the stored table embeddings using cosine similarity.
Table Selection: The top 5 most relevant tables are selected based on similarity, and their schemas are included as context.
SQL Generation: An LLM generates a syntactically valid SQL query based on your natural language input and the relevant table schemas.

API Reference

Generate SQL from Natural Language

POST /v2/ai/sql/{connection}

Translates a natural language description into a SQL query for the specified database connection.

Path Parameters

Parameter	Description
`connection`	The ID of the database connection to query against

Query Parameters

Parameter	Type	Default	Description
`engine`	string	`vertex`	AI engine: `vertex` or `openai`

Request Body

{
  "text": "Show me the top 10 customers by total purchase amount in the last 30 days"
}

Response

{
  "query": "SELECT customer_id, SUM(amount) as total_amount FROM orders WHERE order_date >= DATEADD(day, -30, GETDATE()) GROUP BY customer_id ORDER BY total_amount DESC LIMIT 10"
}

Pre-compute Schema Embeddings

POST /v2/ai/embedding

Kicks off a background job to compute embeddings for table schemas across one or more connections. This step improves query generation speed and relevance.

Request Body

{
  "connections": ["connection_id_1", "connection_id_2"]
}

Response

Returns HTTP 201 (Created) with no body. The embedding job runs asynchronously.

📘
Embeddings are cached and reused if the underlying table schema hasn't changed. You only need to recompute embeddings when your database schema is modified.

Supported Engines

Engine	Model	Notes
`vertex`	Google Vertex AI	Default engine. Uses code generation models.
`openai`	GPT-3.5 Turbo	Alternative engine.

Prerequisites

A configured database connection with valid credentials
The connection must be accessible from the Lytics platform
For best results, pre-compute schema embeddings before generating queries

Updated 22 days ago