Question Representation

#prompting #representation #LM #LLM #zero-shots

Introduction

It has been established that the representation of a task is crucial for language models: It may directly influence the latter's performances.

For that reason, in Text2SQL, many methods were implemented to improve the LLM's performances.

Scope

Unlike what its name suggests, Question Representation encapsulates both the representation of the database schema, and the target question.

Question Representation does not include examples for the prompt, and thus it is considered a zero-shot prompting technique.

While question representation does not include additional examples. But it is usually augmented with both example selection and example organisation methods.

Also, additional database related metadata may be included as described in database representation.

Basic Prompt

It is represented by It represents the schema by only listing the tables and columns.

Table continents , columns = [ ContId , Continent ]
Table countries , columns = [ CountryId , CountryName ,
Continent ]
Q : How many continents are there ?
A : SELECT

Textual Representation

It is represented by It represents the schema by only its tables and columns, as in Basic Prompt. But the input and the target are clarified by additional phrases:

Given the following database schema :
continents : ContId , Continent
countries : CountryId , CountryName , Continent
Answer the following : How many continents are there ?
SELECT

OpenAI Demonstration Prompt

It is represented by It is was first proposed by OpenAI.

DAIL-SQL Version

### Complete sqlite SQL query only and with no explanation
### SQLite SQL tables , with their properties :
#
# continents ( ContId , Continent )
# countries ( CountryId , CountryName , Continent )
#
### How many continents are there ?
SELECT

Current Version

#SYSTEM
#Given the following SQL tables, your job is to write queries given a user’s request.          
CREATE TABLE Orders (       
		OrderID int,       
		CustomerID int,       
		OrderDate datetime,       
		OrderTime varchar(8),       
		PRIMARY KEY (OrderID)     
		);          
		
CREATE TABLE OrderDetails (       
		OrderDetailID int,       
		OrderID int,       
		ProductID int,       
		Quantity int,       
		PRIMARY KEY (OrderDetailID)     
		);          
		
CREATE TABLE Products (       
		ProductID int,       
		ProductName varchar(50),       
		Category varchar(50),       
		UnitPrice decimal(10, 2),       
		Stock int,       
		PRIMARY KEY (ProductID)     
		);          

CREATE TABLE Customers (       
		CustomerID int,       
		FirstName varchar(50),       
		LastName varchar(50),       
		Email varchar(100),       
		Phone varchar(20),       
		PRIMARY KEY (CustomerID)     
		);
#USER
Write a SQL query which computes the average total order value for all orders on 2023-04-01.

Code Representation Prompt

It is denoted by It represents the database schema as SQL commands.
The input and targets are clarified by SQL multi-line comment.

In theory, is designed to be a valid SQL transaction.

/* Given the following database schema : */
CREATE TABLE continents (
ContId int primary key ,
Continent text ,
---foreign key ( ContId ) references countries ( Continent )
);
CREATE TABLE countries (
CountryId int primary key ,
CountryName text ,
Continent int ,
foreign key ( Continent ) references continents ( ContId )
);
/* Answer the following : How many continents are there ?
*/
SELECT

Alpaca SFT Prompt

It is denoted by This prompt was designed for supervised fine-tuning.

Below is an instruction that describes a task , paired with an input that provides further context . Write a response that appropriately completes the request.
### Instruction :
Write a sql to answer the question " How many continents are there ? "
### Input :
continents ( ContId , Continent )
countries ( CountryId , CountryName , Continent )
### Response :
SELECT