DAIL-SQL

#model #prompting #LLM #LM

Introduction

DAIL-SQL is an effective and token-efficient approach to using LLMs for Text-to-SQL. With GPT-4, it achieved 86.2% execution accuracy on the Spider leaderboard.

Notably, it requires only about 1,600 tokens per question on Spider-dev. It consists mainly of a series of prompting techniques that improve the LLM's performance.

DAIL-SQL decouples the prompting strategy from the underlying LLM. Its prompting strategies were tested on several LLMs:

  • Proprietary: GPT-4, GPT-3.5
  • Open Source: Llama, Alpaca

Its prompting part consists of the following components:

  1. Question Representation
  2. Example Selection
  3. Example Organisation

Optionally, DAIL-SQL also supports fine-tuning the model on the target dataset, where applicable, to further improve the results.

Objective

Notations
  • Let be a set of question representations.
  • Let be a set of example selection methods.
  • Let be a set of example organisation methods.

The objective of DAIL-SQL can be formalised as follows:

Where:
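
As a hedged sketch only, one way such an objective could be written is shown below. Every symbol is an assumption introduced for illustration and is not the original notation: $\mathcal{R}$, $\mathcal{E}$, and $\mathcal{O}$ stand for the three sets listed under Notations, $M$ for the LLM, $\mathcal{T}$ for a set of triples $(q, D, y)$ (question, database, reference SQL), $\pi_{r,e,o}(q, D)$ for the prompt built with representation $r$, selection method $e$, and organisation method $o$, and $\operatorname{exec}_D$ for executing a query on database $D$.

```latex
\[
(\hat{r}, \hat{e}, \hat{o})
  = \operatorname*{arg\,max}_{r \in \mathcal{R},\; e \in \mathcal{E},\; o \in \mathcal{O}}
    \frac{1}{|\mathcal{T}|}
    \sum_{(q, D, y) \in \mathcal{T}}
    \mathbb{1}\!\left[
      \operatorname{exec}_D\!\big(M(\pi_{r,e,o}(q, D))\big) = \operatorname{exec}_D(y)
    \right]
\]
```

Read this way, the objective is to pick the combination of representation, selection, and organisation that maximises the fraction of questions for which the generated SQL returns the same result as the reference SQL.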

Question Representation

In DAIL-SQL, question representation strategies were surveyed from the literature. The 5 most prominent were compared:

  • Basic Prompt
  • Textual Representation Prompt
  • OpenAI Demonstration Prompt
  • Code Representation Prompt
  • Alpaca SFT Prompt

The Code Representation Prompt was chosen for its expressiveness: it gives the full schema information, including primary keys, foreign keys and column types.

This can improve the model's SQL generation capabilities.
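
A minimal sketch of a code-style representation follows, assuming the schema is presented to the model as `CREATE TABLE` statements. The function name `code_representation_prompt`, the comment wording inside the prompt, and the toy `singer`/`concert` schema are all hypothetical and chosen only for illustration.

```python
import sqlite3


def code_representation_prompt(db: sqlite3.Connection, question: str) -> str:
    """Build a prompt that shows the schema as CREATE TABLE statements."""
    # sqlite_master keeps the DDL text of each table, so primary keys,
    # foreign keys, and column types all appear in the prompt.
    rows = db.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND sql IS NOT NULL ORDER BY name"
    ).fetchall()
    schema = "\n\n".join(row[0].strip() + ";" for row in rows)
    # The wording of the comments below is an assumption, not a fixed format.
    return (
        "/* Given the following database schema: */\n"
        f"{schema}\n\n"
        f"/* Answer the following: {question} */\n"
        "SELECT"
    )


# Toy schema, invented for this example.
db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE singer (
        singer_id INTEGER PRIMARY KEY,
        name TEXT,
        age INTEGER
    );
    CREATE TABLE concert (
        concert_id INTEGER PRIMARY KEY,
        singer_id INTEGER,
        year INTEGER,
        FOREIGN KEY (singer_id) REFERENCES singer(singer_id)
    );
    """
)

prompt = code_representation_prompt(db, "How many singers are older than 30?")
print(prompt)
```

Ending the prompt with `SELECT` is one common way to nudge the model towards completing a query rather than writing prose; it is a design choice of this sketch.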

Example Selection

In DAIL-SQL, example selection strategies were also surveyed from the literature. The four most prominent were compared:

  • Random strategy, used as a baseline.
  • Question Similarity Selection
  • Masked Question Similarity Selection
  • Query Similarity Selection

Building on these existing approaches, a novel selection method named DAIL Selection was devised.
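
The sketch below illustrates the general idea under stated assumptions: that DAIL Selection ranks candidates by the similarity of masked questions and gives priority to candidates whose SQL resembles a preliminary prediction for the target question. Token-set Jaccard similarity stands in for embedding distance here, and every name (`mask_question`, `sql_skeleton`, `select_examples`, `tau`) and the toy pool are hypothetical.

```python
import re

SQL_KEYWORDS = {
    "select", "from", "where", "group", "by", "order", "having", "join",
    "on", "count", "avg", "sum", "min", "max", "limit", "distinct",
}


def mask_question(question: str, schema_terms: set[str]) -> str:
    """Replace schema-specific words and numeric literals with placeholders."""
    masked = []
    for tok in re.findall(r"\w+", question.lower()):
        if tok.isdigit():
            masked.append("<value>")
        elif tok in schema_terms:
            masked.append("<mask>")
        else:
            masked.append(tok)
    return " ".join(masked)


def sql_skeleton(sql: str) -> set[str]:
    """Keep only the SQL keywords, dropping table names, columns, and values."""
    return {t for t in re.findall(r"[a-z_]+", sql.lower()) if t in SQL_KEYWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if (a or b) else 1.0


def select_examples(question, schema_terms, draft_sql, pool, k=2, tau=0.5):
    """Rank by masked-question similarity; prefer query similarity >= tau."""
    target_q = set(mask_question(question, schema_terms).split())
    target_s = sql_skeleton(draft_sql)
    scored = []
    for ex in pool:
        ex_q = set(mask_question(ex["question"], ex["schema_terms"]).split())
        q_sim = jaccard(target_q, ex_q)
        s_sim = jaccard(target_s, sql_skeleton(ex["sql"]))
        scored.append((q_sim, s_sim, ex))
    scored.sort(key=lambda item: item[0], reverse=True)
    preferred = [ex for _, s_sim, ex in scored if s_sim >= tau]
    fallback = [ex for _, s_sim, ex in scored if s_sim < tau]
    return (preferred + fallback)[:k]


# Toy candidate pool, invented for this example.
pool = [
    {"question": "List the names of all teachers",
     "schema_terms": {"names", "teachers"},
     "sql": "SELECT name FROM teacher"},
    {"question": "How many students are older than 20?",
     "schema_terms": {"students"},
     "sql": "SELECT count(*) FROM student WHERE age > 20"},
    {"question": "What is the average salary of employees in each department?",
     "schema_terms": {"salary", "employees", "department"},
     "sql": "SELECT avg(salary), dept FROM employee GROUP BY dept"},
]

picked = select_examples(
    "How many singers are older than 30?",
    {"singers"},
    "SELECT count(*) FROM singer WHERE age > 30",  # preliminary prediction
    pool,
)
for ex in picked:
    print(ex["question"], "->", ex["sql"])
```

Masking lets the student question match the singer question even though they come from different domains, which is the point of comparing masked rather than raw questions.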

Example Organisation

Likewise, the two prominent example organisation strategies were compared:

  • Full Information Organisation
  • SQL Only Organisation

As a compromise between the two, a novel organisation approach named DAIL Organisation was also proposed.
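
The three organisation styles can be contrasted with a small sketch. It assumes that the full-information style repeats schema, question, and SQL for every example, that the SQL-only style keeps just the queries, and that the compromise keeps question-SQL pairs while dropping the schema. The function names, the comment wording, and the toy examples are hypothetical.

```python
def full_information(examples: list[dict]) -> str:
    """Each example carries its schema, its question, and its SQL."""
    return "\n\n".join(
        f"{ex['schema']}\n/* Answer the following: {ex['question']} */\n{ex['sql']}"
        for ex in examples
    )


def sql_only(examples: list[dict]) -> str:
    """Only the SQL queries are kept, behind a single header line."""
    header = "/* Some SQL examples are provided below */\n"
    return header + "\n".join(ex["sql"] for ex in examples)


def question_sql_pairs(examples: list[dict]) -> str:
    """Compromise: keep the question-to-SQL mapping, drop the schema."""
    return "\n\n".join(
        f"/* Answer the following: {ex['question']} */\n{ex['sql']}"
        for ex in examples
    )


# Toy examples, invented for this sketch.
examples = [
    {"schema": "CREATE TABLE student (id INTEGER PRIMARY KEY, age INTEGER);",
     "question": "How many students are older than 20?",
     "sql": "SELECT count(*) FROM student WHERE age > 20"},
    {"schema": "CREATE TABLE teacher (id INTEGER PRIMARY KEY, name TEXT);",
     "question": "List the names of all teachers",
     "sql": "SELECT name FROM teacher"},
]

full = full_information(examples)
only = sql_only(examples)
pairs = question_sql_pairs(examples)
for label, text in [("full", full), ("sql only", only), ("pairs", pairs)]:
    print(f"{label}: {len(text)} characters")
```

The character counts give a rough feel for the trade-off: the pair format costs more than bare SQL but far less than repeating every schema, while still showing the model how questions map to queries.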