DAIL-SQL is an effective and token-efficient approach to using LLMs for Text-to-SQL. Evaluated with GPT-4, it achieved a score of 86.2% on the Spider leaderboard.
Notably, it requires only about 1,600 tokens per question on Spider-dev. It consists mainly of a series of prompting techniques that improve the LLM's performance.
DAIL-SQL decouples the prompting from the underlying LLM. Its prompting strategies were tested on a range of LLMs:
Its prompting part consists of the following components:
Optionally, DAIL-SQL also supports fine-tuning the model on the target dataset, where applicable, to further improve results.
The objective of DAIL-SQL can be formalised as follows:
Where:
In DAIL-SQL, question representation strategies were surveyed from the literature. The 5 most prominent were compared:
This can improve the model's SQL generation capabilities.
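To make the idea of a question representation concrete, here is a minimal sketch of one possible layout: the schema as `CREATE TABLE` statements, the question as a SQL comment, and a trailing `SELECT` for the model to continue. The helper name `build_code_representation`, the example schema, and the comment wording are all assumptions for illustration; they are not the prompt text used by DAIL-SQL.

```python
def build_code_representation(create_statements, question):
    """Return a prompt with the schema as DDL and the question as a SQL comment.

    Hypothetical helper: the layout is an illustrative assumption,
    not the exact DAIL-SQL prompt.
    """
    schema = "\n\n".join(stmt.strip() for stmt in create_statements)
    return (
        "/* Given the following database schema: */\n"
        f"{schema}\n\n"
        f"/* Answer the following: {question.strip()} */\n"
        "SELECT"
    )


# Made-up schema, used only to show the resulting prompt.
schema = [
    "CREATE TABLE singer (\n"
    "    singer_id INT PRIMARY KEY,\n"
    "    name TEXT,\n"
    "    age INT\n"
    ");",
]
prompt = build_code_representation(schema, "How many singers are older than 30?")
print(prompt)
```

Ending the prompt with `SELECT` nudges the model to continue with a query rather than with free-form text.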
In DAIL-SQL, example selection strategies were also surveyed from the literature. The 5 most prominent were compared:
Building on these existing approaches, a novel selection method named DAIL Selection was devised.
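The sketch below shows how a selection method of this kind might combine two signals. It assumes the method compares questions after masking domain-specific words, and also compares the keyword skeleton of a preliminary SQL prediction with each candidate's query. Token-level Jaccard similarity stands in for embedding distance here. The function names, the keyword list, the `threshold` parameter, and the example pool are hypothetical and do not come from the DAIL-SQL code base.

```python
import re

# Small illustrative keyword list; a real system would use a SQL parser.
SQL_KEYWORDS = {
    "select", "from", "where", "group", "by", "order", "having", "limit",
    "join", "on", "and", "or", "not", "in", "like", "distinct",
    "count", "sum", "avg", "min", "max", "asc", "desc",
}


def mask_question(question, domain_words):
    """Tokenise and replace domain-specific words with a placeholder."""
    tokens = re.findall(r"\w+", question.lower())
    return {"<mask>" if t in domain_words else t for t in tokens}


def sql_skeleton(sql):
    """Keep only SQL keywords, dropping tables, columns and literals."""
    return {t for t in re.findall(r"\w+", sql.lower()) if t in SQL_KEYWORDS}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def select_examples(question, draft_sql, domain_words, pool, k=2, threshold=0.5):
    """Rank candidates: skeleton match above threshold first, then question similarity."""
    target_q = mask_question(question, domain_words)
    target_s = sql_skeleton(draft_sql)
    scored = []
    for ex in pool:
        q_sim = jaccard(target_q, mask_question(ex["question"], ex["domain_words"]))
        s_sim = jaccard(target_s, sql_skeleton(ex["sql"]))
        scored.append((s_sim >= threshold, q_sim, ex))
    scored.sort(key=lambda item: (item[0], item[1]), reverse=True)
    return [ex for _, _, ex in scored[:k]]


# Made-up candidate pool for illustration.
pool = [
    {"question": "List the names of all teachers.",
     "domain_words": {"names", "teachers"},
     "sql": "SELECT name FROM teacher"},
    {"question": "How many cars were made in 2010?",
     "domain_words": {"cars"},
     "sql": "SELECT count(*) FROM cars WHERE year = 2010"},
    {"question": "What is the average price of products by category?",
     "domain_words": {"price", "products", "category"},
     "sql": "SELECT avg(price), category FROM product GROUP BY category"},
]
best = select_examples(
    "How many singers are older than 30?",
    "SELECT count(*) FROM singer WHERE age > 30",
    {"singers"},
    pool,
    k=1,
)
print(best[0]["question"])
```

Masking keeps a question about cars comparable to one about singers when both ask for a count, which a similarity measure over raw words would miss.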
Similarly, the two prominent example organisation strategies were compared:
Also, as a compromise between the two, a novel organisation approach named DAIL Organisation was proposed.
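One way to picture such a compromise is sketched below. It assumes one extreme includes the full schema with every example and the other includes only the example SQL. The middle ground shown here keeps each example's question and SQL but omits its schema, which saves tokens while preserving the question-to-query mapping. The function name `organise_examples` and the instruction wording are hypothetical, chosen for illustration only.

```python
def organise_examples(examples, target_prompt):
    """Prepend question/SQL pairs (without their schemas) to the target prompt.

    Hypothetical helper: the layout is an illustrative assumption.
    """
    blocks = ["/* Some example questions and corresponding SQL queries: */"]
    for ex in examples:
        blocks.append(f"/* Answer the following: {ex['question']} */\n{ex['sql']}")
    blocks.append(target_prompt)
    return "\n\n".join(blocks)


# Made-up examples and target prompt for illustration.
examples = [
    {"question": "How many cars were made in 2010?",
     "sql": "SELECT count(*) FROM cars WHERE year = 2010"},
    {"question": "List the names of all teachers.",
     "sql": "SELECT name FROM teacher"},
]
target_prompt = (
    "CREATE TABLE singer (singer_id INT PRIMARY KEY, name TEXT, age INT);\n\n"
    "/* Answer the following: How many singers are older than 30? */\n"
    "SELECT"
)
full_prompt = organise_examples(examples, target_prompt)
print(full_prompt)
```

Only the target question carries a schema in the resulting prompt, so its length grows with the number of examples but not with the size of their databases.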