C3

#model #prompting #LLM #schema-linking #LM

Introduction

Zero-shot prompts usually under-perform on Text-to-SQL tasks.

C3 is a set of prompting methods for ChatGPT that improves zero-shot Text-to-SQL performance. It consists of:

  1. Clear Prompting: an effective prompt is constructed from both the schema and the question.
  2. Calibration of Model Bias: a strategy to counteract certain biases present in the model.
  3. Consistency Output: a method to stabilise the output.

Prompt

C3 combines several prompting techniques to improve performance. The considered methods are:

  • Question Representation
  • Schema Linking
  • Calibration of Model Bias
  • Consistency Output

Question Representation

All components of the prompt should be clearly separated; the OpenAI demonstration prompt is one such example.
C3 adopts the OpenAI prompt layout, augmented with schema linking so that only the most relevant tables and columns are included.
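A minimal sketch of this prompt layout, assuming a schema-linking step has already produced the relevant tables and columns (the table, column, and question below are illustrative, not from the paper):

```python
def build_prompt(linked_schema: dict, question: str) -> str:
    """Build a clearly separated, OpenAI-style prompt from the linked
    schema and the question (wording is an assumption)."""
    schema_lines = [
        f"# {table}({', '.join(columns)})"
        for table, columns in linked_schema.items()
    ]
    return (
        "### Complete sqlite SQL query only and with no explanation\n"
        "### SQLite SQL tables, with their properties:\n"
        "#\n" + "\n".join(schema_lines) + "\n#\n"
        f"### {question}\n"
        "SELECT"
    )

# Only the linked (relevant) part of the schema is passed in.
prompt = build_prompt(
    {"singer": ["singer_id", "name", "age"]},
    "How many singers are older than 30?",
)
```

Ending the prompt with `SELECT` nudges the model to continue with a bare SQL query rather than prose.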

Schema Linking

Schema linking is performed in two stages:

Table Recall

A zero-shot prompt instructs the LLM to recall relevant tables in three steps:

  1. The tables should be ranked based on their relevance to the question.
  2. The model should check if all tables have been considered.
  3. The output format is specified as a list.
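The three steps above can be encoded in a single zero-shot prompt; a sketch with assumed wording (the paper's exact instruction may differ):

```python
def table_recall_prompt(tables: list[str], question: str) -> str:
    """Zero-shot table-recall prompt: rank tables, check coverage,
    and fix the output format as a list."""
    return (
        "Given the database tables: " + ", ".join(tables) + ".\n"
        f"Question: {question}\n"
        "1. Rank all the tables based on their relevance to the question.\n"
        "2. Check that all tables have been considered.\n"
        "3. Output the result strictly as a Python list of table names."
    )

p = table_recall_prompt(["singer", "concert"], "How many singers are older than 30?")
```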

Column Recall

A zero-shot prompt then instructs the LLM to recall relevant columns in two steps:

  1. All columns within each candidate table are ranked based on their relevance to the question.
  2. The output format is specified as a dictionary.
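A matching sketch for column recall, again with assumed wording; the output format is a dictionary mapping each candidate table to its ranked columns:

```python
def column_recall_prompt(candidate_tables: dict, question: str) -> str:
    """Zero-shot column-recall prompt over the tables kept by table recall."""
    schema = "; ".join(
        f"{t}({', '.join(cols)})" for t, cols in candidate_tables.items()
    )
    return (
        f"Given the candidate tables and their columns: {schema}.\n"
        f"Question: {question}\n"
        "1. For each table, rank its columns by relevance to the question.\n"
        "2. Output the result strictly as a Python dictionary mapping each "
        "table name to its ranked list of columns."
    )

p = column_recall_prompt({"singer": ["name", "age"]},
                         "How many singers are older than 30?")
```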

Calibration with Hints

Analysis of the errors occurring in the generated SQL queries showed that some of them are caused by certain biases inherent in ChatGPT.

To calibrate these biases, a plug-in calibration strategy was proposed. It incorporates prior knowledge into ChatGPT through contextual prompts that include a historical conversation.

In this historical conversation, ChatGPT is cast as an excellent SQL writer and is guided to follow the debiasing hints.
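A sketch of how such a historical conversation can be injected as chat history before the real question; the message wording is an assumption, not the paper's exact prompt:

```python
def calibration_history() -> list:
    """Fabricated chat history carrying the debiasing hints."""
    return [
        {"role": "user", "content": (
            "You are an excellent SQL writer. Tips: select only the columns "
            "strictly necessary to answer the question; avoid LEFT JOIN, IN "
            "and OR, preferring JOIN and INTERSECT; use DISTINCT or LIMIT "
            "when appropriate."
        )},
        # The assistant's scripted agreement anchors the persona.
        {"role": "assistant",
         "content": "Thanks. I will write SQL following these tips."},
    ]

# The actual question is appended after the scripted history.
messages = calibration_history() + [
    {"role": "user", "content": "How many singers are older than 30?"}
]
```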

Bias 1:

Warning

ChatGPT tends to be conservative in its output and often selects columns that are relevant to the question but not necessarily required.

To mitigate the first bias, a tip was designed to guide ChatGPT towards selecting only the necessary columns.

Bias 2:

Warning

ChatGPT tends to use LEFT JOIN, OR and IN when writing SQL queries, but often fails to use them correctly. This bias often leads to extra values in execution results.

For the second bias, a tip was designed to prevent ChatGPT from misusing SQL keywords; ChatGPT is explicitly asked to:

  • Avoid using LEFT JOIN, IN and OR, and use JOIN and INTERSECT instead.
  • Use DISTINCT or LIMIT when appropriate to avoid repetitive execution results.
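A tiny `sqlite3` demo (with made-up tables and data) of why a misplaced `LEFT JOIN` yields extra rows compared to a plain `JOIN`:

```python
import sqlite3

# In-memory database with one singer who has no concert.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE singer(id INTEGER, name TEXT);
CREATE TABLE concert(singer_id INTEGER, venue TEXT);
INSERT INTO singer VALUES (1, 'Ada'), (2, 'Bob');
INSERT INTO concert VALUES (1, 'Hall A');
""")

# LEFT JOIN keeps Bob even though he has no concert: an extra row.
left = cur.execute(
    "SELECT s.name FROM singer s LEFT JOIN concert c ON s.id = c.singer_id"
).fetchall()

# JOIN keeps only singers that actually appear in concert.
inner = cur.execute(
    "SELECT s.name FROM singer s JOIN concert c ON s.id = c.singer_id"
).fetchall()
```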

Consistency Output

To enhance output consistency, self-consistency is applied. It is motivated by the fact that a complex reasoning problem admits multiple different reasoning paths to its unique correct answer.

It first samples multiple different reasoning paths and then selects the most consistent answer, which markedly improves the quality of the output.
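The consistency step can be sketched as majority voting over candidate queries grouped by their result; here `execute` is a hypothetical stand-in for actually running the SQL against the database:

```python
from collections import Counter

def most_consistent(candidates, execute):
    """Sample results for each candidate SQL query and keep one query
    from the largest group of agreeing results."""
    results = [tuple(execute(sql)) for sql in candidates]  # hashable keys
    winner = Counter(results).most_common(1)[0][0]
    return candidates[results.index(winner)]

# Toy usage: three sampled queries, two of which agree on the result.
fake_results = {"q1": [3], "q2": [3], "q3": [5]}
best = most_consistent(["q1", "q2", "q3"], lambda sql: fake_results[sql])
```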