Zero-shot prompts usually under-perform on Text2SQL tasks.
C3 is a set of prompting methods built on ChatGPT to improve its Text2SQL performance. The methods it considers are:
All components of the prompt should be clearly separated; the OpenAI demonstration prompt is one such example.
C3 adopts the OpenAI prompt together with schema linking, so that only the most relevant tables and columns appear in the prompt.
This is achieved as follows:
It uses a zero-shot prompt that instructs the LLM to recall the relevant tables in three steps.
It also uses a zero-shot prompt that instructs the LLM to recall the relevant columns of those tables in two steps.
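The two recall prompts above can be sketched as plain string templates. The step wording and function names below are illustrative assumptions, not the exact C3 prompt text.

```python
# Hypothetical sketch of the two schema-linking prompts; the step wording
# is an assumption, not the paper's exact phrasing.

def table_recall_prompt(schema: dict, question: str) -> str:
    """Zero-shot prompt asking the model to recall relevant tables in three steps."""
    tables = "\n".join(f"{t}({', '.join(cols)})" for t, cols in schema.items())
    return (
        f"Database tables:\n{tables}\n\n"
        f"Question: {question}\n\n"
        "Step 1: rank all tables by their relevance to the question.\n"
        "Step 2: check that the top-ranked tables are enough to answer it.\n"
        "Step 3: output the selected table names as a JSON list."
    )

def column_recall_prompt(schema: dict, question: str) -> str:
    """Zero-shot prompt asking the model to recall relevant columns in two steps."""
    tables = "\n".join(f"{t}({', '.join(cols)})" for t, cols in schema.items())
    return (
        f"Candidate tables:\n{tables}\n\n"
        f"Question: {question}\n\n"
        "Step 1: rank the columns of each table by relevance to the question.\n"
        "Step 2: output the selected columns as a JSON object keyed by table name."
    )
```

The model's JSON output can then be parsed to build the reduced schema that goes into the final Text2SQL prompt.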
By analyzing the errors in the generated SQL queries, it was found that some of them are caused by certain biases inherent in ChatGPT.
To calibrate these biases, a calibration strategy plugin was proposed; it incorporates prior knowledge into ChatGPT through contextual prompts that include historical conversations.
In the historical conversation, ChatGPT is assumed to be an excellent SQL writer and is guided to follow the debias hints.
ChatGPT tends to be conservative in its output and often selects columns that are relevant to the question but not necessarily required.
To mitigate the first bias, a tip was designed to guide ChatGPT to select only the necessary columns.
ChatGPT tends to use LEFT JOIN, OR, and IN when writing SQL queries, but often fails to use them correctly; this bias frequently introduces extra values into the execution results.
For the second bias, a tip was designed to prevent ChatGPT from misusing these SQL keywords; ChatGPT is explicitly asked to:
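The calibration described above amounts to prepending a fabricated conversation to the request. A minimal sketch follows; the hint wording paraphrases the two tips and, like the function name, is an assumption rather than the paper's exact text.

```python
# Illustrative sketch: the calibration plugin as a fabricated "historical
# conversation" that primes ChatGPT before the actual Text2SQL request.

def calibration_messages(clear_prompt: str) -> list:
    """Wrap a clear Text2SQL prompt in a debiasing conversation."""
    hints = (
        "Tips when writing SQL:\n"
        "1. Select only the columns the question actually requires.\n"
        "2. Avoid LEFT JOIN, OR, and IN unless they are truly needed."
    )
    return [
        {"role": "system", "content": "You are an excellent SQL writer."},
        {"role": "user", "content": hints},
        {"role": "assistant", "content": "Understood, I will follow these tips."},
        {"role": "user", "content": clear_prompt},  # the actual Text2SQL request
    ]
```

The resulting message list can be passed directly to a chat-completion API, so the model sees the hints as prior conversation turns rather than as part of the question.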
To enhance consistency, self-consistency was used. It is motivated by the fact that complex reasoning problems admit multiple different reasoning paths to the same correct answer.
It first samples multiple different reasoning paths and then selects the most consistent answer, which markedly improves output quality.
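For SQL, one natural way to measure consistency is to vote over execution results: sampled queries whose results agree form a group, and a query from the largest group wins. The sketch below assumes an `execute` callback standing in for a real database connection; it is an illustration of the voting idea, not the paper's exact implementation.

```python
import collections

def most_consistent_sql(candidates, execute):
    """Group sampled SQL queries by their execution result and return one
    query from the largest (most consistent) group."""
    groups = collections.defaultdict(list)
    for sql in candidates:
        try:
            fingerprint = tuple(execute(sql))  # hashable view of the result rows
        except Exception:
            continue  # drop queries that fail to execute
        groups[fingerprint].append(sql)
    if not groups:
        raise ValueError("no candidate query executed successfully")
    return max(groups.values(), key=len)[0]
```

Because voting happens on results rather than on SQL strings, syntactically different but semantically equivalent queries end up in the same group.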