WikiSQL is a large crowd-sourced dataset for developing natural language interfaces for relational databases.
It contains 80,654 hand-annotated examples of questions and SQL queries distributed across 24,241 tables from Wikipedia.
To summarise, it is guaranteed that the ground-truth SQL query has the following form:
OPT-AGG (SELECT COL FROM TABLENAME WHERE CONDITIONS)
with:
OPT-AGG is one of MAX, MIN, COUNT, SUM, or nothing.
COL is a column name.
TABLENAME is the table name.
CONDITIONS is a list of conditions in the following BNF form:
CONDITIONS ::= CONDITION | CONDITIONS OP CONDITION
OP ::= OR | AND
CONDITION ::= TOKEN CMP TOKEN
CMP ::= > | < | <> | >= | <= | ==
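The grammar above is left-recursive, but it accepts the same strings as the iterative form CONDITION (OP CONDITION)*, which is easier to check by hand. The following is a minimal sketch of a recognizer written under that reading, not code from the WikiSQL release. The function names `is_token` and `parse_conditions` are hypothetical, and because the grammar does not define TOKEN, the sketch assumes a TOKEN is any whitespace-free word that is not itself an OP or CMP symbol.

```python
# Sketch of a recognizer for the CONDITIONS grammar.
# Assumption: TOKEN is any whitespace-free word that is not an OP or a CMP;
# the grammar in the text leaves TOKEN undefined.

OPS = {"OR", "AND"}
CMPS = {">", "<", "<>", ">=", "<=", "=="}


def is_token(word):
    """Return True if the word can serve as a TOKEN under the assumption above."""
    return word not in OPS and word not in CMPS


def parse_conditions(text):
    """Split text into conditions and the operators joining them.

    Returns (conditions, ops), where conditions is a list of
    (left, cmp, right) triples and ops is the list of OR/AND words
    between consecutive conditions. Raises ValueError if the text
    does not match the grammar.
    """
    words = text.split()
    conditions, ops = [], []
    pos = 0
    while True:
        # CONDITION ::= TOKEN CMP TOKEN
        triple = words[pos:pos + 3]
        if len(triple) != 3:
            raise ValueError(f"incomplete condition at word {pos}")
        left, cmp, right = triple
        if not (is_token(left) and cmp in CMPS and is_token(right)):
            raise ValueError(f"malformed condition: {' '.join(triple)}")
        conditions.append((left, cmp, right))
        pos += 3
        if pos == len(words):
            return conditions, ops
        # OP ::= OR | AND
        if words[pos] not in OPS:
            raise ValueError(f"expected OR or AND, got {words[pos]!r}")
        ops.append(words[pos])
        pos += 1
```

For example, `parse_conditions("age >= 18 AND city == Paris")` yields two condition triples joined by a single `AND`, while a trailing operator such as `"age >= 18 AND"` raises `ValueError`. Splitting on whitespace keeps the sketch short; values containing spaces would need a real tokenizer.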
Three main evaluation metrics were used in Spider: