Spider is a large-scale, complex and cross-domain semantic parsing and Text2SQL dataset annotated by 11 college students. It consists of 10,181 questions and 5,693 unique complex SQL queries on 200 databases with multiple tables, covering 138 different domains.
The task in Spider is to convert a natural-language question into an SQL query.
To create Spider, some assumptions on the dataset were made:
Also, to avoid overfitting to a particular database, the training, dev, and test sets are guaranteed not to share any database.
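The database-level split can be checked mechanically. The following is a minimal sketch, assuming each example is a dict with a `db_id` field naming its database; the function name `shared_databases` and the toy examples are illustrative assumptions, not part of the dataset description above.

```python
def shared_databases(*splits):
    """Return the db_ids that appear in more than one split."""
    seen = set()    # db_ids found in earlier splits
    shared = set()  # db_ids found in at least two splits
    for split in splits:
        ids = {example["db_id"] for example in split}
        shared |= ids & seen
        seen |= ids
    return shared


# Toy examples (hypothetical data, for illustration only).
train = [{"db_id": "concert_singer", "question": "How many singers are there?"}]
dev = [{"db_id": "pets", "question": "How many pets are there?"}]
test = [{"db_id": "flights", "question": "How many flights are there?"}]

# An empty result means no database is shared across the splits.
print(shared_databases(train, dev, test))
```

Comparing database identifiers rather than questions is the point of the check: a model can see many new questions during evaluation, but it should never have seen the schema they are asked against.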
To better understand model performance on different queries, we divide SQL queries into 4 levels: easy, medium, hard, and extra hard. We define the difficulty as follows:
First, we define:
WHERE, GROUP BY, ORDER BY, LIMIT, JOIN, OR, LIKE, HAVING
Then different hardness levels are determined as follows.
Four main evaluation metrics were used in Spider:
To understand a model's performance on different SQL components, we provide detailed scores for each part. Since the models in our paper do not predict value strings, our Partial and Exact Matching evaluation metrics do not take value strings into account.
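To illustrate what ignoring value strings means, the sketch below replaces every literal in a query with the placeholder `novalue`, so that two queries differing only in their values compare as equal. This is a regex-based approximation written for illustration, not the method used by the Spider evaluation; the name `mask_values` is hypothetical, double-quoted text is assumed to be a string value rather than an identifier, and every number is masked, including `LIMIT` counts.

```python
import re

# Matches single- or double-quoted string literals, or standalone numbers.
# Digits inside identifiers such as col1 or T1 are not matched, because
# there is no word boundary between the letters and the digit.
_VALUE = re.compile(r"'[^']*'|\"[^\"]*\"|\b\d+(?:\.\d+)?\b")


def mask_values(sql, placeholder="novalue"):
    """Replace every literal value in the query with a placeholder."""
    return _VALUE.sub(placeholder, sql)


gold = "SELECT name FROM singer WHERE age > 30 AND country = 'France'"
pred = "SELECT name FROM singer WHERE age > 45 AND country = 'Japan'"

# Both queries reduce to the same value-free form.
print(mask_values(gold))
print(mask_values(gold) == mask_values(pred))
```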
For each SQL query, we compute accuracy and F1 scores for all of the following components:
SELECT COLUMN
: e.g. gold: ([select, col1, none], [select, col2, max]) and predicted: ([select, col1, none], [select, col3, min]); compute accuracy, recall, precision and F1 scores.
SELECT COLUMN WITHOUT AGG
: e.g. gold: ([select, col1], [select, col2]) and predicted: ([select, col1], [select, col3]); compute accuracy, recall, precision and F1 scores.
WHERE COLUMN
: ([where, col4, NOT IN, NESTED SQL], [where, col1, >=, novalue], [where, col2, =, novalue])
WHERE COLUMN WITHOUT OP
: ([where, col1], [where, col4])
GROUP BY
: ([groupby, col2], [groupby, col5])
GROUP BY HAVING
: ([groupby, col2, having col1, count, >=])
ORDER BY
: ([orderby, col1, no agg, desc, no limit], [orderby, *, count, asc, 3])
AND/OR
: ([where, col1, col2, and], [where, col3, col2, or])
EXCEPT, UNION, INTERSECT, NESTED SQL
: get the except/union/intersect/nested part in all SQLs containing except/union/intersect/nested, check if predicted except/union/intersect/nested part equals to the gold except/union/intersect/nested part.
SQL KEY WORDS
: for gold and predicted sql, create a set of SQL key words if they are in [where, group by, having, desc, asc, order by, limit, except, union, intersect, not in, in, or, like].
In Spider, SQL matching is done component-wise using the SQL skeletons. If the predicted result gets all SQL parts right, then the score of Exact Matching without Values for this predicted example is 1, otherwise 0.
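The component-wise comparison can be sketched as follows, using the SELECT COLUMN example from the list above. This is a minimal illustration rather than the official evaluation script: it assumes each component is compared as an unordered set of tuples, the names `component_scores` and `exact_match` are hypothetical, and scoring two empty components as a perfect match is an assumption made here.

```python
def component_scores(gold, pred):
    """Precision, recall, F1 and a match flag for a single component."""
    gold, pred = set(gold), set(pred)
    if not gold and not pred:
        # Assumption: a component absent from both queries counts as matched.
        return 1.0, 1.0, 1.0, True
    overlap = len(gold & pred)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1, gold == pred


def exact_match(gold_components, pred_components):
    """Return 1 if every component matches as a set, otherwise 0."""
    names = set(gold_components) | set(pred_components)
    return int(all(
        set(gold_components.get(name, ())) == set(pred_components.get(name, ()))
        for name in names
    ))


# SELECT COLUMN example from the list above.
gold_select = [("select", "col1", "none"), ("select", "col2", "max")]
pred_select = [("select", "col1", "none"), ("select", "col3", "min")]

# One of two predicted items is correct, and one of two gold items is found.
print(component_scores(gold_select, pred_select))
print(exact_match({"select": gold_select}, {"select": pred_select}))
```

Because components are compared as sets, a prediction that lists the same columns in a different order still counts as a match, while a single wrong item in any component drives the exact-match score to 0.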