Benchmarks

WikiSQL

Execution Accuracy

Rank Model Date Score
🏆1 SeaD+Execution-Guided Decoding 2021/05 93.0
🥈2 SDSQL+Execution-Guided Decoding 2021/03 92.7
🥉3 IE-SQL+Execution-Guided Decoding 2020/11 92.5
4 HydraNet+Execution-Guided Decoding 2020/03 92.2
5 BRIDGE+Execution-Guided Decoding 2020/12 91.9
6 X-SQL+Execution-Guided Decoding 2019/08 91.8
7 SDSQL 2021/03 91.4
8 BRIDGE 2020/12 91.1
9 Text2SQLGen + EG 2021/04 91.0
10 SeqGenSQL+EG 2020/11 90.5
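Most of the top WikiSQL entries add execution-guided decoding (EG). The sketch below is a simplified version of the idea. It assumes the model has already produced a beam of complete candidate queries sorted best-first; the original technique can also prune partial programs during decoding. Candidates are executed in order, and the first one that runs without error and returns rows is kept. The helper `execution_guided_pick` and the toy `players` table are hypothetical, and SQLite stands in for the benchmark database.

```python
import sqlite3
from typing import Optional, Sequence


def execution_guided_pick(conn: sqlite3.Connection,
                          candidates: Sequence[str]) -> Optional[str]:
    """Return the first candidate that executes cleanly and yields rows.

    Assumes `candidates` is already sorted best-first by model score
    (hypothetical helper, not any listed entry's actual implementation).
    """
    for sql in candidates:
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error:
            continue  # invalid query (bad column, syntax error): discard it
        if rows:
            return sql  # first runnable, non-empty candidate wins
    return None  # nothing survived; a caller might fall back to candidates[0]


# Toy database and beam, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, team TEXT, points INTEGER)")
conn.executemany("INSERT INTO players VALUES (?, ?, ?)",
                 [("Ann", "Red", 30), ("Bo", "Blue", 12)])
beam = [
    "SELECT nme FROM players WHERE points > 20",    # misspelled column: errors
    "SELECT name FROM players WHERE points > 100",  # runs, but returns no rows
    "SELECT name FROM players WHERE points > 20",   # runs and returns a row
]
print(execution_guided_pick(conn, beam))  # SELECT name FROM players WHERE points > 20
```

Rejecting empty results is a heuristic. It helps on WikiSQL-style lookup questions but can discard a correct query whose true answer is empty.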

Spider

Execution Accuracy

Rank Model Date Score
🏆1 MiniSeek 2023/11 91.2
🥈2 DAIL-SQL + GPT-4 + Self-Consistency 2023/08 86.6
🥉3 DAIL-SQL + GPT-4 2023/08 86.2
4 DPG-SQL + GPT-4 + Self-Correction 2023/10 85.6
5 DIN-SQL + GPT-4 2023/04 85.3
6 Hindsight Chain of Thought with GPT-4 2023/07 83.9
7 C3 + ChatGPT + Zero-Shot 2023/06 82.3
8 Hindsight Chain of Thought with GPT-4 and Instructions 2023/07 80.8
9 RESDSQL-3B + NatSQL 2023/02 79.9
10 SeaD + PQL 2022/11 78.5
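Execution accuracy counts a prediction as correct when running it returns the same result as running the gold query, however the SQL is written. The sketch below is minimal: it assumes a single SQLite database and compares results as unordered multisets of rows. Official evaluators differ in details such as row order, duplicate handling and, in some cases, running each pair against several database instances. The names `execution_match` and `execution_accuracy` and the toy `singer` table are hypothetical.

```python
import sqlite3
from collections import Counter
from typing import Iterable, Tuple


def execution_match(conn: sqlite3.Connection, pred_sql: str, gold_sql: str,
                    order_matters: bool = False) -> bool:
    """True if the predicted query returns the same rows as the gold query."""
    try:
        pred_rows = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False  # a prediction that fails to run counts as wrong
    gold_rows = conn.execute(gold_sql).fetchall()
    if order_matters:  # e.g. when the gold query has an ORDER BY
        return pred_rows == gold_rows
    return Counter(pred_rows) == Counter(gold_rows)  # unordered, keeps duplicates


def execution_accuracy(conn: sqlite3.Connection,
                       pairs: Iterable[Tuple[str, str]]) -> float:
    """Percentage of (predicted, gold) pairs whose results match."""
    results = [execution_match(conn, pred, gold) for pred, gold in pairs]
    return 100.0 * sum(results) / len(results)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?)",
                 [("Ann", 25), ("Bo", 34), ("Cy", 41)])
pairs = [
    ("SELECT name FROM singer WHERE age > 30",
     "SELECT name FROM singer WHERE age >= 31"),  # different SQL, same rows
    ("SELECT count(*) FROM singer", "SELECT count(name) FROM singer"),
    ("SELECT name FROM singer", "SELECT age FROM singer"),  # wrong column
    ("SELECT nam FROM singer", "SELECT name FROM singer"),  # fails to run
]
print(execution_accuracy(conn, pairs))  # 50.0
```

The first two pairs would fail a strict comparison of the SQL text but pass here. This is why a system can score well on execution accuracy while scoring lower on exact match.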

Exact Match Accuracy

Rank Model Date Score
🏆1 MiniSeek 2023/11 81.5
🥈2 Graphix-3B + PICARD 2022/09 74.0
🥉3 CatSQL + GraPPa 2022/09 73.9
4 SHiP + PICARD 2022/09 73.1
5 G³R + LGESQL + ELECTRA 2022/05 72.9
6 RESDSQL+T5-1.1-lm100k-xl 2022/08 72.4
7 T5-SR 2022/05 72.4
8 N-best List Rerankers + PICARD 2022/12 72.2
9 S²SQL + ELECTRA 2021/09 72.1
10 RESDSQL-3B + NatSQL 2023/02 72.0
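Exact-match accuracy, by contrast, never executes anything. The predicted SQL is compared with the gold SQL clause by clause, with the items inside each clause treated as an unordered set and literal values ignored. The sketch below is a deliberately tiny, hypothetical version of that set-based comparison. It handles only flat queries with SELECT, FROM, WHERE (AND-joined conditions), GROUP BY and ORDER BY. Spider's official evaluator instead parses SQL into a full structure that also covers joins, nested queries and keywords. The names `components` and `exact_set_match` are illustrative only.

```python
import re

# Toy clause list; the real Spider evaluator understands far more SQL.
CLAUSE_RE = re.compile(r"\b(select|from|where|group by|order by)\b")
# Quoted strings and bare numbers are masked so values do not affect the match.
LITERAL_RE = re.compile(r"'[^']*'|\"[^\"]*\"|\b\d+(?:\.\d+)?\b")


def components(sql: str) -> dict[str, frozenset[str]]:
    """Split a flat query into {clause keyword: set of normalized items}."""
    text = LITERAL_RE.sub("value", sql.lower())  # mask literal values
    text = " ".join(text.split())                # collapse whitespace
    parts = CLAUSE_RE.split(text)  # ['', 'select', ' a, b ', 'from', ' t ', ...]
    comps: dict[str, frozenset[str]] = {}
    for keyword, body in zip(parts[1::2], parts[2::2]):
        sep = " and " if keyword == "where" else ","
        comps[keyword] = frozenset(" ".join(item.split()) for item in body.split(sep))
    return comps


def exact_set_match(pred_sql: str, gold_sql: str) -> bool:
    """True if every clause has the same set of items in both queries."""
    return components(pred_sql) == components(gold_sql)


gold = "SELECT name, age FROM singer WHERE age > 30"
print(exact_set_match("select age , name from singer where age > 40", gold))  # True
print(exact_set_match("SELECT name, age FROM singer WHERE age >= 31", gold))  # False
```

For integer ages, the second prediction returns exactly the gold rows. It still fails exact match because its WHERE condition is written differently.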

BIRD

Execution Accuracy

Rank Model Date Score
🧑‍💻 Human Performance (Data Engineers + DB Students) N/A 92.96
🏆1 MCS-SQL + GPT-4 2024/01 65.45
🥈2 PB-SQL v1 2024/02 64.84
🥉3 SENSE 13B 2024/02 63.39
4 Chat2Query 2024/03 60.98
5 Dubo-SQL-v1 2023/11 60.71
6 SFT CodeS-15B 2023/10 60.37
7 DTS-SQL + DeepSeek 7B 2024/02 60.31
8 MAC-SQL + GPT-4 2023/11 59.59
9 SFT CodeS-7B 2023/10 59.25
10 DAIL-SQL + GPT-4 2023/11 57.41

Valid Efficiency Score

Rank Model Date Score
🏆1 MCS-SQL + GPT-4 2024/01 71.35
🥈2 PB-SQL 2024/02 68.90
🥉3 MAC-SQL + GPT-4 2023/11 67.68
4 DTS-SQL + DeepSeek 7B 2024/02 64.52
5 SFT CodeS-15B 2023/10 64.22
6 Chat2Query 2024/03 63.89
7 SFT CodeS-7B 2023/10 63.62
8 Dubo-SQL-v1 2023/11 63.00
9 DAIL-SQL + GPT-4 2023/08 61.95
10 GPT-4 2023/07 60.77
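BIRD's Valid Efficiency Score rewards predictions that are both correct and fast. Each example contributes 0 if the prediction's result differs from the gold result. Otherwise it contributes the square root of the gold query's running time divided by the predicted query's running time. The score is the average over all examples, scaled to 100 here. The sketch below is minimal and assumes the correctness flags and timings have already been measured. The `Timing` class and `valid_efficiency_score` name are hypothetical, and real evaluations typically repeat each timing to reduce noise.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Timing:
    correct: bool        # did the prediction return the gold result?
    gold_seconds: float  # measured running time of the gold SQL
    pred_seconds: float  # measured running time of the predicted SQL


def valid_efficiency_score(examples: Sequence[Timing]) -> float:
    """Average of sqrt(gold_time / pred_time) over correct examples, x100.

    Incorrect predictions contribute 0 but still count in the denominator.
    """
    total = 0.0
    for ex in examples:
        if ex.correct:
            total += math.sqrt(ex.gold_seconds / ex.pred_seconds)
    return 100.0 * total / len(examples)


examples = [
    Timing(True, 1.0, 1.0),   # as fast as gold: contributes 1.0
    Timing(True, 2.0, 0.5),   # 4x faster: contributes sqrt(4) = 2.0
    Timing(False, 1.0, 0.1),  # wrong answer: contributes 0, however fast
    Timing(True, 1.0, 4.0),   # 4x slower: contributes sqrt(1/4) = 0.5
]
print(valid_efficiency_score(examples))  # 87.5
```

Execution accuracy on the same four examples would be 75.0. A system whose correct queries run faster than the gold queries can therefore post a VES above its execution accuracy.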