BIRD

#dataset #benchmark #EX #VES
Reference

Introduction

BIRD¹ (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) is a cross-domain dataset that examines the impact of extensive database contents on text-to-SQL parsing. It contains 12,751 unique question-SQL pairs over 95 large databases with a total size of 33.4 GB, and covers more than 37 professional domains, such as blockchain, hockey, healthcare, and education.

Unlike previous text-to-SQL datasets, BIRD more closely resembles real-world databases because its tables can contain a large number of rows.

Assumptions

Unlike in Spider, the Question Clarity assumption is violated: some questions need external knowledge to be answered correctly. This knowledge falls into the following categories:

  • Mathematical and Algorithmic Reasoning: Certain questions require reasoning to arrive at the query.
    • The algorithmic reasoning may use one of the following SQL functions:
    OVER, JULIANDATE, CAST, ROUND, SUBSTR
    • The mathematical reasoning itself is simple and requires only algebraic operators.
  • Domain Knowledge: Some domain-specific knowledge may be required, such as financial indicators in Business Intelligence.
  • Synonym Knowledge: Questions may use different words that are synonymous; these should be treated as equivalent.
  • Value Representation: The mapping between a word and its representation in the database may not be trivial. For example, the word "center" in a question can be mapped to the value C in a SQL query.
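
The categories above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module on a toy `players` table; the table, its columns, its rows, and the two helper functions are hypothetical and invented for this note, not taken from BIRD. The first query needs value representation ("center" is stored as C) plus simple mathematical reasoning (a percentage via CAST and ROUND); the second needs algorithmic reasoning (SUBSTR to pull the year out of a text date).

```python
import sqlite3

# Hypothetical toy table; the schema and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE players (name TEXT, position TEXT, birthdate TEXT);
    INSERT INTO players VALUES
        ('Ann', 'C', '1990-05-17'),
        ('Bob', 'C', '1988-11-02'),
        ('Cid', 'D', '1992-01-23');
    """
)


def center_share(conn):
    """Question: 'What percentage of players play center?'"""
    # Value representation: the word "center" maps to the stored value 'C'.
    # Mathematical reasoning: CAST avoids integer division, ROUND trims
    # the result to two decimals.
    row = conn.execute(
        "SELECT ROUND(CAST(SUM(position = 'C') AS REAL) * 100 / COUNT(*), 2) "
        "FROM players"
    ).fetchone()
    return row[0]


def center_birth_years(conn):
    """Question: 'In which year was each center born?'"""
    # Algorithmic reasoning: SUBSTR extracts the year from a text date.
    return conn.execute(
        "SELECT name, SUBSTR(birthdate, 1, 4) FROM players "
        "WHERE position = 'C' ORDER BY name"
    ).fetchall()


print(center_share(conn))
print(center_birth_years(conn))
```

Neither question mentions the value C or the date layout, so a model that sees only the question and the schema would have to infer both from the database contents or from supplied external knowledge.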

Evaluation

Four main evaluation metrics were used in Spider: