Graphix-T5

#model #GNN #T5

Introduction

Graphix-T5[1] is a state-of-the-art GNN[2] model for Text2SQL. It is currently the open-source model with the best exact-match accuracy on Spider.

It modifies the T5[3] encoder by adding GNN layers to it.

(Figure: Graphix-T5 overview)

Graph Construction

Incoming Technicalities

The reader may need a basic knowledge of Graph Theory concepts to understand the construction process.

Graphix-T5 encodes all question tokens and all tables and columns in a single graph. In that graph, edges denote relations between the different words, columns, and tables.

Fundamentally, the graph can be divided into 3 main components:

  1. Schema Graph: the subgraph encoding the relations between columns and tables.
  2. Question Graph: the subgraph encoding the relations between the different words in the question.
  3. Schema Linking: the set of edges that joins both graphs by relating question nodes and schema nodes.

Schema Graph

Given a database schema $S$, a schema graph $G_s = (V_s, E_s)$ is constructed as follows:

  1. $V_s = T \cup C$, with $T$ the tables and $C$ the columns of $S$.
  2. $E_s$ is a set of relations (edges) generated as follows:
    1. $(c, t) \in E_s$ if $c$ is a column of $t$;
    2. $(c, t) \in E_s$ if $c$ is the primary key of $t$;
    3. $(c_1, c_2) \in E_s$ if $c_1$ is a foreign key for $c_2$.
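
A small Python sketch can make the construction concrete. The toy schema, the relation names (HAS-COLUMN, PRIMARY-KEY, FOREIGN-KEY), and the data layout below are illustrative assumptions, not the paper's implementation:

```python
# Toy schema: singer(singer_id, name, country) and concert(concert_id, singer_id).
schema = {
    "singer": {
        "columns": ["singer_id", "name", "country"],
        "primary_key": "singer_id",
        "foreign_keys": {},
    },
    "concert": {
        "columns": ["concert_id", "singer_id"],
        "primary_key": "concert_id",
        "foreign_keys": {"singer_id": ("singer", "singer_id")},
    },
}

def build_schema_graph(schema):
    nodes, edges = set(), []  # edges are (source, relation, target) triples
    for table, info in schema.items():
        nodes.add(table)
        for col in info["columns"]:
            col_node = f"{table}.{col}"
            nodes.add(col_node)
            edges.append((col_node, "HAS-COLUMN", table))                        # rule 2.1
        edges.append((f"{table}.{info['primary_key']}", "PRIMARY-KEY", table))   # rule 2.2
        for col, (ref_table, ref_col) in info["foreign_keys"].items():
            edges.append((f"{table}.{col}", "FOREIGN-KEY", f"{ref_table}.{ref_col}"))  # rule 2.3
    return nodes, edges

nodes, edges = build_schema_graph(schema)
print(len(nodes), "nodes,", len(edges), "edges")  # 7 nodes, 8 edges
```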

Question Graph

Given a question $Q$, its graph $G_q = (V_q, E_q)$ is constructed as follows:

  1. $V_q$ are the tokens of $Q$.
  2. $E_q$ is a set of relations (edges) generated by two implicit dependency relations between tokens in the question.
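
The note above does not name the two relation types, so as one plausible illustration the sketch below derives token-level edges from an off-the-shelf dependency parser (spaCy); both the parser choice and the edge labels are assumptions, not necessarily the paper's procedure:

```python
import spacy

# Requires a downloaded spaCy model; any dependency parser would do for this sketch.
nlp = spacy.load("en_core_web_sm")

def build_question_graph(question):
    doc = nlp(question)
    nodes = [tok.text for tok in doc]  # one node per question token
    edges = []                         # (head index, dependent index, label)
    for tok in doc:
        if tok.head.i != tok.i:        # spaCy marks the root as its own head
            edges.append((tok.head.i, tok.i, tok.dep_))
    return nodes, edges

nodes, edges = build_question_graph("How many singers are from France?")
print(nodes)
print(edges)
```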

Schema Linking

The input graph $G = (V, E)$ is constructed as follows:

  1. $V = V_q \cup V_s$ and $E = E_q \cup E_s \cup E_l$, with $E_l$ the schema-linking relations.
  2. The relations in $E_l$ are generated by the implicit relations between question nodes and schema nodes.
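
A toy illustration of the linking step, using simple name overlap between question tokens and schema items; the relation names EXACT-MATCH and PARTIAL-MATCH are illustrative stand-ins for the implicit relations mentioned above:

```python
def link_question_to_schema(question_tokens, schema_nodes):
    """Relate question tokens to schema items by name overlap (toy heuristic)."""
    links = []  # (question token index, schema node, relation)
    for qi, tok in enumerate(question_tokens):
        tok_l = tok.lower()
        for node in schema_nodes:
            name = node.split(".")[-1].lower()  # "singer.name" -> "name"
            if tok_l == name:
                links.append((qi, node, "EXACT-MATCH"))
            elif tok_l in name or name in tok_l:
                links.append((qi, node, "PARTIAL-MATCH"))
    return links

question_tokens = ["How", "many", "singers", "are", "from", "France", "?"]
schema_nodes = ["singer", "singer.name", "singer.country", "concert", "concert.singer_id"]
print(link_question_to_schema(question_tokens, schema_nodes))
# [(2, 'singer', 'PARTIAL-MATCH')]
```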

Graphix Layer

Each Graphix layer incorporates two sub-layers: a semantic layer and a structural layer.

Incoming Technicalities

Graphix-T5 is essentially a modification of the T5 architecture.

For that reason, a decent understanding of Transformers in general and of T5 in particular is recommended for the subsequent sections.

Semantic Layer

The semantic representations of the hidden states are first encoded by a Transformer block, which contains two important components: a Multi-Head Self-Attention network (MHA) and a Fully-Connected Feed-Forward Network (FFN).

MHA

Attention Layer

The attention layer maps a query matrix $Q$, a key matrix $K$, and a value matrix $V$ to an output via the relation:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

MHA calculates the attention output of each head and concatenates them as follows:

$$\mathrm{head}_i = \mathrm{Attention}\big(QW_i^{Q},\, KW_i^{K},\, VW_i^{V}\big), \qquad \mathrm{MHA}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^{O}$$

FFN

The FFN layer is then applied as follows:

$$\mathrm{FFN}(x) = \max(0,\, xW_1 + b_1)\,W_2 + b_2$$

Normalisation

Finally, the semantic representations are obtained with residual connections and an additional row-wise normalisation:

$$\tilde{\mathbf{H}} = \mathrm{LayerNorm}\big(\mathbf{H} + \mathrm{MHA}(\mathbf{H})\big), \qquad \mathbf{H}_{\mathrm{sem}} = \mathrm{LayerNorm}\big(\tilde{\mathbf{H}} + \mathrm{FFN}(\tilde{\mathbf{H}})\big)$$
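
Putting the MHA, FFN, and normalisation steps together, here is a minimal NumPy sketch of one semantic sub-layer. Single-head attention stands in for MHA to keep it short, and the post-norm arrangement with a plain LayerNorm mirrors the equations above rather than any official implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-6):
    # Row-wise normalisation: each token vector is normalised independently
    # (learnable gain and bias omitted for brevity).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def self_attention(H, W_q, W_k, W_v):
    # Single-head stand-in for MHA, kept small for the sketch.
    d_k = W_k.shape[1]
    scores = (H @ W_q) @ (H @ W_k).T / np.sqrt(d_k)
    return softmax(scores) @ (H @ W_v)

def ffn(H, W1, b1, W2, b2):
    return np.maximum(0, H @ W1 + b1) @ W2 + b2

def semantic_layer(H, p):
    # Residual connection + row-wise LayerNorm after the attention and FFN steps.
    H1 = layer_norm(H + self_attention(H, p["W_q"], p["W_k"], p["W_v"]))
    return layer_norm(H1 + ffn(H1, p["W1"], p["b1"], p["W2"], p["b2"]))

rng = np.random.default_rng(0)
n, d, d_ff = 5, 8, 16
params = {
    "W_q": rng.standard_normal((d, d)), "W_k": rng.standard_normal((d, d)),
    "W_v": rng.standard_normal((d, d)),
    "W1": rng.standard_normal((d, d_ff)), "b1": np.zeros(d_ff),
    "W2": rng.standard_normal((d_ff, d)), "b2": np.zeros(d),
}
H = rng.standard_normal((n, d))          # 5 token states, hidden size 8
print(semantic_layer(H, params).shape)   # (5, 8)
```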

Structural Layer

In each Graphix layer, structural representations are produced through a relational graph attention network (RGAT). In its single-head form it can be formalised as follows:

$$\alpha_{ij} = \operatorname*{softmax}_{j \in \mathcal{N}_i}\!\left(\frac{\mathbf{h}_i W_Q \big(\mathbf{h}_j W_K + \mathbf{r}_{ij}^{K}\big)^{\top}}{\sqrt{d}}\right), \qquad \mathbf{h}_i^{\mathrm{struct}} = \sum_{j \in \mathcal{N}_i} \alpha_{ij}\,\big(\mathbf{h}_j W_V + \mathbf{r}_{ij}^{V}\big)$$

where $\mathcal{N}_i$ is the neighbourhood of node $i$ in the input graph and $\mathbf{r}_{ij}^{K}, \mathbf{r}_{ij}^{V}$ are learned embeddings of the relation between nodes $i$ and $j$.
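
A single-head NumPy sketch of the relation-aware attention above; the toy graph, the weight shapes, and the sharing of one relation embedding for both keys and values are illustrative assumptions, not the paper's exact RGAT configuration:

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def rgat_layer(H, edges, rel_emb, W_q, W_k, W_v):
    """One pass of relation-aware graph attention (single head).
    H: (n, d) node states; edges: {(i, j): relation_id} with j in the neighbourhood of i;
    rel_emb: (num_relations, d_head) relation embeddings, reused here for keys and values."""
    n, _ = H.shape
    d_head = W_q.shape[1]
    out = np.zeros((n, d_head))
    for i in range(n):
        neigh = [j for (src, j) in edges if src == i]
        if not neigh:
            continue
        q = H[i] @ W_q
        keys = np.stack([H[j] @ W_k + rel_emb[edges[(i, j)]] for j in neigh])
        vals = np.stack([H[j] @ W_v + rel_emb[edges[(i, j)]] for j in neigh])
        alpha = softmax(keys @ q / np.sqrt(d_head))  # attention over the neighbourhood
        out[i] = alpha @ vals                        # weighted sum of relation-aware values
    return out

rng = np.random.default_rng(0)
n, d, d_head, num_rel = 4, 8, 8, 3
H = rng.standard_normal((n, d))
edges = {(0, 1): 0, (1, 0): 0, (1, 2): 1, (2, 3): 2}  # toy graph with typed edges
W_q, W_k, W_v = (rng.standard_normal((d, d_head)) for _ in range(3))
rel_emb = rng.standard_normal((num_rel, d_head))
print(rgat_layer(H, edges, rel_emb, W_q, W_k, W_v).shape)  # (4, 8)
```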

Notes & References


  1. Jinyang Li, Binyuan Hui, Reynold Cheng, Bowen Qin, Chenhao Ma, Nan Huo, Fei Huang, Wenyu Du, Luo Si, & Yongbin Li. (2023). Graphix-T5: Mixing Pre-Trained Transformers with Graph-Aware Layers for Text-to-SQL Parsing.
  2. Graph Neural Network.
  3. Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, & Peter J. Liu. (2023). Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.