Exact Matching

Quick-start

The exact matching score is defined as the fraction of predictions that have an exact matching with the respective ground truth query.

Exact Matching was introduced as a refinement of String Matching on which commutativity of operators is used to reduce false negatives^[1].

For each query, we will build its syntax tree. The syntax tree should be consistent with the construct of the SQL language.

A node is marked as green if its children can be rearranged without changing the overall query. Otherwise it is marked as red.

Green Nodes include:

With the Green-Red decomposition, We will rearrange the syntax tree as follows:

def rearrange(tree):
	for child in tree.children: 
		rearrange(child)
	if colour(tree) == "green":
		tree.children.sort(key=str)

Notations

The exact matching accuracy between and is defined as:

Exact matching does not suffer from false positives. That is, semantically different queries can only have
SQL queries can be parsed in linear time^[1-1]. rearrange can be calculated in linearithmic time^[2]. And so we can calculate the exact matching between two queries in linearithmic time. Which is considered efficient.

Its implementation can get very tricky.
Exact Matching is susceptible to false negatives. That is, two semantically equivalent queries can have By consequence underestimates the semantic matching

As a function of the size of the query. In other words the number of its characters.↩︎↩︎
where is the size of the query.↩︎