Exact Matching

#evaluation #EM

Quick-start

The exact matching score is defined as the fraction of predictions that have an exact matching with the respective ground truth query.

Exact Matching was introduced as a refinement of String Matching on which commutativity of operators is used to reduce false negatives[1].

Exact Matching

For each query, we will build its syntax tree. The syntax tree should be consistent with the construct of the SQL language.

A node is marked as green if its children can be rearranged without changing the overall query. Otherwise it is marked as red.

Green Nodes include:
  • UNION, INTERSECT keywords
  • JOIN, NATURAL JOIN, OUTER JOIN keywords
  • SELECT, HAVING keywords
  • Multiplication operator *
  • Addition operator +
  • AND
  • OR

With the Green-Red decomposition, We will rearrange the syntax tree as follows:

def rearrange(tree):
	for child in tree.children: 
		rearrange(child)
	if colour(tree) == "green":
		tree.children.sort(key=str)

Example 1

Formula

The exact matching accuracy between and is defined as:

Advantages

  • Exact matching does not suffer from false positives. That is, semantically different queries can only have
  • SQL queries can be parsed in linear time[1-1]. rearrange can be calculated in linearithmic time[2]. And so we can calculate the exact matching between two queries in linearithmic time. Which is considered efficient.

Short-comings

  • Its implementation can get very tricky.
  • Exact Matching is susceptible to false negatives. That is, two semantically equivalent queries can have By consequence underestimates the semantic matching

Example 2

Notes & References


  1. As a function of the size of the query. In other words the number of its characters.↩︎↩︎
  2. where is the size of the query.↩︎