Query Matching

This metric is only a theoretical metric.

Quick-Start

Two queries are said to be semantically equivalent if they give the same results on all possible databases. In other words, they are indistinguishable.

calculates the fraction of predicted SQL queries that are semantically equivalent to their respective ground truth queries.

Formula

Notations:

Equivalence

and are said to be semantically equivalent if:

We will denote it by:

The Query matching accuracy of between and is defined as follows:

Short-Comings

Undecidability

In theory, is the metric that describes best whether a query prediction is correct or not. But generally, it cannot be computed:

If the SQL language supports arithmetic operations, then its undecidability is a result of Hilbert's 10th problem.
Otherwise, if the SQL language supports negations, then it is still undecidable^[1].
Even we remove both, the equivalence problem is NP-Complete^[1-1].

With that, this metric is not practical, and we propose the other metrics as an approximation of

If SQL queries are of type 3, it can be argued that the size of the query is generally small. If that is the case, and implementation of query similarity is found in COSETTE^[2].

Notes & References

Abiteboul, Serge, R. Hull, and V. Vianu. Foundations of databases. Vol. 8. Reading: Addison-Wesley, 1995.↩︎↩︎
S. Chu, B. Murphy, J. Roesch, A. Cheung, D. Suciu. Axiomatic Foundations and Algorithms for Deciding Semantic Equivalences of SQL Queries↩︎