Evaluation methods: An Overview

Metrics

Metric	Smoothing	False Positives	False Negatives	Time Performance	Database Dependency
String Matching	Edit distance	No	Yes	Very Efficient	No
Exact Matching	Mean sub-component matches	No	Yes	Efficient	No
Execution Matching	Jaccard similarity	Yes	No	Depends on the database and system	Yes
Query Matching	No	No	No	Uncomputable	No
Test Suites	Mean Jaccard similarity	No	Yes	Depends on the databases and system	Yes
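
To make the Execution Matching row concrete, here is a minimal sketch of a Jaccard-smoothed execution score. It assumes result rows are compared as sets (ignoring order and duplicates) and that a prediction which fails to execute scores zero; neither choice is specified above. The `singer` table and the helper names `jaccard` and `execution_score` are hypothetical, chosen only for illustration.

```python
import sqlite3


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def execution_score(conn, gold_sql, pred_sql):
    """Run both queries and compare their result rows as sets (an assumption)."""
    gold_rows = conn.execute(gold_sql).fetchall()
    try:
        pred_rows = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        # Assumption: a prediction that cannot be executed scores zero.
        return 0.0
    return jaccard(gold_rows, pred_rows)


# Hypothetical toy database, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO singer VALUES (?, ?)",
    [("Ana", 25), ("Bo", 31), ("Cy", 40)],
)

gold = "SELECT name FROM singer WHERE age > 30"
# Different surface form, same result: String Matching would reject it.
rephrased = "SELECT s.name FROM singer AS s WHERE s.age >= 31"
# Returns one extra row, so it receives partial credit.
too_broad = "SELECT name FROM singer WHERE age > 20"

score_rephrased = execution_score(conn, gold, rephrased)
score_too_broad = execution_score(conn, gold, too_broad)
```

On this toy database the rephrased query receives full credit, while the overly broad one shares two of three distinct rows with the gold result and is scored accordingly. The running time is dominated by query execution, which is why the table lists time performance as dependent on the database and system.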

Relation between different metrics

Query Matching is in theory the best metric, but it is uncomputable in practice. String Matching, on the other hand, is too restrictive.

This leaves Exact Matching and Execution Matching. We are slightly inclined towards Execution Matching, for the following three reasons:

It does not penalise correct predictions (no false negatives).
For large enough databases, we estimate that false positives are unlikely to happen.
It is a metric reflecting real databases, whereas Exact Matching seems to be more theoretical.
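
The second reason can be illustrated with a small sketch: two semantically different queries may return the same rows on a tiny database (a false positive) and be told apart once the database contains more varied rows. The `singer` table, its contents, and the helpers `make_db` and `same_result` are hypothetical, and the comparison treats result rows as sets, which is an assumption rather than a definition taken from the text above.

```python
import sqlite3


def make_db(rows):
    """Build a throwaway in-memory database with a hypothetical `singer` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO singer VALUES (?, ?)", rows)
    return conn


def same_result(conn, sql_a, sql_b):
    """Unsmoothed Execution Matching: compare result rows as sets (an assumption)."""
    rows_a = set(conn.execute(sql_a).fetchall())
    rows_b = set(conn.execute(sql_b).fetchall())
    return rows_a == rows_b


gold = "SELECT name FROM singer WHERE age > 30"
wrong = "SELECT name FROM singer WHERE age > 35"  # a different threshold

# No singer is aged 31 to 35 here, so the two queries cannot be distinguished.
small = make_db([("Ana", 25), ("Cy", 40)])
# More rows make it likelier that some row falls between the two thresholds.
large = make_db([("Ana", 25), ("Bo", 31), ("Cy", 40), ("Di", 33)])

false_positive_on_small = same_result(small, gold, wrong)
distinguished_on_large = not same_result(large, gold, wrong)
```

In this sketch the wrong query is accepted on the small database and rejected on the larger one. This shows only the mechanism behind the estimate; it does not measure how large a database must be.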