Evaluation methods: An Overview

#evaluation #EM #SM #EX #TS

Metrics

| Metric | Smoothing | False Positives | False Negatives | Time Performance | Database Dependency |
|---|---|---|---|---|---|
| String Matching | Edit distance | No | Yes | Very efficient | No |
| Exact Matching | Mean sub-component matches | No | Yes | Efficient | No |
| Execution Matching | Jaccard similarity | Yes | No | Depends on the database and system | Yes |
| Query Matching | None | No | No | Uncomputable | No |
| Test Suites | Mean Jaccard similarity | No | Yes | Depends on the databases and system | Yes |
Relation between different metrics

Conclusion

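Query Matching is in theory the best metric, but it is uncomputable in practice: deciding whether two SQL queries are equivalent is undecidable in general. String Matching, on the other hand, is too restrictive, since it rejects any prediction whose text differs from the gold query, even when the two are equivalent.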

This leaves Exact Matching and Execution Matching, for which we conclude that:

  • Exact Matching is best when the database is small or nonexistent.
  • Execution Matching is best when the database has enough rows.
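The trade-off above comes from false positives in Execution Matching: a wrong query can return the gold result by coincidence when the database has too few rows to tell the two apart. The sketch below uses Python's built-in `sqlite3` with a hypothetical `employee` table; comparing sorted result rows is one assumed way of ignoring row order in the binary check.

```python
import sqlite3


def make_db(rows):
    """Build an in-memory database with a hypothetical `employee` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO employee VALUES (?, ?)", rows)
    return conn


def execution_match(conn, pred, gold):
    """Binary Execution Matching: results compared as sorted lists (order ignored)."""
    return sorted(conn.execute(pred).fetchall()) == sorted(conn.execute(gold).fetchall())


gold = "SELECT name FROM employee WHERE age > 30"
pred = "SELECT name FROM employee WHERE age >= 30"  # wrong: also returns age 30

small = make_db([("Ann", 25), ("Bob", 41)])
large = make_db([("Ann", 25), ("Bob", 41), ("Cid", 30), ("Dee", 33)])

print(execution_match(small, pred, gold))  # True  -> false positive
print(execution_match(large, pred, gold))  # False -> the mistake is exposed
```

More rows are not a guarantee in themselves: the extra rows must happen to separate the two queries (here, an employee aged exactly 30), which simply becomes more likely as the database grows.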

We are slightly inclined towards Execution Matching for the following three reasons:

  • It does not penalise correct predictions (no false negatives).
  • For large enough databases, we estimate that false positives are unlikely.
  • Execution Matching reflects how queries behave on real databases, whereas Exact Matching seems more theoretical.
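The first reason, and the restrictiveness of String Matching, can be seen by comparing a gold query with a semantically equivalent prediction written differently. The `employee` table and the case/whitespace normalisation are assumptions for this sketch; actual String Matching implementations may normalise differently.

```python
import sqlite3

# Hypothetical toy database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [("Ann", 25), ("Bob", 41), ("Cid", 30), ("Dee", 33)])

gold = "SELECT name FROM employee WHERE age > 30"
# Equivalent prediction, written with a table alias and a negated condition
pred = "SELECT e.name FROM employee AS e WHERE NOT e.age <= 30"


def normalise(sql):
    """Light normalisation for String Matching: lower-case and collapse whitespace."""
    return " ".join(sql.lower().split())


string_ok = normalise(pred) == normalise(gold)
exec_ok = sorted(conn.execute(pred).fetchall()) == sorted(conn.execute(gold).fetchall())

print(string_ok)  # False -> String Matching penalises a correct prediction
print(exec_ok)    # True  -> Execution Matching accepts it
```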