Access Type

Open Access Thesis

Date of Award

January 2015

Degree Type


Degree Name



Computer Science

First Advisor

Marwan Abi-Antoun


Evaluating programming-language-based techniques is crucial to judge their usefulness in practice, but it requires a careful selection of systems on which to evaluate the technique. Since it is particularly hard to evaluate a heavyweight technique, such as one that requires adding annotations to the code or rewriting the system in a radically different language, it is common to use a lightweight proxy to predict the technique's usefulness for a system. But the reliability of such a proxy is unclear.

We propose a principled, data-driven approach to derive a lightweight proxy for a heavyweight technique that requires adding annotations to the code. The approach involves the following steps: computing metrics (DiffMetrics) that measure differences between a system representation (e.g., the code structure) and the system representation extracted by the heavyweight technique (e.g., an abstraction of the runtime structure); identifying the outliers of the DiffMetrics; identifying code patterns and classifying the outliers based on the identified code patterns; implementing visitors that look for the code patterns on systems with no annotations; and identifying code metrics that correlate strongly with the DiffMetrics. For a new system with no annotations, a proxy predicts whether the heavyweight technique may be useful based on the results from the visitors and the code metrics.
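The data-driven steps described above (finding DiffMetrics outliers, checking which code metrics correlate with the DiffMetrics, and combining visitor results with a code metric into a prediction) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the function names, the use of Tukey's 1.5 x IQR rule to flag outliers, the use of Pearson correlation, the thresholds, and the rule that both signals must exceed their thresholds.

```python
from math import sqrt
from statistics import mean, quantiles


def iqr_outliers(values):
    """Return indices of values outside the 1.5 * IQR fences.

    Assumption: Tukey's rule stands in for whatever outlier
    criterion is applied to the DiffMetrics.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation between a code metric and a DiffMetric."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def proxy_predicts_useful(pattern_hits, code_metric,
                          hit_threshold, metric_threshold):
    """Hypothetical proxy rule for a system with no annotations.

    pattern_hits: number of code-pattern matches reported by the visitors.
    code_metric: value of a code metric that correlates with a DiffMetric.
    Assumption: both signals must reach their thresholds.
    """
    return pattern_hits >= hit_threshold and code_metric >= metric_threshold


if __name__ == "__main__":
    # Made-up illustrative values, one per class of an annotated system.
    diff_metric = [1, 2, 2, 3, 3, 4, 40]
    code_metric = [2, 3, 3, 5, 4, 6, 55]
    print("outlier indices:", iqr_outliers(diff_metric))
    print("correlation:", pearson(code_metric, diff_metric))
    print("useful?", proxy_predicts_useful(3, 0.8, 1, 0.5))
```

In this sketch the thresholds would be calibrated on the annotated systems, where the DiffMetrics are available, and then reused unchanged on systems with no annotations.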

To evaluate the approach, we run the visitors and compute code metrics on four systems that were not previously analyzed. The proxy predicts that the heavyweight technique may be useful for two of the systems. Thus, the abstract runtime structure may be significantly different from the code structure for those systems. To validate the proxy's predictions, we run the heavyweight technique on these two systems.

Such a principled approach is reusable and can be applied to any programming-language-based technique to identify systems for evaluation and to better understand the types of systems for which a technique is most useful.