Access Type

Open Access Dissertation

Date of Award

January 2012

Degree Type


Degree Name



Computer Science

First Advisor

Farshad Fotouhi

Second Advisor

Shiyong Lu


Provenance, the metadata that records the derivation history of scientific results, is important in scientific workflows to interpret, validate, and analyze the result of scientific computing. Recently,

to promote and facilitate interoperability among heterogeneous provenance systems, the Open Provenance Model (OPM) has been proposed and has played an important role in the community.

In this dissertation, to efficiently query and manage OPM-compliant provenance, we first propose a provenance collection framework that collects both prospective provenance, which captures

an abstract workflow specification as a recipe for future data derivation and retrospective provenance, which captures past workflow execution and data derivation information. We then

propose a relational database-based provenance system, called OPMPROV that stores, reasons, and queries prospective and retrospective provenance, which is OPM-compliant provenance. We finally propose OPQL, an OPM-level provenance query language, that is directly defined over the OPM model. An OPQL query takes an

OPM graph as input and produces an OPM graph as output; therefore, OPQL queries are not tightly coupled to the underlying provenance storage strategies. Our provenance store, provenance collection framework, and provenance query language feature the native support of the OPM model.