Provenance – Introduction

Digital content changes rapidly, which makes it difficult to trace an item back to its origin. For example, a blog post originally contributed by known users may later be updated by an unknown user. Provenance is a concept that addresses this problem. The Oxford English Dictionary defines Provenance as:

  • the fact of coming from some particular source or quarter; origin, derivation.
  • the history or pedigree of a work of art, manuscript, rare book, etc.; concr., a record of the ultimate derivation and passage of an item through its various owners.

Another definition describes Provenance as a record of the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing. Whichever definition is used, the main concern of Provenance is the identification and derivation of data: explaining how a specific result was derived and which processes contributed to its existence at a given stage. This helps users identify the processes involved and arrive at an explanation for a particular result. In addition, Provenance can increase productivity, since it helps investigators identify a particular derivation, which saves time, computational effort, and storage.

Provenance can be divided into two types: Where provenance and Why provenance. Where provenance deals with the origin of the data and its associated components, while Why provenance explains the causes that led to the data. Other authors extend this view with five more elements, giving seven interconnected elements in total: “what”, “when”, “where”, “how”, “who”, “which” and “why”. Each plays a particular role. WHAT denotes the sequence of events that affect the data object; WHEN, the set of event times; WHERE, the set of all locations; HOW, the set of all actions leading up to the events; WHO, the set of all agents involved in the events; WHICH, the set of all devices; WHY, the set of reasons for the events.
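As a rough illustration, the seven elements can be held together in a single record per event. The sketch below is only one possible layout: the class name `W7Record`, its field types, and the sample values are assumptions made for this example and do not come from any particular Provenance system.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class W7Record:
    """One provenance event described by the seven elements (hypothetical layout)."""
    what: str       # the event that affected the data object
    when: datetime  # the time of the event
    where: str      # the location of the event
    how: str        # the action that led up to the event
    who: str        # the agent involved in the event
    which: str      # the device or tool used
    why: str        # the reason for the event


# Sample values are invented for illustration only.
event = W7Record(
    what="blog post updated",
    when=datetime(2015, 3, 1, 9, 30, tzinfo=timezone.utc),
    where="blog server",
    how="edit form submitted",
    who="unknown user",
    which="web browser",
    why="correct a typo",
)

# The field order mirrors the seven elements listed in the text.
element_names = list(asdict(event))
```

Keeping the record immutable (`frozen=True`) is a design choice for this sketch: a provenance event describes something that has already happened, so it should not be edited afterwards.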

One point should be kept in mind: Provenance should be domain-independent, because it is meant to be a standard way to record and represent what happened to data, in order to help verify the data, infer its quality, analyse the processes involved, and determine whether it can be trusted. The main reason is that Provenance is supposed to be integrated into any application.

Provenance can be expressed by describing how the three main components of PROV-DM (entities, activities, and agents) have influenced the data. Because these components determine how the data came to exist, users can draw on them to judge whether the data can be trusted. In other words, Provenance provides information that can be used to form an assessment about the quality, reliability, or trustworthiness of a resource.
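The following is a minimal sketch of how entities, activities, and agents might be linked and then used for a simple trust check. The relation names (`wasGeneratedBy`, `used`, `wasAssociatedWith`) are taken from PROV-DM; the triple layout, the identifiers, the helper `agents_behind`, and the rule "trusted only if every contributing agent is known" are assumptions made for this example.

```python
# Each relation is (kind, subject, object). Identifiers are invented.
relations = [
    ("wasGeneratedBy", "post:v2", "edit:42"),       # entity  <- activity
    ("used", "edit:42", "post:v1"),                 # activity -> earlier entity
    ("wasAssociatedWith", "edit:42", "user:unknown"),
    ("wasGeneratedBy", "post:v1", "edit:1"),
    ("wasAssociatedWith", "edit:1", "user:alice"),
]


def agents_behind(entity, relations):
    """Collect every agent associated with an activity in the entity's history."""
    agents = set()
    seen = set()
    pending = [entity]
    while pending:
        current = pending.pop()
        if current in seen:
            continue
        seen.add(current)
        for kind, subject, obj in relations:
            if subject != current:
                continue
            if kind == "wasAssociatedWith":
                agents.add(obj)
            elif kind in ("wasGeneratedBy", "used"):
                # Follow the derivation chain further back.
                pending.append(obj)
    return agents


known_agents = {"user:alice"}  # hypothetical list of trusted contributors

v1_agents = agents_behind("post:v1", relations)
v2_agents = agents_behind("post:v2", relations)

# Assumed trust rule: every contributing agent must be known.
v1_trusted = v1_agents <= known_agents
v2_trusted = v2_agents <= known_agents
```

Here the first version of the post is attributed only to a known user, while the second version also involves an unknown user, so under this assumed rule only the first version would be considered trusted.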

In summary, every Provenance-aware system should be able to record provenance for the dataset transformations it executes and to expose this provenance data in a consistent and logical format via a query interface. These principles can be realized by implementing a Provenance lifecycle consisting of four phases: (i) creating, (ii) recording, (iii) querying and (iv) managing. A Provenance-aware system should support all of these phases.
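To make the recording and querying requirements more concrete, here is a small in-memory sketch. The class `ProvenanceStore`, its method names, and the file names are hypothetical; a real system would persist its records and would typically offer a richer query interface.

```python
class ProvenanceStore:
    """Toy store that records dataset transformations and answers lineage queries."""

    def __init__(self):
        # Maps an output dataset to the activity and inputs that produced it.
        self._records = {}

    def record(self, output, activity, inputs):
        """Recording phase: note that `activity` derived `output` from `inputs`."""
        self._records[output] = {"activity": activity, "inputs": list(inputs)}

    def query(self, dataset):
        """Querying phase: return the derivation steps for `dataset`, oldest first."""
        steps = []

        def visit(name):
            rec = self._records.get(name)
            if rec is None:
                return  # a source dataset with no recorded derivation
            for parent in rec["inputs"]:
                visit(parent)
            step = (rec["activity"], tuple(rec["inputs"]), name)
            if step not in steps:
                steps.append(step)

        visit(dataset)
        return steps

    def forget(self, dataset):
        """Managing phase (simplified): drop the record for `dataset`."""
        self._records.pop(dataset, None)


store = ProvenanceStore()
store.record("clean.csv", "remove_duplicates", ["raw.csv"])
store.record("report.csv", "aggregate", ["clean.csv"])

lineage = store.query("report.csv")
```

In this sketch the creating phase corresponds to the caller assembling the arguments passed to `record`. The lineage returned for `report.csv` lists the `remove_duplicates` step before the `aggregate` step, which matches the order in which the transformations were executed.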

 McGuinness, D. L., Zeng, H., Da Silva, P. P., Ding, L., Narayanan, D., & Bhaowal, M. (2006). Investigations into Trust for Collaborative Information Repositories: A Wikipedia Case Study. MTW, 190.
 Szomszor, M., & Moreau, L. (2003). Recording and reasoning over data provenance in web and grid services. In On the move to meaningful Internet systems 2003: CoopIS, DOA, and ODBASE (pp. 603-620). Springer Berlin Heidelberg.
 Moreau, L., Groth, P., Miles, S., Vazquez-Salceda, J., Ibbotson, J., Jiang, S., ... & Varga, L. (2008). The provenance of electronic data. Communications of the ACM, 51(4), 52-58.
 Golbeck, J., & Hendler, J. (2008). A semantic web approach to the provenance challenge. Concurrency and Computation: Practice and Experience, 20(5), 431-439.
 Miles, S. (2006). Electronically querying for the provenance of entities. In Provenance and Annotation of Data (pp. 184-192). Springer Berlin Heidelberg.
 Ram, S., & Liu, J. (2007). Understanding the semantics of data provenance to support active conceptual modeling. In Active conceptual modeling of learning (pp. 17-29). Springer Berlin Heidelberg.
 Huynh, T. D., Ebden, M., Venanzi, M., Ramchurn, S. D., Roberts, S., & Moreau, L. (2013, March). Interpretation of crowdsourced activities using provenance network analysis. In First AAAI Conference on Human Computation and Crowdsourcing.
 Groth, P., Jiang, S., Miles, S., Munroe, S., Tan, V., Tsasakou, S., & Moreau, L. (2006). An architecture for provenance systems.
