Provenance Documents

Provenance should be openly available and accessible in order to support transparency. It therefore needs to be standardized so that provenance data can be exchanged and interlinked among applications. To that end, the W3C defines PROV, a standard for exchanging provenance. PROV provides a machine-processable data model for provenance. With a standardized vocabulary for describing the flow of data, processes, and responsibility, an application can be enriched with data lineage, that is, information about the origin of its data. The abstract model and the serializations of PROV are depicted below.

PROV Framework

Among those 11 documents, the most important ones for getting started with provenance are PROV-DM, PROV-CONSTRAINTS, PROV-N, PROV-O, and PROV-AQ. One extra document worth mentioning is PROV-TEMP, which deals with templates that generate a provenance document from a set of bindings given as input. These documents are discussed below.

PROV-DM (PROV Data Model) is a generic data model for provenance that allows domain-specific and application-specific representations of provenance to be translated into it and interchanged between systems [1]. PROV-DM can be serialized using PROV-N and PROV-O, and it has six components:
• entities and activities, and the time at which they were created, used, or ended (core),
• derivations of entities from entities (core),
• agents bearing responsibility for entities that were generated and activities that happened (core),
• a notion of bundle, a mechanism to support provenance of provenance (extended),
• properties to link entities that refer to the same thing (extended),
• collections forming a logical structure for its members (extended).
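The three core components can be illustrated with a small in-memory model. The following is only a minimal sketch in plain Python, not an official PROV library: the class names, the `ex:` identifiers, and the cheese-making scenario are assumptions made for illustration, and requiring both ends of a relation to be declared first is a choice of this sketch rather than a PROV-DM rule.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Entity:
    id: str


@dataclass(frozen=True)
class Activity:
    id: str
    started: Optional[datetime] = None
    ended: Optional[datetime] = None


@dataclass(frozen=True)
class Agent:
    id: str


@dataclass
class ProvDocument:
    elements: dict = field(default_factory=dict)   # id -> Entity / Activity / Agent
    relations: list = field(default_factory=list)  # (relation, subject id, object id)

    def add(self, element):
        self.elements[element.id] = element
        return element

    def relate(self, relation, subject, obj):
        # Sketch-level check: both ends must already be declared in this document.
        for end in (subject, obj):
            if end.id not in self.elements:
                raise KeyError(f"undeclared element: {end.id}")
        self.relations.append((relation, subject.id, obj.id))


# Hypothetical food-traceability example.
doc = ProvDocument()
milk = doc.add(Entity("ex:milk"))
cheese = doc.add(Entity("ex:cheese"))
making = doc.add(Activity("ex:cheesemaking",
                          datetime(2020, 1, 6, 8, 0),
                          datetime(2020, 1, 6, 16, 0)))
dairy = doc.add(Agent("ex:dairy"))

doc.relate("used", making, milk)                # activity used an entity
doc.relate("wasGeneratedBy", cheese, making)    # entity was generated by an activity
doc.relate("wasDerivedFrom", cheese, milk)      # derivation between entities
doc.relate("wasAssociatedWith", making, dairy)  # agent responsible for the activity
doc.relate("wasAttributedTo", cheese, dairy)    # agent responsible for the entity
```

The relation names follow the PROV-DM vocabulary; the extended components (bundles, alternates, collections) are left out to keep the sketch short.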

PROV as a whole can also be seen as a framework for provenance. According to the W3C, PROV is a specification for expressing provenance records, which contain descriptions of the entities and activities involved in producing and delivering or otherwise influencing a given object. The figure below depicts the core concepts of PROV.


PROV-CONS (PROV Constraints, published as PROV-CONSTRAINTS) is meant for validating a provenance document, so that it follows a logical order when one reasons about or analyses it [2]. As mentioned above, PROV-DM is a standard interchange model for provenance. On its own, however, it cannot enforce rules. For instance, one can state that an activity uses an entity that has not yet been created, or that an entity is used after it has been terminated (invalidated). Rules are therefore needed to express this kind of logic in provenance, and this is the main motivation for the Constraints of the PROV Data Model, widely known as PROV-Constraints. To this end, PROV-CONS defines four kinds of constraints: uniqueness constraints, event ordering constraints, impossibility constraints, and type constraints. Further information about PROV-CONS can be seen in
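The two examples of invalid provenance given above (usage before creation, usage after termination) are event ordering problems. The following is a minimal sketch of such a check, assuming each event carries an explicit timestamp; the function name, the event tuples, and the sample data are hypothetical, and a real validator would cover the full rule set of the specification.

```python
from datetime import datetime


def check_event_ordering(events):
    """Return a list of ordering violations.

    `events` is a list of (kind, entity_id, time) tuples, where kind is
    "generation", "usage", or "invalidation".
    """
    generated, invalidated = {}, {}
    for kind, entity, time in events:
        if kind == "generation":
            generated[entity] = time
        elif kind == "invalidation":
            invalidated[entity] = time

    violations = []
    for kind, entity, time in events:
        if kind != "usage":
            continue
        gen = generated.get(entity)
        inv = invalidated.get(entity)
        # Only known times are compared; a missing generation is not an error here.
        if gen is not None and time < gen:
            violations.append(f"{entity} used before generation")
        if inv is not None and time > inv:
            violations.append(f"{entity} used after invalidation")
    return violations


# Hypothetical sample data with two deliberate mistakes.
events = [
    ("generation", "ex:milk", datetime(2020, 1, 5, 6, 0)),
    ("usage", "ex:milk", datetime(2020, 1, 6, 8, 0)),
    ("invalidation", "ex:milk", datetime(2020, 1, 6, 9, 0)),
    ("usage", "ex:milk", datetime(2020, 1, 7, 8, 0)),      # after invalidation
    ("generation", "ex:cheese", datetime(2020, 1, 6, 16, 0)),
    ("usage", "ex:cheese", datetime(2020, 1, 6, 12, 0)),   # before generation
]
violations = check_event_ordering(events)
```

Comparing timestamps is a simplification chosen for this sketch; it is enough to catch both mistakes in the sample data.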

PROV-N (PROV Notation) is introduced to write down instances of PROV-DM. PROV-N is a syntax designed to express PROV-DM explicitly for human consumption, and it serves as the basis for the formal semantics of PROV [3]. Its three design principles are technology independence (the syntax can be mapped to several technologies), human readability (it is easy for humans to read and interpret), and formality (it is defined by a formal grammar).
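To give a feel for the notation, here is a small sketch that prints a handful of statements in PROV-N style. The helper `to_provn`, the `ex` prefix, and the cheese-making statements are assumptions for illustration; the helper only formats strings and does not validate them against the PROV-N grammar.

```python
def to_provn(prefixes, statements):
    """Render (name, args) statements as a PROV-N document string.

    A `None` argument is written as the PROV-N marker "-" (no value given).
    """
    lines = ["document"]
    for prefix, iri in prefixes.items():
        lines.append(f"  prefix {prefix} <{iri}>")
    for name, args in statements:
        rendered = ", ".join("-" if arg is None else arg for arg in args)
        lines.append(f"  {name}({rendered})")
    lines.append("endDocument")
    return "\n".join(lines)


# Hypothetical example statements.
statements = [
    ("entity", ["ex:milk"]),
    ("entity", ["ex:cheese"]),
    ("activity", ["ex:cheesemaking", "2020-01-06T08:00:00", "2020-01-06T16:00:00"]),
    ("agent", ["ex:dairy"]),
    ("used", ["ex:cheesemaking", "ex:milk", None]),
    ("wasGeneratedBy", ["ex:cheese", "ex:cheesemaking", None]),
    ("wasDerivedFrom", ["ex:cheese", "ex:milk"]),
    ("wasAssociatedWith", ["ex:cheesemaking", "ex:dairy", None]),
    ("wasAttributedTo", ["ex:cheese", "ex:dairy"]),
]
provn = to_provn({"ex": "http://example.org/"}, statements)
print(provn)
```

Each statement reads almost like a sentence, for example `wasGeneratedBy(ex:cheese, ex:cheesemaking, -)`, which is the human readability the notation aims for.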

PROV-O (PROV Ontology) is a lightweight ontology that provides a set of classes, properties, and restrictions, written in the OWL 2 Web Ontology Language (OWL 2), to be used when generating provenance. It allows an application to generate, share, and integrate provenance across different contexts.
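As an illustration of how PROV-O classes and properties are used, the sketch below writes a few triples in Turtle. It only builds a string with plain Python; the `to_turtle` helper and the `ex:` resources are hypothetical, while the `prov:` terms (`prov:Entity`, `prov:Activity`, `prov:Agent`, `prov:used`, `prov:wasGeneratedBy`, and so on) come from the PROV-O vocabulary.

```python
def to_turtle(prefixes, triples):
    """Render (subject, predicate, object) triples as a simple Turtle string."""
    lines = [f"@prefix {prefix}: <{iri}> ." for prefix, iri in prefixes.items()]
    lines.append("")
    for subject, predicate, obj in triples:
        lines.append(f"{subject} {predicate} {obj} .")
    return "\n".join(lines)


prefixes = {
    "prov": "http://www.w3.org/ns/prov#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "ex": "http://example.org/",  # hypothetical namespace
}

# Hypothetical food-traceability triples using PROV-O terms.
triples = [
    ("ex:milk", "a", "prov:Entity"),
    ("ex:cheese", "a", "prov:Entity"),
    ("ex:cheesemaking", "a", "prov:Activity"),
    ("ex:dairy", "a", "prov:Agent"),
    ("ex:cheesemaking", "prov:used", "ex:milk"),
    ("ex:cheesemaking", "prov:startedAtTime", '"2020-01-06T08:00:00"^^xsd:dateTime'),
    ("ex:cheese", "prov:wasGeneratedBy", "ex:cheesemaking"),
    ("ex:cheese", "prov:wasDerivedFrom", "ex:milk"),
    ("ex:cheesemaking", "prov:wasAssociatedWith", "ex:dairy"),
    ("ex:cheese", "prov:wasAttributedTo", "ex:dairy"),
]
turtle = to_turtle(prefixes, triples)
print(turtle)
```

In practice an RDF library would normally produce this output; string formatting is used here only to keep the sketch free of dependencies.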

Once provenance has been created, it should be stored somewhere to support interchange. For consumers to discover it, provenance must be locatable, retrievable, and queryable. This is the main purpose of PROV-AQ (Provenance Access and Query), which allows consumers to access provenance records. A provenance query service is used in three general steps to access the provenance of a resource. First, the consumer retrieves the service description and locates the information about the query mechanism within it. Second, it extracts the information needed to use the mechanism it found. Finally, based on that information, it queries the required provenance using the selected query mechanism.
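The three steps can be sketched without any network access. Everything in the example below is an assumption made for illustration: the service description is a hand-written dictionary rather than a document fetched from a real service, and the mechanism names, URLs, and the `{uri}` placeholder are hypothetical.

```python
from urllib.parse import quote

# Hypothetical, already-parsed service description. A real one would be
# retrieved over HTTP from the provenance query service.
SERVICE_DESCRIPTION = {
    "mechanisms": [
        {"type": "direct-http",
         "uri_template": "http://example.org/provenance?target={uri}"},
        {"type": "sparql", "endpoint": "http://example.org/sparql"},
    ]
}


def locate_mechanism(description, preferred):
    """Step 1: find the entry for the preferred query mechanism."""
    for mechanism in description["mechanisms"]:
        if mechanism["type"] == preferred:
            return mechanism
    raise LookupError(f"no {preferred!r} mechanism in service description")


def build_query_url(mechanism, target_uri):
    """Steps 2 and 3: extract the template and fill in the target resource."""
    template = mechanism["uri_template"]
    return template.replace("{uri}", quote(target_uri, safe=""))


mechanism = locate_mechanism(SERVICE_DESCRIPTION, "direct-http")
query_url = build_query_url(mechanism, "http://example.org/food/cheese-42")
print(query_url)
```

The resulting URL is what a consumer would then request to obtain the provenance of the target resource; the actual HTTP request is left out so that the sketch stays runnable offline.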

In many cases, a provenance file can be used as a starting point for creating other provenance. Such a document is known as a provenance template, or PROV-TEMP (PROV Template). When creating a provenance template, the following rules need to be observed:
• Provenance template must contain a single bundle,
• Provenance template may contain variables in the form of qualified names var:x and vargen:x in any position where such qualified names are allowed in PROV,
• Provenance template may contain attributes in the prov-template namespace (prefix tmpl).
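How bindings turn a template into a concrete provenance document can be sketched as follows. This is a simplified illustration, not the actual expansion algorithm: it assumes one value per `var:` variable, it gives every `vargen:` variable a freshly generated identifier, and the `expand` function, the `ex:gen` naming scheme, and the sample template are all hypothetical.

```python
from itertools import count


def expand(template, bindings):
    """Replace var:/vargen: names in (name, args) statements with concrete ids."""
    fresh = count(1)
    generated = {}

    def resolve(term):
        if term.startswith("var:"):
            return bindings[term]  # KeyError if the variable is unbound
        if term.startswith("vargen:"):
            # The same vargen: variable always maps to the same generated id.
            if term not in generated:
                generated[term] = f"ex:gen{next(fresh)}"
            return generated[term]
        return term

    return [(name, [resolve(arg) for arg in args]) for name, args in template]


# Hypothetical template: some producer makes some product in some process.
template = [
    ("entity", ["var:product"]),
    ("activity", ["vargen:process"]),
    ("agent", ["var:producer"]),
    ("wasGeneratedBy", ["var:product", "vargen:process"]),
    ("wasAssociatedWith", ["vargen:process", "var:producer"]),
]
bindings = {"var:product": "ex:cheese", "var:producer": "ex:dairy"}
expanded = expand(template, bindings)
```

The same template can be reused with a different set of bindings, for example another product and producer, to produce a second provenance document with the same shape.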

Knowing the origin of information and how it was derived lets us assess its quality. This step moves a system or application toward accountability. The food industry can benefit as well, since provenance can be used to capture the traceability of food products. Finally, another important aspect of provenance is how it is integrated with various information sources. In terms of food traceability, good provenance should capture information in a way that supports interoperability and communication among the many parties involved in the food supply chain.

[1] L. Moreau and P. Missier, “PROV-DM: The PROV Data Model,” World Wide Web Consortium, W3C Recommendation REC-prov-dm-20130430, Apr. 2013.
[2] T. D. Nies, “Constraints of the PROV Data Model,” World Wide Web Consortium, W3C Recommendation REC-prov-constraints-20130430, Apr. 2013.
[3] L. Moreau and P. Missier, “PROV-N: The Provenance Notation,” World Wide Web Consortium, W3C Recommendation REC-prov-n-20130430, Apr. 2013.
