Risk-based provenance graph

Provenance records can describe the lineage of an object such as a laptop, a newspaper, shoes, or a bag. Each of these objects can be seen as a business product that has passed through a set of processes to reach its final stage. Because provenance records capture this lineage, they allow us to model the product’s derivation from one stage to the next and, ultimately, to produce a product supply chain that describes it.

As a product moves from one stage to the next, each process introduces some risk. Experts model these risks to calculate or measure their impact, and each model takes its associated risk factors as the parameters of its calculation. Capturing these models and their risk factors along the product supply chain therefore makes risk more visible from the beginning to the end of the product’s lifetime.

The integration of provenance and risk can use PROV to produce the provenance records, which can then be visualised as a Provenance Graph (PG) that represents the product supply chain. This provenance-based supply chain holds the information needed to calculate the risk, and hence allows risk to be propagated over the PG. PROV has three core concepts (prov:Entity, prov:Activity, and prov:Agent), which map to a product, its set of processes, and the operators, respectively; the relations between them describe the derivation of the product through its processes.

With an additional domain-specific ontology, the details of the related risk models and risk factors can be annotated on the prov:Activity when constructing a provenance-based supply chain. Later, during the Monte Carlo (MC) simulation, the input from the used prov:Entity (edge prov:used) is processed with the risk factors and risk model in the prov:Activity to generate the output in the generated prov:Entity (edge prov:wasGeneratedBy). This approach allows us to break the processes in a product supply chain into individual, independent modules and to map each module to its associated risk within the PG, as shown in Figure 1.

Figure 1: The general principle for overlaying risk in a provenance-based supply chain.

Figure 1 depicts how a process of a product (blue rectangle p0) uses input products (yellow ovals {in0,…,inx}) to produce, or transform them into, outputs (yellow ovals {out0,…,outx}). In detail, p0, as an instance of prov:Activity, is linked to the input prov:Entity instances {in0,…,inx} through prov:used (the use edges) and to the output prov:Entity instances {out0,…,outx} through prov:wasGeneratedBy (the gen edges). Given these dependencies, the approach can describe which outputs use which inputs through prov:wasDerivedFrom links (the der edges).
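
The structure in Figure 1 can be sketched in a few lines of Python. This is only a minimal illustration using plain standard-library containers, not the PROV data model itself or any PROV library; the class name ProvGraph and its methods are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ProvGraph:
    """Tiny in-memory stand-in for a provenance graph (hypothetical, not a PROV library)."""
    entities: set = field(default_factory=set)
    activities: set = field(default_factory=set)
    used: list = field(default_factory=list)              # (activity, input entity)
    was_generated_by: list = field(default_factory=list)  # (output entity, activity)
    was_derived_from: list = field(default_factory=list)  # (output entity, input entity)

    def add_process(self, activity, inputs, outputs):
        """Register an activity with its use edges and gen edges."""
        self.activities.add(activity)
        self.entities.update(inputs)
        self.entities.update(outputs)
        for entity in inputs:
            self.used.append((activity, entity))
        for entity in outputs:
            self.was_generated_by.append((entity, activity))

    def derive(self, output, source):
        """Record a der edge, but only if one activity used `source` and generated `output`."""
        linked = any(
            (a, source) in self.used and (output, a) in self.was_generated_by
            for a in self.activities
        )
        if not linked:
            raise ValueError(f"no activity uses {source} and generates {output}")
        self.was_derived_from.append((output, source))

    def sources_of(self, output):
        """Return the inputs that a given output was derived from."""
        return sorted(s for (o, s) in self.was_derived_from if o == output)


pg = ProvGraph()
pg.add_process("p0", inputs=["in0", "in1"], outputs=["out0", "out1"])
pg.derive("out0", "in0")
pg.derive("out0", "in1")
pg.derive("out1", "in1")
print(pg.sources_of("out0"))  # ['in0', 'in1']
```

Checking each der edge against the existing use and gen edges is a design choice of this sketch: it keeps derivations consistent with the activity that connects the two entities.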

Each prov:Activity (e.g., p0) contains associated risk factors that act as the parameters of a function in a risk model to calculate the output values. These risk factors are captured as the attributes rfx_p0, where x is an index to accommodate multiple risk factors. The values of the risk factors, vrfx_p0, are often in the form of a probability distribution. We intentionally capture the risk factors by annotating them in a PG for two reasons: first, to be able to explain and reason about the risk phenomena before and after a process; second, to construct a risk model that calculates the output values so as to mimic how the actual process produces its output.

Besides the risk factors, the risk models (i.e., rmx_p0) also need to be captured in order to calculate the numeric output values (i.e., num_out0,…,num_outx). A risk model is defined as a mathematical formula that mimics the actual process (i.e., mod_0). The formula takes into account all the risk factors and all numeric input values (i.e., num_in0,…,num_inx), and it is used to generate the distribution of numeric output values through MC simulation. While some processes are easy to model, others are complicated and may require multiple risk models if they are to be modelled fully. In that case, a prov:Activity will have multiple rmx_p0, where x is an index for each risk model. Note that the numeric output values of one process may become the input values for the next process.

Like rfx_p0, the numeric input values num_in0 to num_inx are often captured as probability distributions in order to represent the range of possible values that in0 to inx could take. These distributions are passed to the Monte Carlo simulation to produce the output values num_out0 to num_outx in out0 to outx.
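
As a rough illustration of this step, the sketch below samples one numeric input and two risk factors from probability distributions and pushes them through a risk model many times. Everything specific here is an assumption made for the example: the formula in rm0_p0, the choice of distributions and their parameters, and the helper name simulate_activity do not come from the approach described above.

```python
import random
import statistics


def rm0_p0(num_in, vrf):
    """Hypothetical risk model: scale the input by one factor, subtract a loss."""
    return num_in["in0"] * vrf["rf0_p0"] - vrf["rf1_p0"]


def simulate_activity(inputs, risk_factors, risk_model, n_runs, seed=0):
    """Run a simple Monte Carlo simulation for a single activity.

    `inputs` and `risk_factors` map a name to a function that draws one
    sample from that quantity's distribution using the given generator.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    samples = []
    for _ in range(n_runs):
        num_in = {name: draw(rng) for name, draw in inputs.items()}
        vrf = {name: draw(rng) for name, draw in risk_factors.items()}
        samples.append(risk_model(num_in, vrf))
    return samples


# Assumed distributions, for illustration only.
inputs = {"in0": lambda rng: rng.uniform(90.0, 110.0)}
risk_factors = {
    "rf0_p0": lambda rng: rng.gauss(1.0, 0.05),
    "rf1_p0": lambda rng: rng.triangular(0.0, 10.0, 2.0),  # low, high, mode
}

num_out0 = simulate_activity(inputs, risk_factors, rm0_p0, n_runs=10_000)
print(statistics.mean(num_out0), statistics.stdev(num_out0))
```

The list num_out0 is an empirical distribution of the output value. In a chain of processes, it could be resampled as the input distribution of the next activity.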

In the simulation, f_freq([in0,…,inx][out0,…,outx][rfx_p0]), f_jpd(freq_p0), and f_cpd(jpd_p0) are called to construct the Frequency Table, the Joint Probability Table (JPT), and the Conditional Probability Table (CPT), each scoped to the inputs and outputs of the prov:Activity. The function f_freq([in0,…,inx][out0,…,outx][rfx_p0]) quantifies how the states change between the inputs ([in0,…,inx]) and the outputs ([out0,…,outx]) after the simulation. Its result, freq_p0, is then passed to f_jpd(freq_p0) and subsequently to f_cpd(jpd_p0), which construct the joint and conditional distributions from the quantified state changes and present them as the JPT and CPT.
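
The chain from frequency table to JPT to CPT can be sketched as follows. The function names follow the text, but their signatures and bodies are assumptions for illustration: this version handles one discretised input and one discretised output, and leaves out the risk-factor argument. The state labels and the sample data are made up.

```python
from collections import Counter, defaultdict


def f_freq(in_states, out_states):
    """Frequency table: count each (input state, output state) pair."""
    return Counter(zip(in_states, out_states))


def f_jpd(freq):
    """Joint probability table: normalise the counts so they sum to 1."""
    total = sum(freq.values())
    return {pair: count / total for pair, count in freq.items()}


def f_cpd(jpd):
    """Conditional probability table: P(out | in) = P(in, out) / P(in)."""
    marginal_in = defaultdict(float)
    for (in_state, _), p in jpd.items():
        marginal_in[in_state] += p
    return {
        (in_state, out_state): p / marginal_in[in_state]
        for (in_state, out_state), p in jpd.items()
    }


# Made-up discretised results of eight simulation runs.
in_states = ["low", "low", "high", "high", "high", "low", "high", "low"]
out_states = ["safe", "safe", "unsafe", "safe", "unsafe", "unsafe", "unsafe", "safe"]

freq_p0 = f_freq(in_states, out_states)
jpd_p0 = f_jpd(freq_p0)
cpd_p0 = f_cpd(jpd_p0)

print(freq_p0[("low", "safe")])  # 3
print(jpd_p0[("low", "safe")])   # 0.375
print(cpd_p0[("low", "safe")])   # 0.75
```

Continuous simulation outputs would first need to be binned into states such as these; how the bins are chosen is outside the scope of this sketch.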

Figure 2: Example of the expected provenance graph with its populated values based on Figure 1.

Eventually, this integration generates a PG with the notion of risk annotated within it, as depicted in Figure 2, which shows an example of the expected PG. This allows risk to be propagated within the PG of a product. It also helps us understand which processes the product has gone through, and thereby to assess the product’s quality and safety. The PG serves as a basic graph for the MC simulation before it is converted into another graphical representation for risk propagation by means of the Belief Propagation technique.
