Bayesian Network model

The principal aim of this work is to compare two algorithms (K2 and BNPC) for learning Bayesian Network (BN) applied to the World Health Survey (WHS) data. The World Health Organization (WHO) conducted the WHS between 2002 and 2004 in 70 countries, and it consists of a real world dataset with a great deal of interdependence between the components. Moreover, differences between learning algorithms concerning the capability of the networks in predicting the most probable values were investigated.

The BN is a graphical probabilistic model based on the probabilistic theories, able to explain the dependency between variables and to predict the behaviour of some variables of interest. It consists of a qualitative part (structure) specifying the conditional in/dependencies between the variables and a quantitative part (tables) specifying the conditional probabilities of the data set variables.

Formally, BN is a network of nodes connected by directed links with a probability function assigned to each node. The structure of a BN is a Directed Acyclic Graph (DAG), that is, there is no directed path starting and ending at the same node.

Obtaining a BN from data is a complex learning process that is divided into two phases: find the structure of the network and estimate the conditional probability distributions on the parameters of BN, given the structure. In this study, two learning algorithms were applied: K2 and BNPC algorithm.

K2 algorithm, representative of the Search & Score (S&S) methodology, is a greedy heuristic algorithm which takes a data set and a node order as input and constructs the BN structure. It applies a Bayesian scoring function (in this case the K2) with the aim to find the most probable BN structure, given a data set, which maximizes the posterior probability distribution. Elvira tool, a software available free of charge at http://leo.ugr.es/elvira/, was used for the K2 algorithm implementation.

BNPC algorithm, representative if the Conditional Independence (CI) methodology, uses independence tests and mutual information. A structure edge is learned if meeting a constraint derived from comparing the value of a statistical based test of conditional independence to a threshold. The homonymous BNPC software was applied for the implementation of this algorithm, available free of charge at www.cs.ualberta.ca/~jcheng/bnsoft.htm, and developed by Jie Cheng (Department of Computing Science, University of Alberta), David Bell and Weiru Liu (Faculty of Informatics, University of Ulster).

In order to compare the obtained BNs four performance measures were calculated:

- The logarithmic value of the Bayesian Dirichlet equivalent with uniform (BDeu) metric, which measures the marginal likelihood using a uniform joint distribution.
- The logarithmic version of the Bayesian Information Criterion (BIC) metric, a penalized version of the likelihood introducing a penalty term for the number of parameters in the model.
- The logarithmic version of the K2 metric. It is a well-known evaluation measure (or scoring function) for learning Bayesian networks from data. Like BDeu metric, K2 metric uses a distribution that is locally, but not globally, uniform.
- The Kullback-Leibler (KL) divergence between two probability distributions P (the "true" distribution of data) and Q (the distribution associated to the learned network).

For all these four metrics, the higher the values, the better the BN.

Moreover, differences between learning algorithms concerning the capability of the networks in predicting the most probable values (percentage of classification success) were investigated.

The overall dataset (containing 221,670 respondents) was divided into four economic areas based on the World Bank categories: high income, upper middle income, lower middle income and low income.

Prior to analyzing data, each of the four datasets was divided into two subsets: the training set (75% of the data) used for the learning of the structure of the network and the test set (remaining 25%) to make inference (propagation of evidence) with the obtained network.

Based on the obtained metrics (in table), the K2 algorithm performs better than the BNPC in all categories, with regard to BDeu and BIC metrics; with regard to the other two metrics, K2 and KL, no significant differences between the two networks are showed. Similarly, in terms of the percentage of classification success, the identification of the "best" algorithm is not straightforward: it seems that the BNPC network works better than the K2 network for each economic area, except for the upper middle income category.

In conclusion, in the light of resulting values, no better performance of a specific BN algorithm was found.

Moreover, this study showed how the choice of the algorithm, the metrics used and the purpose underlying the construction of the network may lead to different networks.

- Adriana Brogini, University of Padua, Italy.