Bulletin of the World Health Organization

A Bayesian network approach to the study of historical epidemiological databases: modelling meningitis outbreaks in the Niger

A Beresniak a, E Bertherat b, W Perea b, G Soga c, R Souley d, D Dupont e & S Hugonnet b

a. Data Mining International, Route de l’Aeroport 29–31, CP 221, 1215 Geneva 15, Switzerland.
b. Epidemic and Pandemic Alert and Response, World Health Organization, Geneva, Switzerland.
c. World Health Organization, Niamey, Niger.
d. Department of Statistics, Surveillance and Response to Epidemics, Ministry of Health, Niamey, Niger.
e. Data Mining America, Montreal, Canada.

Correspondence to A Beresniak (e-mail: aberesniak@datamining-international.com).

(Submitted: 10 January 2011 – Revised version received: 05 December 2011 – Accepted: 05 December 2011 – Published online: 20 January 2012.)

Bulletin of the World Health Organization 2012;90:412-417A. doi: 10.2471/BLT.11.086009

Introduction

Throughout the African “meningitis belt”, epidemics of meningococcal disease have been reported since the disease was first described, in 1912.1 Every year, western African countries within the Sahelo–Sudanian band experience major outbreaks of meningococcal meningitis, each of which can affect up to 200 000 people, most of them young children.2 Burkina Faso, Mali and the Niger, for example, are regularly hit by meningitis epidemics.3–7 Meningitis is characterized by high levels of seasonal endemicity, with large epidemics of meningococcal meningitis occurring cyclically.8 Each such epidemic typically starts in January and ends in late May.2 The three main pathogens causing bacterial meningitis in Africa are Neisseria meningitidis, Haemophilus influenzae type b and Streptococcus pneumoniae.4,9,10

As a major cause of morbidity and mortality in sub-Saharan Africa, meningitis merits specific control measures.11–13 Burkina Faso, Mali and the Niger have each already established a broad surveillance network for collecting data on cases of the disease.4,6,11,12 Since 1986, peripheral health centres in each of these countries have routinely collected data on suspected cases of meningitis and reported each such case to health districts for subsequent analysis at the national health ministry. The information collected in this surveillance now forms important epidemiological databases that could improve our understanding of the nature of meningitis epidemics and allow high-risk areas to be identified. Current strategies for controlling meningitis epidemics in sub-Saharan Africa are based on epidemiological, immunological and logistical considerations. The strategy currently advocated by the World Health Organization (WHO) consists of the early detection of epidemics, the treatment of cases with antibiotics, and mass vaccination to halt the outbreak (if possible, within 4–6 weeks of the epidemic threshold being reached).8,14,15 The early detection of meningitis epidemics is based mainly on the observation of the trends in weekly incidence.8,12,14,16–18

The effectiveness of mass vaccination as a strategy for the control of meningitis epidemics has been questioned.19 Given the sporadic nature of the outbreaks, the optimal use of vaccines to control both short-term epidemic and endemic meningococcal disease has been the subject of much debate.20 In particular, the results of several studies in Africa have shown that vaccination during outbreak situations is suboptimal, mainly because populations in resource-poor areas cannot be immunized rapidly enough.20 In addition, although rapid laboratory diagnosis is an essential component in the surveillance of meningococcal epidemics, as it allows decision-makers to select the most appropriate vaccine for mass vaccination,21–23 the resource-poor countries most affected by such epidemics struggle to achieve such diagnosis.21 A new conjugate vaccine that protects against serogroup A meningitis (the form involved in most outbreaks of meningococcal disease in Africa) was approved by the United States Food and Drug Administration in February 2010. Conjugate vaccines prevent both carriage and the transmission of bacteria from person to person.

Extensive descriptive analyses that have been performed on meningitis surveillance databases have provided important information on epidemic cycles, seasonality and the correlations between morbidity and certain co-factors, such as climatic parameters.2,24–26 While the seasonal and spatial patterns of the disease appear to be linked to climate, the mechanisms responsible for these patterns have still to be elucidated.2,25

In the present study, surveillance data and a modelling method based on Bayesian networks were used to explore how meningitis incidence in a district of the Niger was influenced by, or influenced, the incidence in any other district. The aim was to develop a method for the optimization of epidemic alerts and the spatial and temporal targeting of immunizations and other interventions for the management of meningitis in the Niger, with the ultimate goal of preventing meningitis epidemics in the country.

Methods

A Bayesian network is a graphical model showing the probabilistic relationships among a set of variables of interest. Bayesian methods are particularly valuable whenever there is a need to extract information from data that are uncertain or subject to any kind of error or noise.27,28 When applied to the forecasting of epidemics, a Bayesian network can allow potential “dependence relationships”, such as the probability that an outbreak will occur in one district after it has occurred in certain other districts, to be explored.

A Bayesian network is represented by a graph composed of nodes connected by arrows. Each arrow begins at a “parent” node and ends at a “child” node, with the “parent” directly influencing the connected “child” in some way. The degree of influence between each “parent” and “child” is a conditional probability that can usually be computed from the data. Each node of a Bayesian network has an associated conditional probability table. In an epidemiological analysis, the nodes might represent the districts of a country and the arrows and conditional probability table might show how a disease outbreak in one district is linked to the probability of a disease outbreak in another district. In the example shown (Fig. 1), if district A experienced an outbreak, the probabilities that districts B and C experienced outbreaks would be 85% and 15%, respectively.
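To illustrate how a conditional probability table is read, the short Python sketch below encodes only the A → B and A → C part of the Fig. 1 example. The 85% and 15% values come from the text; the prior probability of an outbreak in district A and the probabilities for B and C when A has no outbreak are hypothetical placeholders, so the marginal and posterior values it computes are purely illustrative.

```python
# Sketch of the Fig. 1 idea: district A is the parent of districts B and C.
# Only P(B outbreak | A outbreak) = 0.85 and P(C outbreak | A outbreak) = 0.15
# come from the text; all other numbers are hypothetical placeholders.

P_A = 0.30  # hypothetical probability of an outbreak in district A

# Conditional probability tables: state of parent A (1 = outbreak) -> P(child outbreak)
CPT_B = {1: 0.85, 0: 0.10}  # 0.10 is hypothetical
CPT_C = {1: 0.15, 0: 0.05}  # 0.05 is hypothetical


def marginal_outbreak(cpt: dict[int, float], p_parent: float) -> float:
    """P(child outbreak), obtained by summing over both states of the parent."""
    return cpt[1] * p_parent + cpt[0] * (1.0 - p_parent)


def posterior_parent(cpt: dict[int, float], p_parent: float) -> float:
    """P(parent outbreak | child outbreak), by Bayes' rule."""
    return cpt[1] * p_parent / marginal_outbreak(cpt, p_parent)


print(CPT_B[1], CPT_C[1])                       # 0.85 0.15
print(round(marginal_outbreak(CPT_B, P_A), 3))  # 0.325
print(round(posterior_parent(CPT_B, P_A), 3))   # 0.785
```

The same table can thus be used in both directions: forwards, to predict a child district's risk, and backwards, to ask how likely an outbreak in the parent district is once the child district has reported one.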

Fig. 1. Simple Bayesian network composed of nodes, arrows and conditional probabilities for the occurrence of a meningitis epidemic in each of a country’s five hypothetical districts

Since January 1986, WHO has supervised the weekly collection of data on meningitis incidence in each district of the Niger and these data now form a substantial historical database. The data recorded include district name and population, week number and the reported numbers of suspected cases of meningitis and of deaths attributed to meningitis in that district and week. For the present study, the data for the 14 calendar years from January 1986 to December 1999 were used to construct a Bayesian network, with a node for each of the 38 districts in the Niger. The weekly incidence thresholds set by WHO for epidemic meningitis (i.e. at least 10 cases per 100 000 inhabitants in a district with a population of at least 30 000 or at least five cases per 100 000 in a district with a smaller population) were used to give each district a weekly score of 1 (if the epidemic threshold had been reached) or 0 (if it had not). The corresponding weekly incidence thresholds for “epidemic alert”, which could be used for the early detection of a meningitis epidemic, are lower (at least five cases per 100 000 inhabitants and at least two cases per 100 000, respectively).
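The scoring rule described above can be written as a small function. This is a sketch that takes the thresholds exactly as stated in the text; the function name, the example district-weeks and the use of integer arithmetic (so that a week exactly at the threshold is not misclassified by rounding) are illustrative choices.

```python
def weekly_score(cases: int, population: int, alert: bool = False) -> int:
    """Score one district-week: 1 if the threshold is reached, otherwise 0.

    Thresholds as stated in the text (cases per 100 000 inhabitants per week):
      epidemic: >= 10 if population >= 30 000, otherwise >= 5
      alert:    >= 5  if population >= 30 000, otherwise >= 2
    """
    if population <= 0:
        raise ValueError("population must be positive")
    large_district = population >= 30_000
    if alert:
        threshold = 5 if large_district else 2
    else:
        threshold = 10 if large_district else 5
    # Compare cases * 100 000 with threshold * population in integers,
    # so that values exactly at the threshold are not lost to rounding.
    return int(cases * 100_000 >= threshold * population)


# Hypothetical district-weeks: (suspected cases, district population)
weeks = [(20, 200_000), (19, 200_000), (1, 20_000)]
print([weekly_score(c, p) for c, p in weeks])              # [1, 0, 1]
print([weekly_score(c, p, alert=True) for c, p in weeks])  # [1, 1, 1]
```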

Two types of analysis were then performed, one (the “first-level” analysis) taking no account of the time taken for an epidemic in one district to influence the development of an epidemic elsewhere (ignoring the timing of the epidemics) and the other (the “second-level” analysis) assuming that an epidemic in one district would influence the development of an epidemic elsewhere 1, 2 or 3 years later. A maximum lag of 3 years was explored because this was both the median period in the cycles of meningitis outbreaks detected in the historical database and the estimated duration of protection resulting from a vaccination programme.
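The lagged design of the second-level analysis amounts to pairing the status of a potentially influencing district in year t with that of another district in year t + k. The sketch below assumes a hypothetical annual indicator per district (1 if the district reached the epidemic threshold in any week of that year, an aggregation rule that the text does not specify) and uses invented data for two districts; the function names are also hypothetical.

```python
from typing import Optional


def lagged_pairs(parent: list[int], child: list[int], lag: int) -> list[tuple[int, int]]:
    """Pair the parent district's status in year t with the child's status in year t + lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    n_years = min(len(parent), len(child))
    return [(parent[t], child[t + lag]) for t in range(n_years - lag)]


def conditional_rate(pairs: list[tuple[int, int]]) -> tuple[Optional[float], int]:
    """Empirical P(child outbreak | parent outbreak) and the number of pairs it rests on."""
    after_parent_outbreak = [c for p, c in pairs if p == 1]
    if not after_parent_outbreak:
        return None, 0
    return sum(after_parent_outbreak) / len(after_parent_outbreak), len(after_parent_outbreak)


# Hypothetical annual indicators (1 = epidemic threshold reached that year)
district_x = [1, 0, 0, 1, 0, 1, 0, 0]
district_y = [0, 1, 0, 0, 1, 0, 1, 0]

for lag in range(4):  # lags of 0 to 3 years
    pairs = lagged_pairs(district_x, district_y, lag)
    print(lag, len(pairs), conditional_rate(pairs))
```

In this sketch each extra year of lag removes one usable pair from a fixed series, so the estimates rest on fewer observations as the lag grows.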

The “first-level” analysis was performed with version 1.0 of the Discoverer software package (Bayesware, Milton Keynes, United Kingdom of Great Britain and Northern Ireland), and its results were used to create a graph of the Bayesian network together with the relevant conditional probability tables, showing the probabilities of meningitis outbreaks. Because of the limited capability of the Discoverer software, this analysis was restricted to the first 14 years of the meningitis database for the Niger. The “minimum description length” learning algorithm was used, with network parameters estimated via the maximum-likelihood technique. The “second-level” analysis was performed using the data collected on meningitis in the Niger between January 1986 and December 2005, a β version of the Bayesia package (Bayesia SAS, Laval, France) and the Bayes Net and BNLAT toolboxes for Matlab (Mathworks Inc., Natick, United States of America).
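Discoverer's internal implementation is not described here, so the following is only a generic sketch of how a minimum description length (MDL) score can compare candidate parent sets for one binary node: the log-likelihood under maximum-likelihood parameters is penalized by half the logarithm of the number of observations for each free parameter (the same form as the Bayesian information criterion). The function name and the toy records are hypothetical.

```python
import math
from collections import Counter


def mdl_score(data: list[dict[str, int]], child: str, parents: tuple[str, ...]) -> float:
    """MDL-style score (higher is better) of a binary `child` node given binary `parents`.

    score = log-likelihood at the maximum-likelihood parameters
            - (log N / 2) * number of free parameters
    """
    n = len(data)
    config_counts = Counter(tuple(row[p] for p in parents) for row in data)
    cell_counts = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    log_likelihood = 0.0
    for (config, _value), count in cell_counts.items():
        # Maximum-likelihood estimate of P(child = value | parents = config)
        log_likelihood += count * math.log(count / config_counts[config])
    n_free_params = 2 ** len(parents)  # one free probability per parent configuration
    return log_likelihood - 0.5 * math.log(n) * n_free_params


# Hypothetical weekly records in which B usually follows A
data = (
    [{"A": 1, "B": 1}] * 9 + [{"A": 1, "B": 0}]
    + [{"A": 0, "B": 0}] * 9 + [{"A": 0, "B": 1}]
)
print(mdl_score(data, "B", ()))      # B with no parent
print(mdl_score(data, "B", ("A",)))  # A as parent: higher score, so the arrow A -> B is preferred
```

A structure-learning search would evaluate such scores over many candidate parent sets and keep the highest-scoring network; the penalty term is what stops it from simply adding every possible arrow.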

Results

Between January 1986 and December 2005, 182 244 suspected cases of meningitis and 14 859 meningitis-related deaths were recorded in the Niger. The cumulative attack rate over the 20 annual cycles of meningitis that occurred over this period was 1919.6 (suspected) cases per 100 000 population. The corresponding mean annual attack rate was 96.0 cases per 100 000 population. The Niger experienced six major national-level epidemics of meningitis between January 1986 and December 2005, with, at times, more than 50 cases recorded per 100 000 population in a single week.
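The mean annual attack rate follows directly from the cumulative rate: 1919.6 / 20 ≈ 96.0 cases per 100 000. A minimal sketch of both calculations, assuming that the cumulative rate is the sum of the 20 annual rates (the text does not state how it was accumulated); the function names and the single district-year are hypothetical.

```python
def attack_rate_per_100k(cases: int, population: int) -> float:
    """Suspected cases per 100 000 population."""
    return cases * 100_000 / population


def mean_annual_rate(cumulative_rate: float, n_years: int) -> float:
    """Average annual attack rate, assuming the cumulative rate is a sum of annual rates."""
    return cumulative_rate / n_years


print(attack_rate_per_100k(50, 100_000))       # 50.0 (hypothetical district-year)
print(round(mean_annual_rate(1919.6, 20), 1))  # 96.0, matching the reported mean
```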

First-level analysis

The results of the first-level analysis, which ignored any time lag between epidemics, are summarized in Fig. 2, with arrows running from the influencing districts to the influenced districts. From this graph, the number of districts influenced by a given district and the number of districts influencing a specific district (Table 1) can be determined at a glance.
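Reading these numbers off the graph is equivalent to counting each node's outgoing and incoming arrows. In the sketch below, the arrows Agadez → Boboye and Birni → Boboye follow the Table 2 example; the other arrows and the district names “District X” and “District Y” are hypothetical and are not taken from Fig. 2.

```python
from collections import Counter

# Arrows run from an influencing district to an influenced district.
# Agadez -> Boboye and Birni -> Boboye follow the Table 2 example;
# "District X" and "District Y" are hypothetical placeholders, not taken from Fig. 2.
edges = [
    ("Agadez", "Boboye"),
    ("Birni", "Boboye"),
    ("Agadez", "District X"),
    ("District Y", "Agadez"),
]

influences = Counter(parent for parent, _ in edges)   # districts each district influences
influenced_by = Counter(child for _, child in edges)  # districts influencing each district

print(influences["Agadez"])     # 2
print(influenced_by["Boboye"])  # 2
print(influences["Boboye"])     # 0 (Counter returns 0 for absent keys)
```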

Fig. 2. Bayesian network graph showing how each of the Niger’s 38 districts is influenced by, and/or influences, the probability of a meningitis outbreak in at least one other district

Conditional probabilities were calculated for all districts. As an example, Table 2 presents the conditional probabilities of an outbreak and of no outbreak in the Boboye district, according to the outbreak status of the two influencing districts: Agadez and Birni. These results indicate that, when epidemics occur in both Agadez and Birni, Boboye should be included in any vaccination strategy because the probability of an outbreak in this district is high (96.2%).
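Under maximum-likelihood estimation, each entry of a table such as Table 2 is simply the proportion of outbreak weeks in the child district among weeks with a given outbreak pattern in its parent districts. The sketch below uses invented records, so its estimates do not reproduce the 96.2% reported for Boboye; it also shows that a parent configuration that never occurs in the data leaves the corresponding probability undefined. The function name is hypothetical.

```python
from collections import Counter
from itertools import product


def estimate_cpt(records: list[dict[str, int]], parents: tuple[str, ...], child: str):
    """Maximum-likelihood CPT: {parent configuration: (P(child = 1) or None, n observations)}."""
    totals = Counter()
    outbreaks = Counter()
    for row in records:
        config = tuple(row[p] for p in parents)
        totals[config] += 1
        outbreaks[config] += row[child]
    cpt = {}
    for config in product((0, 1), repeat=len(parents)):
        n = totals[config]
        # Probability of "no outbreak" is 1 minus this value when it is defined
        cpt[config] = (outbreaks[config] / n if n else None, n)
    return cpt


# Invented weekly records (1 = epidemic threshold reached)
records = (
    [{"Agadez": 1, "Birni": 1, "Boboye": 1}] * 3
    + [{"Agadez": 1, "Birni": 1, "Boboye": 0}]
    + [{"Agadez": 1, "Birni": 0, "Boboye": 1}]
    + [{"Agadez": 1, "Birni": 0, "Boboye": 0}]
    + [{"Agadez": 0, "Birni": 0, "Boboye": 0}] * 5
)
for config, (p, n) in estimate_cpt(records, ("Agadez", "Birni"), "Boboye").items():
    print(config, p, n)
```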

Second-level analysis

Fig. 3 illustrates the results of the second-level analysis. The large number of links between the nodes in the uppermost plot in this figure, which shows the links when no lag (“Year 0 to Year 0”) or a lag of 1, 2 or 3 years is used, makes this plot difficult to interpret. The links become clearer when they are plotted separately for lags of 1, 2 and 3 years, as in the other plots in Fig. 3. Unfortunately, this analysis did not allow the accurate calculation of conditional probabilities, mainly because observations with “no outbreak” status were too few.

Fig. 3. Diagrams illustrating how a meningitis outbreak in a given district of the Niger influenced a similar outbreak in one or more other districts
Note: The diagrams are based on weekly incidence data collected between 1986 and 2005. The numbers 1 to 38 indicate the districts in the Niger. The top plot illustrates the links with time lags of 0, 1, 2 or 3 years, while the other plots show, separately, the links with no time lag (“Year 0 to Year 0”) and lags of 1 year (“Year 0 to Year 1”) and 3 years (“Year 0 to Year 3”).

Fig. 4 summarizes the results of the analysis of longitudinal surveillance data for those districts in which meningitis incidence reached the epidemic threshold in 2003 and 2004. Arrows again indicate the influencing relationships.

Fig. 4. Longitudinal surveillance data listing the districts of the Niger where meningitis reached an epidemic threshold in 2003 and/or 2004

Discussion

The control of epidemic meningitis remains an unresolved problem in Africa, partly because the location of major outbreaks, which might be affected by relatively rare events, is so difficult to predict. The data analysed in the present study were collected over such a long period (up to 20 years) that they were probably affected by unpredicted rare events, such as unusual migratory patterns, armed conflicts and catastrophic dry seasons. In general, valid predictive statistics on such rare events cannot be derived from mathematical models, but data mining methods, when applied to clinical and epidemiological data, may allow previously unpredictable or unknown trends to be made apparent.29 In the Niger, such an approach may well help to elucidate the factors that influence or trigger meningitis outbreaks. Unfortunately, historical epidemiological databases like the one used in the present study are very rare in the context of epidemic diseases, especially in developing countries. While descriptive analyses constitute a standard approach, they do not allow optimal responses to future major public health issues to be planned.

In the present study, although the first-level analysis allowed the conditional probabilities between the influencing and influenced districts to be estimated, it made no allowance for the time needed for the inter-district influences to reveal themselves as changes in meningitis incidence. Unfortunately, when lags of 0, 1, 2 or 3 years were considered in the second-level analysis, too few “no-outbreak” observations were available for an accurate estimation of conditional probabilities. Nevertheless, both methodological approaches appear promising for the planning and timely deployment of vaccination programmes, especially given the limited duration of the protection offered by current vaccines.

A validation test of the Bayesian network model was recently carried out using a 6-year time horizon, with observed surveillance data from 2006, 2007 and 2008 compared with expected model predictions from 2004, 2005 and 2006 (data not shown). Districts reaching epidemic alert or epidemic thresholds in 2004, 2005 and 2006 were considered as potentially influencing other districts in 2006, 2007 and 2008. The proportion of the outbreaks observed in 2006–2008 that were successfully predicted using the data collected in 2004–2006 was 63% when the epidemic threshold was used for the influencing districts and 91% when the alert threshold was used. This promising performance justifies the continued interest in such modelling for guiding immunization planning, including the extension of vaccination campaigns to “influenced” districts when an “influencing” district has reached the epidemic threshold.
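The validation measure described above is the share of observed outbreaks that had been flagged in advance. The sketch below assumes, as a simplification, that a district is predicted to be at risk whenever one of its parents in the network reached the chosen threshold; the network, the district labels and the outbreak sets are all hypothetical, and the exact procedure used in the study is not reported.

```python
def predicted_districts(hit_districts: set[str], network: dict[str, list[str]]) -> set[str]:
    """Districts predicted at risk: children of every influencing district that reached the threshold."""
    predicted: set[str] = set()
    for district in hit_districts:
        predicted.update(network.get(district, []))
    return predicted


def proportion_predicted(observed: set[str], predicted: set[str]) -> float:
    """Share of the observed outbreak districts that had been predicted."""
    if not observed:
        raise ValueError("no observed outbreaks to evaluate")
    return len(observed & predicted) / len(observed)


# Hypothetical network (influencing district -> influenced districts) and outcomes
network = {"A": ["B", "C"], "D": ["E"]}
observed = {"B", "C", "E", "F"}  # districts with an outbreak in the later period

epidemic_hits = {"A"}    # influencing districts above the epidemic threshold
alert_hits = {"A", "D"}  # the lower alert threshold flags more influencing districts

print(proportion_predicted(observed, predicted_districts(epidemic_hits, network)))  # 0.5
print(proportion_predicted(observed, predicted_districts(alert_hits, network)))     # 0.75
```

In this toy example the lower (alert) threshold flags more influencing districts and therefore captures a larger share of the observed outbreaks, which is consistent in direction with the 63% and 91% reported above.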

More sophisticated approaches should be explored. For example, it would be interesting to analyse the collected data year by year and to build a Bayesian network for each year. Re-sampling simulation techniques could be used to simulate and replicate observations, thereby allowing the construction of a Bayesian network for each simulation, together with a conditional probability table.
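One way to implement such re-sampling is the non-parametric bootstrap: records are drawn with replacement, the quantity of interest is re-estimated on each replicate, and the spread of the replicates indicates how stable the estimate is. The sketch below bootstraps a single conditional probability on invented data for two hypothetical districts; building a full Bayesian network for each replicate, as proposed above, would replace the estimation step with a structure-learning step.

```python
import random
import statistics


def bootstrap_conditional(records, parent, child, n_boot=1000, seed=0):
    """Bootstrap replicates of P(child outbreak | parent outbreak).

    Each replicate resamples the records with replacement; replicates with
    no parent outbreak are skipped because the estimate is undefined there.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    estimates = []
    for _ in range(n_boot):
        sample = rng.choices(records, k=len(records))
        hits = [row[child] for row in sample if row[parent] == 1]
        if hits:
            estimates.append(sum(hits) / len(hits))
    return estimates


# Hypothetical records for two districts X and Y (1 = outbreak)
records = [{"X": 1, "Y": 1}] * 6 + [{"X": 1, "Y": 0}] * 2 + [{"X": 0, "Y": 0}] * 12

estimates = bootstrap_conditional(records, "X", "Y")
cuts = statistics.quantiles(estimates, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
print(f"point estimate {6 / 8:.2f}; bootstrap mean {statistics.mean(estimates):.2f}; "
      f"approx. 95% interval {cuts[0]:.2f} to {cuts[-1]:.2f}")
```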

In conclusion, a Bayesian network approach offers an innovative and promising technique for extracting meaningful and clinically useful information from a large historical epidemiological database. As suggested by the present study, this method can improve our understanding of the dynamics of epidemic outbreaks, help make health interventions more effective and optimize resource allocation. However, the method relies on large volumes of longitudinal records that are rarely available in the context of epidemic diseases. In this respect, the historical database on meningitis in the Niger provided a rare data set that made the testing of this promising approach possible. In the present study, no operational conditional probabilities regarding meningitis outbreaks could be determined when a time lag was included. Further research combining Bayesian networks with re-sampling techniques could allow this issue to be addressed, to the potential benefit of those countries where relevant large epidemiological databases are unavailable.


Acknowledgements

We thank Michel Lamure, from Claude Bernard Lyon 1 University (France), for his important contributions.

Competing interests:

None declared.

References
