Automated outbreak detection of communicable diseases: From statistical method to application

Michael Höhle¹
joint work with M. Salmon², D. Schumacher², H. Burmann², C. Frank² and H. Claus²

¹ Department of Mathematics, Stockholm University, Sweden
² (currently or formerly) Robert Koch Institute, Germany

Seminar, 24th May 2016, RIVM, Bilthoven, The Netherlands

Outbreak detection

M. Höhle

1 / 31

Outline

1. Motivation for the Online Monitoring of Count Data
2. A System for Automated Outbreak Detection in Germany
3. Statistical Framework for Aberration Detection
   - Simple Algorithm for Ad-Hoc Detection
   - Farrington Algorithm and Beyond
4. Discussion


Motivation for the Online Monitoring of Count Data


Example: Monitoring German Salmonella Newport Cases

German Infection Protection Act (IfSG) data from the Robert Koch Institute (up to W40-2011):

[Figure: weekly time series of reported Salmonella Newport cases, 2009 to 2011. x-axis: Year/Reporting Week; y-axis: No. reported cases (0 to 20).]


[Figure, final overlay: the weekly Salmonella Newport counts for 2009 to 2011 (x-axis: Year/Reporting Week; y-axis: No. reported cases) with the q0.95 of the predictive distribution added.]

During Oct-Nov 2011 there was an outbreak associated with mung bean sprouts (RKI, 2012).


Challenges of surveillance data

Issues making the statistical modelling and monitoring of surveillance time series a challenge:
- Lack of clear case definitions
- Under-reporting and reporting delays
- Often no denominator data
- Seasonality
- Low number of reported cases
- Presence of past outbreaks
- Existence of concurrent “explanatory” processes


Methodological Feature Extraction

From a statistical viewpoint:
- The data are univariate or multivariate count time series
- There is a need to handle covariates such as time, seasonality, or other concurrent processes → regression framework
- We face a sequential decision making problem → statistical process control
- There might be a need to adjust the analysis for occurred-but-not-yet-reported events


Feature Extraction from a User’s Point of View

- The amount of data collected exceeds the resources for case-based analysis → statistical summaries, visualization and automation are needed
- Alarms are useless if there are too many, but missing an important outbreak is also fatal
- Automatic procedures provide a safety net, but
  - The algorithms lack insights about the data generating processes
  - Every pathogen has its special features (importance, consequences)
  - Large accumulations in short time are noticed at the local level, so automatic monitoring offers no advantage here


Summary

The Challenge of Outbreak Detection Systems
Because of too many signals and a misalignment between users’ needs and signal presentation, output of automatic outbreak detection systems often has little impact on the practical work of public health institutions.

Our aims:
- System architecture and design should focus on improving impact → increase user focus
- Integrate new statistical developments in algorithmic biosurveillance


A System for Automated Outbreak Detection in Germany


Design Decisions (1)

- Follow checklist for computer supported outbreak detection by Hulth et al. (2010); see also Triple-S guidelines in Hulth (2014)
- Follow the manifesto for agile software development [1]:
  - Individuals and interactions over processes and tools
  - Working software over comprehensive documentation
  - Customer collaboration over contract negotiation
  - Responding to change over following a plan
- Start by developing a micro system for monitoring Salmonellosis in close collaboration with the responsible epidemiologist(s)
- Progressively scale up the system to other reporting categories by 1-on-1 discussions with the users

[1] http://www.agilemanifesto.org/


Design Decisions (2) – System Design

To accommodate the need for weekly reports as well as ad-hoc analyses, we created both a manual and an automatic component.

Figure source: Salmon et al. (2016a) – available under a CC BY 4.0 license

Signals from the automatic component are stored in a signal database.


Statistical Framework for Aberration Detection


Statistical Framework for Aberration Detection (1)

- Univariate time series $\{y_t,\ t = 1, 2, \ldots\}$ to monitor
- For each time $t$ we differentiate between two underlying states: in-control (everything is fine) and out-of-control (something is wrong).
- At time $s \ge 1$, the available information is $\mathbf{y}_s = \{y_t;\ t \le s\}$.
- Based on $\mathbf{y}_s$, an automatic detection procedure has to decide if there is unusual activity at time $s$ (or not).


Statistical Framework for Aberration Detection (2)

The detectors are initially only based on the one-step-ahead predictive distribution at each time point (Shewhart-like control chart):
- Let $G(y_s \mid y_1, \ldots, y_{s-1}; \theta)$ be the distribution of $Y_s$ in case everything is in-control.
- If the actual observed value $Y_s = y_s$ is extreme in $G$, this is evidence against things being in-control.
- The alarm threshold $a_{1-\alpha,s}$ at each time point is calculated as the $(1-\alpha)$ quantile of the predictive distribution. If $y_s > a_{1-\alpha,s}$ then we have an alarm.

This can be generalized to more sequential control charts accumulating information, e.g. cumulative sum (CUSUM) methods.
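As a minimal sketch of the quantile rule, assume the in-control predictive distribution is Poisson with a known mean `mu`. This is an illustrative simplification: the names `poisson_quantile` and `shewhart_alarms` are hypothetical, and in practice the distribution would be estimated from past data.

```python
import math

def poisson_quantile(p, mu):
    """Smallest integer k with P(Y <= k) >= p for Y ~ Poisson(mu)."""
    k = 0
    pmf = math.exp(-mu)      # P(Y = 0)
    cdf = pmf
    while cdf < p:
        k += 1
        pmf *= mu / k        # P(Y = k) from P(Y = k - 1)
        cdf += pmf
    return k

def shewhart_alarms(counts, mu, alpha=0.05):
    """Flag every count that exceeds the (1 - alpha) quantile of Poisson(mu)."""
    threshold = poisson_quantile(1.0 - alpha, mu)
    return threshold, [y > threshold for y in counts]

# Assumed in-control mean of 3 cases per week (illustrative value only).
threshold, alarms = shewhart_alarms([2, 3, 1, 9, 4], mu=3.0)
print(threshold, alarms)
```

Each time point is judged only against its own threshold, which is what makes the chart Shewhart-like; a CUSUM would instead carry a running statistic from week to week.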


Intermezzo: Estimation, prediction and uncertainty

- Data $y$ are the observed value of a random variable $Y$ characterized by a parametric model with density $f(y; \theta)$.
- Aim: predict the value of a random variable $Z$, which, conditionally on $Y = y$, has distribution function $G(z \mid y; \theta)$, depending on $\theta$.
- Simplest form of the prediction problem: $Y_1, \ldots, Y_n \overset{\text{iid}}{\sim} f(y; \theta)$, and the task is to predict $Z = Y_{n+1}$.
- In time series 1-step-ahead prediction the observations are correlated and the aim is to predict $Z = Y_{n+1}$.


Simple Algorithm for Ad-Hoc Detection

Example: Predicting a new $N(\mu, \sigma^2)$ observation (1)

Example: Let $Y_1, \ldots, Y_n \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$ with unknown $\mu$ and $\sigma^2$. Then

$$\frac{Y_{n+1} - \bar{Y}}{s\sqrt{1 + \frac{1}{n}}} \sim t(n-1),$$

where $\bar{Y}$ and $s^2$ are the sample mean and sample variance of $Y_1, \ldots, Y_n$. A $(1 - 2\alpha) \cdot 100\%$ two-sided prediction interval (PI) is thus given by

$$\bar{Y} \pm t_{1-\alpha}(n-1) \cdot s \cdot \sqrt{1 + \frac{1}{n}}.$$

A plug-in $(1 - 2\alpha) \cdot 100\%$ two-sided prediction interval for $Y_{n+1}$ is:

$$\bar{Y} \pm z_{1-\alpha} \cdot s.$$

Both of these are not to be confused with a $(1 - 2\alpha) \cdot 100\%$ two-sided confidence interval for $\mu$:

$$\bar{Y} \pm z_{1-\alpha} \cdot \frac{s}{\sqrt{n}}.$$
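The $t(n-1)$ result for the pivot follows from a standard argument, sketched here for completeness under the stated iid normal assumption:

```latex
\begin{align*}
Y_{n+1} - \bar{Y} &\sim N\!\left(0,\ \sigma^2\left(1 + \tfrac{1}{n}\right)\right)
  && \text{since } Y_{n+1} \text{ is independent of } \bar{Y}
     \text{ and } \operatorname{Var}(\bar{Y}) = \sigma^2/n, \\
\frac{(n-1)\,s^2}{\sigma^2} &\sim \chi^2(n-1)
  && \text{independently of } Y_{n+1} - \bar{Y}, \\
\frac{Y_{n+1} - \bar{Y}}{s\sqrt{1 + \frac{1}{n}}}
  &= \frac{(Y_{n+1} - \bar{Y}) \big/ \left(\sigma\sqrt{1 + \tfrac{1}{n}}\right)}
          {\sqrt{s^2/\sigma^2}}
  \sim t(n-1).
\end{align*}
```

The extra factor $\sqrt{1 + 1/n}$ is the contribution of estimating $\mu$ by $\bar{Y}$, and using $t$ instead of $z$ quantiles accounts for estimating $\sigma$ by $s$.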


Example: Predicting a new $N(\mu, \sigma^2)$ observation (2)

Illustration: PIs based on $n = 5$ observations from $N(\mu, \sigma^2)$.

[Figure: densities of the normal (plug-in) and the non-standard t predictive distributions, with the corresponding 95% prediction intervals marked. x-axis: z; y-axis: density.]

For $n = 5$ the 95% plug-in PI corresponds to an 85% PI. The 95% CI for $\mu$ is 7.0–9.8, which only corresponds to a 53% PI.
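The 85% and 53% figures can be checked numerically. The sketch below assumes the iid normal setting with $n = 5$ and uses the closed-form CDF of the t distribution with 4 degrees of freedom; the helper names `t4_cdf` and `coverage_as_pi` are hypothetical and only valid for this $n$.

```python
import math
from statistics import NormalDist

N = 5  # number of observations; the t CDF below is hard-coded for N - 1 = 4 df

def t4_cdf(t):
    """CDF of Student's t distribution with 4 degrees of freedom (closed form)."""
    u = t / math.sqrt(1.0 + t * t / 4.0)
    return 0.5 + 0.375 * u * (1.0 - u * u / 12.0)

def coverage_as_pi(half_width_in_s):
    """P(|Y_{n+1} - Ybar| <= half_width_in_s * s) for n = 5 iid normal data.

    Uses the pivot (Y_{n+1} - Ybar) / (s * sqrt(1 + 1/n)) ~ t(n - 1).
    """
    c = half_width_in_s / math.sqrt(1.0 + 1.0 / N)
    return 2.0 * t4_cdf(c) - 1.0

z = NormalDist().inv_cdf(0.975)                   # z_{0.975}
plug_in_coverage = coverage_as_pi(z)              # interval Ybar +/- z * s
ci_coverage = coverage_as_pi(z / math.sqrt(N))    # interval Ybar +/- z * s / sqrt(n)
print(round(plug_in_coverage, 3), round(ci_coverage, 3))
```

Both intervals are nominally "95%", yet as prediction intervals they cover a new observation far less often, which is the point of the illustration.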


Summary: Ad-Hoc Outbreak Detection Algorithm

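- Predict value $y_s$ at time $s = (s^w, s^y)$ using a set of reference values from a window of size $2w + 1$ up to $b$ years back.
- Let $n = b(2w + 1)$ and compute the threshold as the upper 97.5% quantile of the predictive distribution for $y_s$, i.e.

$$a_{0.975,s} = \bar{y} + t_{0.975}(n-1) \cdot s \cdot \sqrt{1 + \frac{1}{n}},$$

  where, inside the formula, $\bar{y}$ and $s$ denote the sample mean and sample standard deviation of the $n$ reference values.
- Sound alarm if $y_s > a_{0.975,s}$.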
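The steps above can be sketched as follows. The code assumes a weekly series stored as a flat list with 52 time points per year, and it takes the t quantile as an input because the Python standard library has no t distribution; the function names and toy numbers are hypothetical.

```python
import math
from statistics import mean, stdev

def reference_values(series, s, w, b, period=52):
    """Collect the 2w + 1 values around the same week in each of the b previous years."""
    refs = []
    for years_back in range(1, b + 1):
        centre = s - years_back * period
        if centre - w < 0:
            raise ValueError("series too short for the requested b and w")
        for offset in range(-w, w + 1):
            refs.append(series[centre + offset])
    return refs

def adhoc_threshold(refs, t_quantile):
    """Upper limit: mean + t_quantile * sd * sqrt(1 + 1/n), with sd the sample sd."""
    n = len(refs)
    return mean(refs) + t_quantile * stdev(refs) * math.sqrt(1.0 + 1.0 / n)

# Which indices are used: with s=150, w=1, b=2 the windows sit around 98 and 46.
refs_demo = reference_values(list(range(200)), s=150, w=1, b=2)

# Toy reference values with n = 5, so the quantile is t_{0.975}(4) = 2.776 (table value).
refs = [4, 6, 5, 7, 3]
threshold = adhoc_threshold(refs, t_quantile=2.776)
alarm = 10 > threshold   # would an observed count of 10 raise an alarm?
print(refs_demo, round(threshold, 3), alarm)
```

Passing the quantile in keeps the sketch dependency-free; with a statistics library available, it would be looked up for $n - 1$ degrees of freedom instead.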


Farrington Algorithm and Beyond

Farrington algorithm (1) – basic model

Predict value $y_s$ at time $s = (s^w, s^y)$ using a set of reference values from window of size $2w + 1$ up to $b$ years back.

[Figure: "Prediction at time t=718 with b=5,w=4", a time series with the reference values highlighted. x-axis: time; y-axis: No. infected.]

Fit over-dispersed Poisson generalized linear model (GLM) to the $b(2w + 1)$ reference values where $E(y_t) = \mu_t$, $\operatorname{Var}(y_t) = \phi \cdot \mu_t$ with $\log \mu_t = \alpha + \beta t$ and $\phi > 0$.
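A minimal sketch of such a fit, assuming iteratively reweighted least squares (IRLS) for the log-linear mean and a Pearson-based estimate of $\phi$. The function name is hypothetical, and this is not the implementation used in the R package surveillance.

```python
import math

def fit_quasipoisson_trend(times, counts, n_iter=50, tol=1e-10):
    """Fit log(mu_t) = alpha + beta * t by IRLS; return (alpha, beta, phi).

    phi is estimated as Pearson chi-square / (n - 2). Times should be small or
    centred to keep the 2x2 normal equations well conditioned.
    """
    mu = [y + 0.1 for y in counts]            # starting values, kept away from 0
    eta = [math.log(m) for m in mu]
    alpha = beta = 0.0
    for _ in range(n_iter):
        # Working response for the log link; the IRLS weights equal mu.
        z = [e + (y - m) / m for e, y, m in zip(eta, counts, mu)]
        sw = sum(mu)
        swt = sum(m * t for m, t in zip(mu, times))
        swtt = sum(m * t * t for m, t in zip(mu, times))
        swz = sum(m * zi for m, zi in zip(mu, z))
        swtz = sum(m * t * zi for m, t, zi in zip(mu, times, z))
        det = sw * swtt - swt * swt
        new_alpha = (swtt * swz - swt * swtz) / det
        new_beta = (sw * swtz - swt * swz) / det
        converged = abs(new_alpha - alpha) + abs(new_beta - beta) < tol
        alpha, beta = new_alpha, new_beta
        eta = [alpha + beta * t for t in times]
        mu = [math.exp(e) for e in eta]
        if converged:
            break
    pearson = sum((y - m) ** 2 / m for y, m in zip(counts, mu))
    phi = pearson / (len(counts) - 2)
    return alpha, beta, phi

# Noise-free toy data lying exactly on the trend, so the fit should recover it.
times = list(range(10))
counts = [math.exp(0.5 + 0.1 * t) for t in times]
alpha_hat, beta_hat, phi_hat = fit_quasipoisson_trend(times, counts)
print(round(alpha_hat, 6), round(beta_hat, 6))
```

With real counts, `phi_hat` above 1 indicates over-dispersion relative to the Poisson model.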


Farrington algorithm (2) – outbreak detection

Predict and compare:
- An approximate $(1 - \alpha)$ one-sided prediction interval for $y_s$ based on the GLM has upper limit

$$a_{1-\alpha,s} = \hat{\mu}_s + z_{1-\alpha} \cdot \sqrt{\operatorname{Var}(y_s - \hat{\mu}_s)}$$

- If the observed $y_s$ is greater than $a_{1-\alpha,s}$, then flag $s$ as outbreak

Refinements of the algorithm include:
- Computation of the prediction interval on a transformed scale
- Use a re-weighted fit with weights based on Anscombe residuals in order to correct for outliers
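A sketch of the untransformed upper limit. It assumes the new observation is independent of the fitted mean, so that $\operatorname{Var}(y_s - \hat{\mu}_s) \approx \hat{\phi}\hat{\mu}_s + \operatorname{Var}(\hat{\mu}_s)$; this decomposition, the function name and the input values are assumptions for illustration and are not taken from the slide.

```python
import math
from statistics import NormalDist

def farrington_upper_limit(mu_hat, phi_hat, var_mu_hat, alpha=0.05):
    """Upper limit mu_hat + z_{1-alpha} * sqrt(phi_hat * mu_hat + var_mu_hat)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    pred_var = phi_hat * mu_hat + var_mu_hat   # assumed variance decomposition
    return mu_hat + z * math.sqrt(pred_var)

# Hypothetical fitted values: mean 10, dispersion 1.5, variance of the mean estimate 1.
upper = farrington_upper_limit(mu_hat=10.0, phi_hat=1.5, var_mu_hat=1.0)
is_outbreak = 18 > upper   # flag an observed count of 18?
print(round(upper, 3), is_outbreak)
```

The transformed-scale refinement listed above would replace this normal approximation on the count scale, since counts are skewed for small means.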


Improvements of the Farrington algorithm

Noufaily et al. (2013) suggest a number of improvements:
- use all past data as part of an 11-knot zero-order spline consisting of a 7-week reference period and nine 5-week periods
- only re-weight severely extreme observations ($s_t = 2.58$)
- assume $y_s \sim \operatorname{NegBin}\!\left(\mu_s, \frac{\phi - 1}{\mu_s}\right)$ and plug-in of $\hat{\mu}_s$ and $\hat{\phi}$ to find the quantile of the predictive distribution

Salmon et al. (2016a):
- generalize the improvements to flexible zero-order splines for any type of periodicity
- integrate estimation uncertainty using a parametric bootstrap based on the asymptotic normality of the estimated GLM coefficients
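The negative binomial plug-in quantile suggested by Noufaily et al. (2013) can be sketched as follows. The code assumes the parameterization with mean $\mu$ and variance $\phi\mu$ (so $\phi > 1$), which corresponds to size $r = \mu/(\phi - 1)$ and success probability $1/\phi$; the function names and the plugged-in values are hypothetical.

```python
def negbin_cdf(k, mu, phi):
    """P(Y <= k) for a negative binomial with mean mu and variance phi * mu (phi > 1)."""
    if phi <= 1.0:
        raise ValueError("this parameterization needs phi > 1")
    size = mu / (phi - 1.0)      # r in the (r, p) parameterization
    prob = 1.0 / phi             # p, so that the mean r * (1 - p) / p equals mu
    pmf = prob ** size           # P(Y = 0)
    cdf = pmf
    for j in range(k):
        pmf *= (j + size) / (j + 1) * (1.0 - prob)   # P(Y = j + 1) from P(Y = j)
        cdf += pmf
    return cdf

def negbin_quantile(p, mu, phi):
    """Smallest integer k with P(Y <= k) >= p."""
    k = 0
    while negbin_cdf(k, mu, phi) < p:
        k += 1
    return k

# Plug in hypothetical estimates mu_hat = 10 and phi_hat = 2.
threshold = negbin_quantile(0.95, mu=10.0, phi=2.0)
print(threshold)
```

Working with the quantile of a count distribution avoids the normal approximation, at the price of ignoring the uncertainty in the plugged-in estimates; the parametric bootstrap of Salmon et al. (2016a) addresses that uncertainty.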


Application on Salmonella Montevideo 2009-2010

Results from the extended Farrington procedure using last five years as reference values:

Figure source: Salmon et al. (2016a) – available under a CC BY 4.0 license


Salmonella Report for W41–46 of 2013

Weekly Report at National Level:

Table source: Salmon et al. (2016a) – available under a CC BY 4.0 license


Excerpt of Salmonella report for W41–46 of 2013

Cluster Analysis:

Table source: Salmon et al. (2016a) – available under a CC BY 4.0 license


Discussion


- The presented methods are implemented in the R package surveillance (Salmon et al., 2016a)
- Salmon et al. (2016b) document the use of the above methods as backbone of the German IfSG routine surveillance system at the Robert Koch Institute (RKI)
- Developing, maintaining and improving automatic outbreak detection systems is an interdisciplinary activity!
  - Even more work could be put into user adaptation.
  - Delay adjusted monitoring (Salmon et al., 2015)
- The system proved to be a good insurance against missing anything important; see e.g. Gertler et al. (2015)


Acknowledgments

Persons:
- Maëlle Salmon, Dirk Schumacher and many other previous colleagues at the Robert Koch Institute, Berlin
- Sebastian Meyer, Michaela Paul and Leonhard Held, all currently or previously at the Epidemiology, Biostatistics and Prevention Institute, University of Zurich, Switzerland

Financial Support:
- German Science Foundation (DFG, 2003–2006)
- Munich Center of Health Sciences (2007–2010)
- Swiss National Science Foundation (SNF, 2007–2015)
- Robert Koch Institute (RKI, 2012–2015)
- Swedish Research Council (VR, 2016–2019)


The End

That’s it. Thanks for the attention. q()

Michael Höhle, Stockholm University
http://www.math.su.se/~hoehle
m hoehle


Literature I

Gertler, Maximilian et al. (2015). “Outbreak of cryptosporidium hominis following river flooding in the city of Halle (Saale), Germany, August 2013”. In: BMC Infectious Diseases 15.1, p. 88. issn: 1471-2334. doi: 10.1186/s12879-015-0807-1. url: http://www.biomedcentral.com/1471-2334/15/88.

Hulth, A. (2014). “First European guidelines on syndromic surveillance in human and animal health published”. In: Eurosurveillance 19.41. Available from http://www.eurosurveillance.org/ViewArticle.aspx?ArticleId=20927, pii=20927.

Hulth, A., N. Andrews, S. Ethelberg, J. Dreesman, D. Faensen, W. van Pelt, and J. Schnitzler (2010). “Practical usage of computer-supported outbreak detection in five European countries”. In: Eurosurveillance 15.36.


Literature II

Noufaily, Angela, Doyo G. Enki, Paddy Farrington, Paul Garthwaite, Nick Andrews, and André Charlett (2013). “An improved algorithm for outbreak detection in multiple surveillance systems”. In: Statistics in Medicine 32.7, pp. 1206–1222.

RKI (2012). “Salmonella Newport-Ausbruch in Deutschland und den Niederlanden, 2011”. In: Epidemiologisches Bulletin 20. Available as http://www.rki.de/DE/Content/Infekt/EpidBull/Archiv/2012/Ausgaben/20_12.pdf, pp. 177–184.

Salmon, M., D. Schumacher, and M. Höhle (2016a). “Monitoring Count Time Series in R: Aberration Detection in Public Health Surveillance”. In: Journal of Statistical Software 70.10. Also available as vignette of the R package surveillance. doi: 10.18637/jss.v070.i10.


Literature III

Salmon, M., D. Schumacher, K. Stark, and M. Höhle (2015). “Bayesian outbreak detection in the presence of reporting delays”. In: Biometrical Journal 57.6, pp. 1051–1067. http://dx.doi.org/10.1002/bimj.201400159.

Salmon, M., Dirk Schumacher, Hendrik Burmann, Christina Frank, Hermann Claus, and Michael Höhle (2016b). “A system for automated outbreak detection of communicable diseases in Germany”. In: Eurosurveillance 21.13, pii=30180. doi: 10.2807/1560-7917.ES.2016.21.13.30180.
