Design of EventEpi, an event-based health surveillance platform.


What is event-based surveillance?
The goal of event-based surveillance is to detect unusual events that might signal an outbreak. Event-based public health surveillance looks at reports, stories, rumors, and other information about health events that could be a serious risk to public health.


Main categories of public health surveillance

There are two main categories of public health surveillance: event-based and indicator-based.

• Event-based surveillance draws on sources such as media reports or rumors on an internet blog. Such information is unstructured.

• In contrast, indicator-based public health surveillance is a more traditional way of reporting diseases to public health officials. It involves reports of specific diseases from health care providers to public health officials. Such information may be described as structured because it is standardized.

An example of event-based health surveillance is WHO’s global surveillance system, which picks up public health threats 24 hours a day, 365 days a year. Once an event is verified, WHO assesses the level of risk and sounds the alarm. Within 48 hours of an emergency, WHO grades the severity of the event, activates its incident management system, and deploys field teams.

Design of EventEpi at the Information Centre for International Health Protection

The Information Centre for International Health Protection (Informationsstelle für Internationalen Gesundheitsschutz, INIG) at the Robert Koch Institute (RKI) performs event-based surveillance to identify events relevant to public health in Germany. Its routine tasks include reading online articles from a defined set of sources, evaluating them for relevance, and then manually filling a spreadsheet with information from the relevant articles. This spreadsheet is called the Ereignisdatenbank (German for “event database”, abbreviated IDB). Reading the articles, discussing their relevance, and putting key information into the database is a time-consuming process.

To support event-based surveillance, and also to gain insights into what makes an article (and the event it describes) relevant, the authors of “EventEpi–A Natural Language Processing Framework for Event-Based Surveillance” developed a natural language processing framework for automated information extraction and relevance scoring.

Their approach consists of two complementary parts: key information extraction and relevance scoring. Both are integrated in a web application called EventEpi. The authors implemented the machine learning algorithms with the Python package scikit-learn, except for the convolutional neural network, for which they used Keras.

The IDB had to be preprocessed before any NLP could be applied. It was not designed to be used with machine learning algorithms, so it contained inconsistencies that might not disturb human users but had to be resolved before machine processing. For example, a case count could be spelled out as a word instead of written in digits, and other entries followed inconsistent naming schemes. In addition, the IDB entries were written in German, whereas the output of EpiTator is in English.
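The cleaning steps described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions. The row keys (`country`, `disease`, `count`), the number-word table, and the German-to-English mapping are all hypothetical stand-ins, since the source does not describe the IDB's actual columns or vocabulary. Real preprocessing would need a far larger vocabulary or a proper translation step.

```python
import re

# Hypothetical lookup tables for illustration only: the real IDB columns,
# spellings, and translations are not described in the source.
NUMBER_WORDS = {
    "null": 0, "ein": 1, "eins": 1, "zwei": 2, "drei": 3, "vier": 4,
    "fünf": 5, "sechs": 6, "sieben": 7, "acht": 8, "neun": 9, "zehn": 10,
}
GERMAN_TO_ENGLISH = {
    "Deutschland": "Germany",
    "Frankreich": "France",
    "Masern": "measles",
    "Cholera": "cholera",
}


def normalize_count(raw):
    """Turn a case-count cell such as '12', ' 1.200 ' or 'zwei' into an int, or None."""
    text = raw.strip().lower()
    if text in NUMBER_WORDS:
        return NUMBER_WORDS[text]
    # Drop German thousands separators ('.') and stray whitespace.
    digits = re.sub(r"[.\s]", "", text)
    return int(digits) if digits.isdecimal() else None


def normalize_name(raw):
    """Collapse whitespace and translate a German name to English when it is known."""
    cleaned = " ".join(raw.split())
    return GERMAN_TO_ENGLISH.get(cleaned, cleaned)


def preprocess_row(row):
    """Clean one IDB row (a dict with hypothetical keys) for comparison with EpiTator output."""
    return {
        "country": normalize_name(row["country"]),
        "disease": normalize_name(row["disease"]),
        "count": normalize_count(row["count"]),
    }
```

Unknown values are returned unchanged (names) or as `None` (counts), so they can be flagged for manual review rather than silently guessed.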

The authors tackled these two parts as follows:

  • Key information extraction. EpiTator, an open-source epidemiological annotation tool, was run on articles from the relevant sources and suggested many candidates for the following entities: disease, country, date, and confirmed-case count. Two problems then had to be solved:
    • First, the output of EpiTator needed to be comparable to the entries in the IDB.
    • Second, and more importantly, the output of EpiTator needed to be filtered. A naive approach to finding the key entity among all the entities returned by EpiTator is to pick the most frequent one. This worked well for the key country and disease, but not for the key date and confirmed-case count. For those, the authors developed a learning-based approach: they trained a naive Bayes classifier to find the most likely key entities among the candidates.
  • Relevance scoring. The second part of supporting event-based surveillance was to estimate the relevance of epidemiological articles. The authors framed the relevance evaluation as a classification problem with two classes to which any article might belong:
    • Relevant, if the article is in the event-based surveillance database (the IDB).
    • Irrelevant otherwise.
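Both filtering strategies for key information extraction can be illustrated with a small, dependency-free sketch. The most-frequent baseline is just a counter over the candidates. For the learning-based part, the sketch assumes (this is not stated in the source) that each candidate date or count is represented by the words of the sentence it appears in. A multinomial naive Bayes model then scores how likely that sentence is to contain the key entity. The authors used scikit-learn, whose `MultinomialNB` would normally replace the hand-written class. It is written out here only to keep the example self-contained. The names `most_frequent_entity`, `NaiveBayes`, and `pick_key_entity` and the training sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict


def most_frequent_entity(candidates):
    """Baseline: the candidate EpiTator returned most often (worked for country and disease)."""
    if not candidates:
        return None
    return Counter(candidates).most_common(1)[0][0]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing over bag-of-words features."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_joint(self, doc, label):
        """log P(label) plus the sum of log P(word | label) over the words in doc."""
        total = sum(self.word_counts[label].values())
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        for word in doc.lower().split():
            score += math.log((self.word_counts[label][word] + 1) / (total + len(self.vocab)))
        return score

    def prob_key(self, doc):
        """P(label is True | doc), normalized with the log-sum-exp trick for stability."""
        logs = {label: self.log_joint(doc, label) for label in self.class_counts}
        top = max(logs.values())
        norm = sum(math.exp(v - top) for v in logs.values())
        return math.exp(logs.get(True, -math.inf) - top) / norm


def pick_key_entity(model, candidates):
    """candidates: (entity, sentence) pairs; return the entity whose sentence looks most 'key'."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: model.prob_key(c[1]))[0]


if __name__ == "__main__":
    # Hypothetical training sentences: True if the sentence held the key case count.
    sentences = [
        "a total of 120 confirmed cases were reported",
        "the outbreak has caused 45 confirmed cases so far",
        "in 2015 there were 300 cases in a neighbouring region",
        "a similar event in 2010 had 80 cases",
    ]
    model = NaiveBayes().fit(sentences, [True, True, False, False])
    candidates = [
        ("90", "in 2012 there were 90 cases"),
        ("60", "health authorities reported 60 confirmed cases"),
    ]
    print(pick_key_entity(model, candidates))  # prints 60
```

With only four training sentences the model merely learns that words like "confirmed" and "reported" signal the current count while historical phrasing does not. Real IDB-derived labels would give a far larger and more reliable training set.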

Two sources stood out as being relevant and easy to scrape:

  • World Health Organization Disease Outbreak News (WHO DON)
  • ProMED-mail

The authors compared the performance of different classifiers using document and word embeddings. State-of-the-art text classifiers tend to vectorize text with word embeddings rather than with tf-idf or bag-of-words representations. Word embeddings are vector representations of words learned from large amounts of text in an unsupervised manner, and proximity in the word embedding space tends to correspond to semantic similarity. The researchers compared six different classifiers for the relevance scoring task. Two of them stood out:

  • The multilayer perceptron performed best overall.
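  • The support-vector machine, on the other hand, had the highest recall (0.88), which may be of greater interest to epidemiologists: a missed relevant article can mean a missed outbreak signal, whereas a falsely flagged one only costs reading time.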
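Two ideas from this comparison can be made concrete. The first is turning a document into a vector by averaging word embeddings, with cosine similarity as the notion of proximity. The second is the precision/recall trade-off that makes a high-recall classifier attractive. The sketch below uses made-up two-dimensional word vectors and made-up relevance scores purely for illustration. Real embeddings are learned from large corpora and have hundreds of dimensions. Averaging is only one simple way to build a document embedding, not necessarily the one the authors used. The SVM and multilayer perceptron themselves are not reimplemented here.

```python
import math

# Hypothetical toy embeddings; real ones are learned from large text corpora.
WORD_VECTORS = {
    "outbreak": [1.0, 0.0],
    "cholera": [0.8, 0.2],
    "football": [0.0, 1.0],
}


def document_embedding(tokens, word_vectors):
    """Average the vectors of the tokens that have an embedding; None if none do."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; values near 1 mean similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def precision_recall(y_true, y_pred):
    """Precision and recall for the 'relevant' (True) class."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # "match" has no vector in the toy table and is skipped.
    print(document_embedding(["cholera", "outbreak", "match"], WORD_VECTORS))
    # Hypothetical classifier scores for five articles, three of which are relevant.
    truth = [True, True, True, False, False]
    scores = [0.9, 0.6, 0.4, 0.2, 0.35]
    for threshold in (0.5, 0.3):
        preds = [s >= threshold for s in scores]
        print(threshold, precision_recall(truth, preds))
```

In this toy run, lowering the decision threshold from 0.5 to 0.3 raises recall from 2/3 to 1.0 while precision drops from 1.0 to 0.75. No relevant article is missed, at the price of one irrelevant article to read.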

Finally, the authors integrated these functionalities into a web application called EventEpi, in which articles from the relevant sources are automatically analyzed and stored in a database. The fundamental issues of machine learning in general apply here as well, in particular bias and explainability.

Tackling the individual biases and personal preferences of the experts who label the articles is essential. It will also be important to show why EventEpi extracted certain information or assigned a certain relevance, so that epidemiologists can adopt the tool and also assess it critically to improve it.

At the moment, EventEpi only presents results to the user. However, it could be expanded into a general interface to an event database that lets epidemiologists mark which articles were indeed relevant and correct key information. These corrections could then be used to retrain the models, an approach known as active learning.
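The source does not say which active-learning strategy would be used. One common choice is uncertainty sampling: ask the epidemiologists to label the articles the model is least sure about, then retrain on the enlarged labeled set. A minimal sketch follows. It assumes a hypothetical `predict_relevance` function that returns a probability of relevance and a hypothetical `ask_expert` callback that stands in for the review step in the web interface.

```python
def select_for_review(articles, predict_relevance, batch_size=2):
    """Uncertainty sampling: the articles whose relevance probability is closest to 0.5."""
    return sorted(articles, key=lambda a: abs(predict_relevance(a) - 0.5))[:batch_size]


def active_learning_round(unlabeled, labeled, predict_relevance, ask_expert, batch_size=2):
    """Move the most uncertain articles from `unlabeled` to `labeled` using expert judgements.

    The caller is expected to retrain the relevance classifier on `labeled` afterwards.
    """
    for article in select_for_review(unlabeled, predict_relevance, batch_size):
        labeled[article] = ask_expert(article)
        unlabeled.remove(article)
    return labeled


if __name__ == "__main__":
    scores = {"a1": 0.95, "a2": 0.52, "a3": 0.10, "a4": 0.40}  # hypothetical model outputs
    unlabeled = list(scores)
    labeled = active_learning_round(unlabeled, {}, scores.get, lambda a: a == "a2")
    print(labeled)    # {'a2': True, 'a4': False}
    print(unlabeled)  # ['a1', 'a3']
```

Only the two borderline articles are sent for review, while the confident predictions are left alone. This is what keeps the experts' labeling effort small.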

The overall framework can be used in production and promises improvements in event-based surveillance. The source code is publicly available at

