Big Data Versus The Next Big Outbreak

We now have vast amounts of information at our fingertips; could we use it to tackle some of the most challenging and complex public health issues the world is facing today?

AsianScientist (Oct. 26, 2015) – Haiti, 2010. In the dust-caked village of Mirebalais, a locally known and somewhat eccentric man took his daily walk, naked, across the yellowing grass. He slid into the Latem River, scooped up a handful of tepid water and drank. And that was it: the moment the outbreak began.

The 28-year-old quickly developed severe diarrhea and died within 24 hours. Over the following months a further 500,000 people were infected, leaving 7,000 dead. But this was 2010, and the last major cholera pandemic had happened over half a century earlier. Where did the Haiti cholera outbreak originate? And what could have been done to curb its spread?

With billions of passengers making tens of millions of international flights every year, it’s never been easier for pathogens to hitch a ride. But what if that abundance could be used to our advantage? In an era of big threats such as Ebola, SARS, MERS and bird flu, could ‘big data’ hold the answer? It’s a revolution underpinned by ever-more powerful computing and unprecedented volumes of digital data. And it promises to transform the way we approach any crisis, from natural disasters to disease outbreaks.

What is big data?

Big data is the ability to capture and process a large volume of data. It’s nothing new—governments, businesses and academics have been crunching numbers from libraries, telecommunications and census records for decades, with the so-called ‘information explosion’ beginning as early as 1941.

Yet the rate at which data is being produced today is undoubtedly skyrocketing. From the dawn of human civilization to 2003, the entire planet produced approximately five billion gigabytes of data. By 2020, this figure is expected to hit 44 trillion gigabytes per year. That’s 440 times the number of stars in the Milky Way, or equivalent to 8.8 trillion copies of the complete works of Shakespeare.

There’s an app for that

One red-hot field is the use of mobile phone data. It works like this: every time someone makes a call or sends a text message, mobile phone companies typically record the time of a call, the phone numbers of the caller and receiver and which cell phone tower was used to relay the call, information which indicates the rough location of the user at the time.

“The data gives information on how populations are moving generally. If there happened to be an outbreak in a small village or town you could immediately look at how populations are moving between that town and other areas,” Andrew Tatem, director of the WorldPop population mapping project and co-director of Flowminder Foundation, explained to Asian Scientist Magazine.
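To make the idea concrete, here is a minimal sketch of how call records of the kind described above might be turned into counts of movement between areas. The records, user labels and tower names are invented for illustration, and the simple "consecutive towers" rule is an assumption, not the method used by Flowminder or WorldPop.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical, made-up call records: (caller_id, timestamp, tower_id).
# Real records would be anonymized and far more numerous.
records = [
    ("user_a", "2010-10-20T09:00", "tower_village"),
    ("user_a", "2010-10-21T18:30", "tower_capital"),
    ("user_b", "2010-10-20T08:15", "tower_village"),
    ("user_b", "2010-10-22T12:00", "tower_village"),
    ("user_c", "2010-10-20T10:00", "tower_village"),
    ("user_c", "2010-10-23T07:45", "tower_coast"),
    ("user_c", "2010-10-24T16:20", "tower_capital"),
]


def movement_counts(records):
    """Count tower-to-tower moves between each user's consecutive records."""
    by_user = defaultdict(list)
    for user, timestamp, tower in records:
        by_user[user].append((datetime.fromisoformat(timestamp), tower))

    flows = Counter()
    for events in by_user.values():
        events.sort()  # put each user's records in chronological order
        for (_, origin), (_, destination) in zip(events, events[1:]):
            if origin != destination:  # ignore calls made from the same tower
                flows[(origin, destination)] += 1
    return flows


flows = movement_counts(records)
for (origin, destination), count in sorted(flows.items()):
    print(f"{origin} -> {destination}: {count}")
```

Summed over millions of phones, counts like these give a rough picture of how many people travel from an affected town to each neighboring area, which is the kind of signal Tatem describes.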

By analyzing anonymous data from nearly three million mobile phones, Tatem, a professor at the University of Southampton in the UK, helped to predict the spread of the 2010 cholera outbreak in Haiti.

“[Big data] is vital in situations of disease outbreaks because diseases are spread by people. We’ve shown that this kind of data can outperform some of the best models that exist at the moment,” he said.

But what about diseases carried by insects? Malaysia’s warm, tropical climate makes it a breeding ground for the Aedes aegypti mosquito, a large white-spotted insect which carries the deadly dengue virus. In 2014, with triple the number of dengue-related deaths compared to the previous year and 250 new cases being reported daily, the government-sponsored Multimedia Development Corporation decided to take action. It launched the Big Data App challenge: firms were provided with weather and case data for dengue fever, and asked to develop a useful app.

US data company Teradata signed on. It teamed up with Malaysia’s Multimedia University and Dr. Dhesi Baha Raja from the Malaysia Ministry of Health to produce a dengue index which would alert the public to the potential for an outbreak in their area within the next month with over 80 percent accuracy. It is hoped that the app, which won the challenge, will help health authorities and the public to prepare for future dengue outbreaks.

Flu forecasting

Yet not all big data is created equal. One of the most prolific sources of data is the Internet, and with 67,000 search queries per second, Chinese tech giant Baidu is at the forefront of the data revolution.

Today, typing ‘flu’ into Baidu’s western rival, Google, not only returns information on the virus but contributes to Google Flu Trends. The service trawls the firm’s vast backlog of search data to estimate regional flu activity on a scale from ‘minimal’ to ‘intense.’ By comparing its results with data from traditional surveillance methods, Google has found that searches for ‘flu’ and 40 related terms tend to spike when there’s an outbreak.
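The comparison between search activity and traditional surveillance can be illustrated with a simple correlation check. The sketch below is only that: the weekly figures are invented for illustration, and a plain Pearson correlation is an assumed stand-in, not a description of how Google Flu Trends actually works.

```python
from math import sqrt

# Invented weekly figures for illustration only;
# these are not real search or surveillance data.
weekly_searches = [120, 150, 310, 640, 580, 260, 140]
weekly_reported_cases = [10, 14, 35, 70, 66, 30, 12]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson(weekly_searches, weekly_reported_cases)
print(f"Correlation between searches and reported cases: {r:.2f}")
```

A value close to 1 means the two series rise and fall together. It does not show that searches predict cases on their own, which is one reason search-based estimates are checked against surveillance data.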

Fresh from its success using big data to predict Germany’s triumph in the 2014 World Cup, Baidu is following in Google’s footsteps and working with the Chinese Center for Disease Control and Prevention to launch its own flu prediction service. There’s just one problem: Google Flu Trends, launched in 2008, overestimated peak flu levels in the US for three years in a row. Will Baidu learn from Google’s mistakes?

Forecasting flu in Asia faces additional challenges. Unlike in the US, where flu outbreaks typically occur only in winter, outbreaks in tropical Asian climates can happen at any time of the year. Analysis of flu patterns is further complicated by the deep international interconnectedness and high population density of many Asian cities, as typified by the bustling financial center of Hong Kong.

And yet even there big data has proven useful. Using data from 1998-2013, researchers from the University of Hong Kong developed a model that could predict both the peak timing and peak magnitude of seasonal and pandemic influenza outbreaks, with an accuracy as high as 93 percent.

So will big data put a stop to the next big outbreak?

“Big data is not a magic bullet, but it is an extra valuable tool,” said Tatem.

With billions of gigabytes of data at our fingertips, we may just need to ask the right questions.

This article was first published in the print version of Asian Scientist Magazine, October 2015.




Zaria Gorvett is a freelance science writer based in the UK. She graduated with a bachelors degree in biological science from the University of Exeter, UK and a masters degree in medical microbiology from the London School of Hygiene and Tropical Medicine, UK.
