
AsianScientist (Sep. 5, 2017) – No longer the preserve of academics, high performance computing has broken out of the ivory tower and become mainstream, transforming entire industries in its wake. Even healthcare, a traditionally cautious industry with legitimate concerns about the privacy of patient data, has not been left untouched.
At the Beth Israel Deaconess Medical Center in Boston, Massachusetts, for example, a supercomputer crunches 30 years’ worth of data from over 250,000 patients to predict a patient’s likelihood of dying over the next 30 days. In Asia, IBM’s Watson supercomputer is helping doctors diagnose cancer across hospitals in Thailand, India and China. (Check out our interview with IBM’s Farhana Nakhooda)
“Just as the effective use of data has transformed industries like banking and logistics, we should be able to do the same thing in healthcare to improve efficiency and the patient experience,” said Professor Tai E Shyong of the Yong Loo Lin School of Medicine, National University of Singapore.
“These improvements do not need supercomputing per se, but there will be specific applications where we use large volumes of data to get the right services to the right patient at the right time,” he said.
Putting the pieces together
One area where data is expected to make a profound difference is in the emerging field of precision medicine, where genomics and other ‘omics data are used to personalize treatments for different patients. Expecting it to revolutionize healthcare and the way it is delivered, the US government in 2016 set aside US$215 million for its Precision Medicine Initiative.
“Take the example of prostate cancer,” said Dr. Saumya Jamuar, a consultant at KK Women’s and Children’s Hospital and clinical lead of the Singhealth Duke-NUS Institute of Precision Medicine. “If six individuals with prostate cancer walk into the clinic, only one of them is going to die from the disease. In the other five cases, the cancer is going to outlive the patient.”
“If you treat all of them the same way, even though the majority gain no added benefit from the treatment, they are all at risk of developing therapy-related complications. But how do you know which individual will be the one who responds well to treatment? This is one of the biggest frustrations we have as clinicians: how to choose the right therapy for the right patient?”
Thankfully, help is at hand. Scientists have identified genetic markers with a specific pattern of expression in patients who require aggressive treatment, allowing doctors to identify and treat only those who are likely to respond, sparing the rest unnecessary surgery and expense.
Diagnosing individual patients, and identifying relevant gene markers in the first place, requires serious gene sequencing capabilities. Each individual human genome is about three billion base pairs long. To sequence a person’s DNA, the entire genome is broken up into hundreds of millions of fragments of about 150 base pairs each. Computers are then used to stitch everything back together, a task akin to solving a three-billion-piece puzzle where multiple sets of the puzzle have been shaken together just to make matters more difficult.
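The puzzle-solving step described above can be illustrated with a deliberately tiny sketch: a greedy merge of overlapping fragments. This is a toy, not a real assembler — production pipelines align ~150-base-pair reads against a reference genome with probabilistic models, at a scale many orders of magnitude larger.

```python
# Toy illustration of fragment reassembly: repeatedly merge the pair of
# fragments with the longest suffix/prefix overlap until one sequence
# remains. Real genome assembly and alignment are probabilistic and run
# over hundreds of millions of reads.

def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def assemble(fragments):
    frags = list(fragments)
    while len(frags) > 1:
        # Find the pair of distinct fragments with the best overlap.
        k, i, j = max(
            (overlap(a, b), i, j)
            for i, a in enumerate(frags)
            for j, b in enumerate(frags)
            if i != j
        )
        merged = frags[i] + frags[j][k:]  # glue, dropping the overlap
        frags = [f for n, f in enumerate(frags) if n not in (i, j)]
        frags.append(merged)
    return frags[0]

reads = ["GATTACA", "TACAGGT", "GGTCCA"]
print(assemble(reads))  # → GATTACAGGTCCA
```

Even this toy version shows why the problem is computationally heavy: every merge step compares all fragment pairs, which is why the real, probabilistic version of this task calls for serious computing power.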
“Supercomputers are essential for analyzing all this data. We need computational resources that are powerful enough to do the probabilistic calculations needed to do the alignments,” continued Jamuar, who is also co-founder and co-chief scientific officer of Global Gene Corp, a genomics platform company focused on precision medicine.
“When we did an initial analysis of 300 samples, it generated about three terabytes of data. By parallelizing the analysis on the facilities provided by Singapore’s National Supercomputing Centre, we were able to complete the analysis within a week. It would have taken us much more time if we had used a traditional cloud-based server.”
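The speed-up Jamuar describes comes from the fact that each sample can be analyzed independently. A minimal sketch of that kind of per-sample parallelism, using Python's standard process pool, might look like the following — `analyze()` is a hypothetical stand-in for a real alignment and variant-calling pipeline, not an actual tool from the project.

```python
# Hedged sketch of per-sample parallelism: independent sample analyses
# are farmed out across CPU cores. On a supercomputer, the same pattern
# is scaled out across many nodes by a job scheduler.

from multiprocessing import Pool

def analyze(sample_id):
    # Placeholder for a real per-sample pipeline step
    # (alignment, variant calling, annotation).
    return f"{sample_id}:done"

if __name__ == "__main__":
    samples = [f"sample_{i:03d}" for i in range(300)]
    with Pool() as pool:                      # one worker per CPU core
        results = pool.map(analyze, samples)  # samples run in parallel
    print(len(results))  # → 300
```

Because the samples share no state, the wall-clock time shrinks roughly in proportion to the number of workers, which is what makes a supercomputing facility so much faster here than a single traditional server.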
Restoring the balance
Supercomputers are set to play an even more important role as Global Gene Corp continues to scale up, Jamuar said. Founded in 2013 with the aim of diversifying the genetic sequences available worldwide, Global Gene Corp hopes to add tens of thousands of Asian genomes to the global pool, starting with South Asians.
“Worldwide, there is data from about 200,000 individuals that is publicly available; most of that data is from white Europeans. Asians, who make up 60 percent of the world’s population, represented less than five percent of that pool when we first started,” Jamuar shared with Supercomputing Asia. “Since then the numbers have improved slightly, but there’s still a huge bias.”
“Even the first version of the Exome Aggregation Consortium (ExAC), which is the largest consortium for genomic data with a dataset of over 60,000 individuals, had only 6,387 South Asians and 2,016 East Asians. The majority of the South Asians came from Pakistan, and the majority of East Asians were from Taiwan. What that has done is skewed our understanding of what a reference genome looks like,” he added.
The company, which has already recruited more than 10,000 Indians to the study, has plans to do much more to rectify the imbalance.
“Precision medicine may be changing the way that healthcare is delivered in the West, but if we don’t have the data for Asian populations, we are effectively excluding 60 percent of the world population. We want to change that, and this is what drives me, personally.”