Asian Scientist Magazine (Sep. 16, 2022) — Cancer is one of the most prevalent noncommunicable diseases worldwide. Singapore alone reported 78,204 cases between 2015 and 2019, according to the Singapore National Registry for Disease. That is nearly 43 patients diagnosed with a form of cancer every day over that period. Identifying cancer-causing mutations in a person’s genome is therefore key to understanding how the disease forms and to developing precision medicine that targets the specific mutations found in a patient’s sample.
However, sequencing large amounts of patient data, billions of nucleotides, to find mutations is time-consuming and expensive. The global scientific community has therefore been trying to use artificial intelligence (AI) to make the process more efficient and accurate.
A research group from the Genome Institute of Singapore (GIS) has developed an AI-based mutation caller. Known as VarNet, the caller uses deep learning models to sift through raw DNA sequencing data and detect mutations. The group reported its findings in a recently published paper in Nature Communications.
VarNet is not the first AI-based mutation caller. What sets it apart is that it is a ‘weakly supervised’ deep learning model, according to Anders Skanderup, Group Leader of the Laboratory of Computational Cancer Genomics at GIS and co-author of the paper.
“Deep learning models typically require vast amounts of labeled training data to perform robustly,” Skanderup told Asian Scientist Magazine. DNA sequencing data for cancer genomics is usually the opposite: the individual data samples themselves are not that large and not all mutations are fully labeled. “This poses a challenge in training a deep learning model for detecting cancer mutations as it requires significant human effort to create such a training dataset.” A ‘weakly supervised’ deep learning model can learn to find cancer mutations from large amounts of imperfectly labeled training data.
Skanderup and his team used various other software tools to create high-quality ‘pseudo-labels’ on sequencing data obtained from over 300 whole cancer genomes across seven cancer types, and subsequently fed the labeled data to VarNet. These ‘pseudo-labels’ gave VarNet the information it needed to learn to detect cancer mutations in those samples of raw DNA sequencing data from cancer patients.
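One way to picture how pseudo-labels might be built from the output of several existing tools is a simple vote: a candidate mutation is kept as a label only if enough tools report it. The article does not say how the team combined its tools, so the voting rule, the `make_pseudo_labels` function, the caller names and the variant tuples below are all illustrative assumptions, not VarNet's actual procedure.

```python
from collections import Counter

# A variant is identified here by (chromosome, position, reference base, mutated base).
# This tuple layout and the caller names are assumptions made for illustration only.

def make_pseudo_labels(calls_by_caller, min_votes=2):
    """Return the set of variants reported by at least `min_votes` callers."""
    votes = Counter()
    for calls in calls_by_caller.values():
        votes.update(set(calls))  # each caller votes at most once per variant
    return {variant for variant, n in votes.items() if n >= min_votes}

# Hypothetical output of three mutation-calling tools on the same tumor sample.
calls_by_caller = {
    "caller_a": [("chr1", 12345, "A", "T"), ("chr2", 500, "G", "C")],
    "caller_b": [("chr1", 12345, "A", "T"), ("chr3", 42, "C", "G")],
    "caller_c": [("chr1", 12345, "A", "T"), ("chr2", 500, "G", "C")],
}

pseudo_labels = make_pseudo_labels(calls_by_caller, min_votes=2)
print(sorted(pseudo_labels))
```

Labels produced this way are imperfect, since the tools can agree on a false call or all miss a real one, which is the kind of noise a weakly supervised model has to tolerate.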
Alongside the labeled tumor data, DNA sequencing data from healthy tissue was fed to the model in tandem. This mimics the way a human would visually compare sequencing data from a cancerous tissue sample against sequencing data from a healthy tissue sample. Once trained, VarNet could detect mutations in new sequencing data it came across.
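The tumor-versus-healthy comparison can be illustrated with a hand-written rule: at a given position in the genome, a mutation looks cancer-specific if a noticeable share of tumor reads differ from the reference while the healthy reads do not. This is only a sketch of the comparison a human reviewer might make. VarNet learns its decision from data with a neural network, and the function names and thresholds here are assumptions chosen for the example.

```python
def alt_fraction(bases, ref):
    """Fraction of reads at one position whose base differs from the reference."""
    if not bases:
        return 0.0
    return sum(1 for b in bases if b != ref) / len(bases)

def looks_somatic(tumor_bases, normal_bases, ref, min_tumor=0.1, max_normal=0.02):
    """Flag a position where the tumor shows a non-reference base but healthy tissue does not.

    The thresholds are arbitrary illustrative values, not taken from the article.
    """
    return (alt_fraction(tumor_bases, ref) >= min_tumor
            and alt_fraction(normal_bases, ref) <= max_normal)

# Bases observed in the reads covering one position; the reference base is "A".
tumor_reads = list("AAAATTTAAA")       # 3 of 10 tumor reads carry "T"
healthy_reads = list("AAAAAAAAAA")     # no healthy reads carry "T"
print(looks_somatic(tumor_reads, healthy_reads, ref="A"))

# A variant present in both tissues is inherited rather than cancer-specific.
inherited_tumor = list("AATTAATTAT")
inherited_healthy = list("ATATATATAT")
print(looks_somatic(inherited_tumor, inherited_healthy, ref="A"))
```

The first position is flagged and the second is not, because the same change also appears in the healthy sample.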
After completing its training, VarNet’s performance in detecting mutations was compared against existing AI-based mutation callers using real and synthetically derived tumor data from various cancer genome databases. Overall, the results showed that VarNet outperformed the other mutation callers in accurately detecting mutations across most of the real and synthetic tumor data.
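Comparisons like this are commonly scored by checking a caller's reported mutations against a trusted reference set and computing precision, recall and their combined F1 score. The article does not state which metrics were used, so the choice of metrics, the `score_calls` function and the toy variants below are assumptions for illustration.

```python
def score_calls(predicted, truth):
    """Return (precision, recall, F1) for a set of predicted variants against a truth set."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical truth set and caller output, using (chromosome, position, ref, alt) tuples.
truth = {("chr1", 100, "A", "T"), ("chr1", 200, "C", "G"),
         ("chr2", 300, "G", "A"), ("chr3", 400, "T", "C")}
predicted = {("chr1", 100, "A", "T"), ("chr1", 200, "C", "G"),
             ("chr2", 300, "G", "A"), ("chr4", 500, "A", "G")}

precision, recall, f1 = score_calls(predicted, truth)
print(precision, recall, f1)
```

Here the caller finds three of the four true mutations and reports one false one, so precision and recall are both three in four.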
Identifying cancer mutations in extremely large amounts of DNA sequencing data is a time-consuming and expensive endeavor, and human experts are still needed to validate and check the output of AI-based mutation callers. Skanderup hopes that VarNet’s success in accurately detecting mutations “could reduce the need for human experts in this [validation] process in the future.”
Source: Genome Institute of Singapore; Image: Unsplash
The article can be found at: Krishnamachari et al. (2022), Accurate somatic variant detection using weakly supervised deep learning.