AI Can Help Identify Diseases Early

Scientists in Singapore have developed software that quickly and accurately predicts chemical modifications of RNA molecules, which can help researchers understand the role of these modifications in diseases such as cancer.

AsianScientist (Mar. 17, 2023) – Ribonucleic acid, or RNA, is a complex organic molecule found inside cells that carries the instructions for making the proteins needed for cellular processes. It comprises four basic building blocks called nucleotides, each denoted by a letter: adenine (A), cytosine (C), guanine (G), and uracil (U). The sequence of these letters determines which proteins are produced.

RNA goes through many chemical modifications, which change these four letters [A, C, G and U], thereby influencing the function of the RNAs or how they are processed. More than 160 RNA modifications have been discovered; the most prevalent of these—m6A—is associated with human diseases like cancer, neurodegenerative disorders, and metabolic diseases.

A team of researchers from the Agency for Science, Technology and Research (A*STAR) and the National University of Singapore (NUS) has developed software called m6Anet that accurately predicts m6A modifications from genomic data. Accurate prediction of RNA modifications such as m6A can help in the early identification of diseases associated with m6A. The study was published in Nature Methods.

Generally, finding RNA modifications requires time-consuming experiments that are not accessible to most laboratories. Furthermore, previous methods could not detect m6A at single-molecule resolution, which is crucial for understanding its biological mechanisms.

The researchers overcame these limitations by leveraging direct Nanopore RNA sequencing, a novel technology that sequences raw RNA molecules together with their modifications. Christopher Hendra, a PhD student at A*STAR’s Genome Institute of Singapore (GIS) and the NUS Institute of Data Science, developed m6Anet in Python over three years. Hendra is also the first author of the study.

The software trains deep neural networks on abundant direct Nanopore RNA sequencing data, using the Multiple-Instance Learning (MIL) approach, to detect the presence of m6A accurately.

“In traditional machine learning, we often have one label for each example we want to classify. For example, each image is either a cat or not a cat, and the algorithm learns to differentiate cat images from other images based on their labels,” said Hendra.

The issue with detecting m6A, he said, is that an overwhelming amount of data is available but without clear labels.

“Imagine having a large photo album with a cat photo hidden among millions of other photos and attempting to identify that particular photo without having any labels to base your search upon. Fortunately, this has been studied in machine learning literature before and is known as the MIL problem,” he added.
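The MIL idea Hendra describes can be illustrated with a small Python sketch. It is not taken from the m6Anet code; it assumes that each RNA read covering a site already has a per-read modification probability, and it combines them with noisy-OR pooling, one common MIL choice, so that the site (the "album") is called modified if at least one read (a "photo") is modified. The function name and the example scores are hypothetical.

```python
from math import prod


def site_probability(read_probs):
    """Noisy-OR pooling over a bag of reads (illustrative assumption).

    Returns the probability that at least one read in the bag is
    modified, treating the per-read probabilities as independent.
    """
    if not read_probs:
        raise ValueError("a bag needs at least one read")
    # P(at least one modified) = 1 - P(no read is modified)
    return 1.0 - prod(1.0 - p for p in read_probs)


# Hypothetical per-read scores for two sites; only the site as a
# whole would carry a label during training, not the single reads.
unmodified_site = [0.01, 0.02, 0.01, 0.03]
modified_site = [0.02, 0.90, 0.01, 0.85]

print(round(site_probability(unmodified_site), 3))
print(round(site_probability(modified_site), 3))
```

In this sketch, a single confidently modified read is enough to push the site-level score close to 1, while a bag of low-scoring reads stays close to 0, which mirrors the "one cat photo in the album" intuition.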

In this study, the research team demonstrated that m6Anet could predict the presence of m6A with high accuracy at single-molecule resolution and across species, even though it was trained on a single human sample. They did so by analysing single-molecule predictions from human cell lines and from synthetic data for which they knew whether each molecule was modified or unmodified.

“Comparing our predictions with what we expected showed very good agreement, indicating we can identify single-molecule m6A modifications,” Jonathan Göke, Group Leader of the Laboratory of Computational Transcriptomics at A*STAR GIS, and senior author of the study, told Asian Scientist Magazine. “Our AI model has only seen data from a human sample, but it can accurately identify RNA modifications even in samples from species that the model has not seen before,” he added.

To identify RNA modifications with m6Anet, one must first generate direct RNA-Seq data from any sample of interest, explains Dr Göke. The direct RNA-Seq data must then be processed to prepare the data for modification detection. After data processing, m6Anet can be run to infer RNA modifications for this sample.

The key advantage of this study is that RNA modification profiling now becomes much easier and more accessible, which means many more people can profile m6A.

This study is also significant for cancer treatment and research. Researchers have long suspected that even in cells with the correct DNA sequence, changes to RNA may alter which proteins are produced. In cancer patients, these changes may lower levels of proteins that kill cancer cells or increase levels of proteins that prompt a cancer cell to keep dividing.

“Accurately and efficiently identifying RNA modifications has been a long-standing challenge, and m6Anet helps to address these limitations,” said Prof Patrick Tan, Executive Director of A*STAR’s GIS.

Source: The Agency for Science, Technology and Research (A*STAR); Image: Shutterstock

The article can be found at: Detection of m6A from direct RNA sequencing using a multiple instance learning framework

Disclaimer: This article does not necessarily reflect the views of AsianScientist or its staff.

Puja is a multimedia journalist based in Kolkata, India. She writes about social justice, health, policy, LGBTQIA+ issues and culture.
