KAUST Research Team Develops Privacy-Preserving Machine Learning for Genomic Data Analysis

In a significant advancement for medical research, a team of researchers from the King Abdullah University of Science and Technology (KAUST) has introduced a pioneering machine-learning approach aimed at accelerating discovery from genomic data while safeguarding individuals' privacy. The study, published in the journal Science Advances, addresses the critical challenge of leveraging artificial intelligence (AI) in genomic research without compromising privacy.

"Omics data, which includes gene expression and cell composition, often contains sensitive information related to an individual's health or disease status," explains Xin Gao from KAUST. "Deep learning models trained on this data have the potential to inadvertently retain private details. Our goal is to strike a balance between preserving privacy and optimizing model performance."

The traditional method of encrypting data for privacy preservation adds computational overhead and restricts model usage to secure environments. The alternative, federated learning, keeps each dataset with its owner, trains the model locally, and shares only model updates, but those updates can still leak private information.

To overcome these challenges, the team integrated an ensemble of privacy-preserving algorithms, including differential privacy and decentralized shuffling, into their machine-learning approach dubbed PPML-Omics.

Juexiao Zhou, lead author of the paper and a Ph.D. student in Gao's group, highlights the use of a decentralized shuffling algorithm to protect privacy without compromising model efficiency. According to the team, this approach strikes a better balance between privacy preservation and model performance than previous methods.
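For readers who want a concrete picture of how differential privacy and shuffling can fit together, the following is a minimal Python sketch. It is not the PPML-Omics implementation: the function names, the clipping norm, the noise level, and the toy updates are all assumptions made for illustration. Each data holder clips its model update and adds Gaussian noise (a standard differential privacy step), and the updates are then shuffled so the aggregator cannot tell which participant sent which one.

```python
import math
import random

def clip(update, max_norm):
    """Scale the update so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]

def privatize(update, max_norm, noise_std, rng):
    """Clip the update, then add Gaussian noise to every coordinate."""
    return [x + rng.gauss(0.0, noise_std) for x in clip(update, max_norm)]

def shuffle_and_average(updates, rng):
    """Shuffle the updates to hide who sent which one, then average them."""
    shuffled = list(updates)
    rng.shuffle(shuffled)
    n = len(shuffled)
    return [sum(column) / n for column in zip(*shuffled)]

# Hypothetical updates from three data holders (toy values, not real data).
rng = random.Random(0)
client_updates = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]

# max_norm and noise_std are illustrative assumptions, not values from the paper.
noisy_updates = [
    privatize(u, max_norm=1.0, noise_std=0.1, rng=rng) for u in client_updates
]
global_update = shuffle_and_average(noisy_updates, rng)
print(global_update)
```

Shuffling does not change the average. Its purpose in this sketch is only to break the link between an update and its sender; in a real deployment that step would run on a party separate from the aggregator.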

The team demonstrated the effectiveness of PPML-Omics by training three deep-learning models on challenging multi-omics tasks. Not only did the approach produce optimized models efficiently, but it also exhibited robustness against cyberattacks.

"As deep learning becomes increasingly prevalent in biological and biomedical data analysis, the importance of privacy protection cannot be overstated," emphasizes Gao. "Our research underscores the critical need for privacy-preserving techniques to safeguard sensitive information while unlocking the potential of AI in genomic research."