About AbNovoBench
Advancing antibody research through standardized benchmarking and collaborative innovation
Our Mission
To establish a comprehensive, standardized, and reproducible benchmarking system for evaluating de novo sequencing of monoclonal antibodies.
Our Vision
To empower researchers with high-quality datasets, consistent evaluation tools, and pre-trained models, accelerating antibody therapeutic development without the need to retrain from scratch.
Our Impact
By offering the largest publicly available antibody sequencing dataset and a ready-to-use evaluation pipeline, AbNovoBench helps researchers identify the most suitable models for their needs and drive innovation in antibody informatics.
About AbNovoBench
Empowering antibody research through unified training, large-scale curated datasets, and comprehensive, reproducible benchmarking.
Background
Monoclonal antibodies (mAbs) have become indispensable tools in modern research and biomedicine, yet obtaining accurate and complete sequence information for them remains challenging. Traditional mRNA-based methods depend on the availability of hybridoma clones and cannot detect the post-translational modifications that are critical for antibody function. Mass spectrometry (MS)-based de novo sequencing provides a powerful alternative, enabling direct readout of secreted antibody sequences. However, benchmarking de novo tools in this domain has been limited by a lack of standardized datasets and unified evaluation protocols.
The Challenge
Most existing deep learning models for peptide sequencing were not specifically designed for monoclonal antibody applications. The field has faced three major obstacles: (1) inconsistent training datasets across studies that hinder fair comparisons; (2) insufficient availability of curated antibody-specific test data; and (3) the absence of a standardized, reproducible, and publicly accessible benchmarking framework tailored to antibodies.
Our Solution
AbNovoBench addresses these challenges by offering the largest high-quality benchmarking dataset for monoclonal antibody sequencing to date, comprising over 1.6 million peptide-spectrum matches (PSMs) from 131 antibodies across six species and 11 proteases. In addition, it provides eight fully annotated monoclonal antibody datasets with known full-length amino acid sequences, specifically designed for evaluating downstream assembly performance. The platform integrates a unified training set, standardized evaluation metrics, and automated scoring systems to support comprehensive, reproducible, and fair comparisons across different de novo peptide sequencing algorithms and assembly strategies.
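To make the idea of standardized evaluation metrics concrete, the following is a minimal sketch of two measures often reported for de novo peptide sequencing: peptide-level recall and amino-acid-level precision. It is an illustration only, not the AbNovoBench scoring code. The function names and example peptides are hypothetical, and the position-wise comparison is a simplifying assumption; real pipelines typically match residues by mass and handle isobaric residues such as I/L.

```python
from typing import Sequence


def peptide_recall(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of reference peptides that the prediction reproduces exactly.

    Assumes predicted[i] is the prediction for the spectrum whose
    ground-truth peptide is reference[i].
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    exact = sum(p == r for p, r in zip(predicted, reference))
    return exact / len(reference)


def amino_acid_precision(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Position-wise matching residues divided by all predicted residues.

    Simplification (assumption): residues are compared by position from the
    N-terminus, not by cumulative mass as production evaluators usually do.
    """
    matched = 0
    total_predicted = 0
    for p, r in zip(predicted, reference):
        # zip stops at the shorter sequence, so extra residues never match.
        matched += sum(a == b for a, b in zip(p, r))
        total_predicted += len(p)
    return matched / total_predicted if total_predicted else 0.0


# Hypothetical example data, chosen only to exercise the functions.
predicted = ["EVQLVESGG", "DIQMTQSA", "SLRL"]
reference = ["EVQLVESGG", "DIQMTQSP", "SLRLSCAAS"]

print(f"peptide recall:       {peptide_recall(predicted, reference):.3f}")
print(f"amino acid precision: {amino_acid_precision(predicted, reference):.3f}")
```

Fixing metric definitions like these once, and applying them to every model on the same test set, is what allows results from different algorithms to be compared fairly.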
National Institute for Data Science in Health and Medicine, Xiamen University
State Key Laboratory of Vaccines for Infectious Diseases, Xiamen University
Xiang An Biomedicine Laboratory, Xiamen University
School of Public Health, Xiamen University
School of Life Sciences, Xiamen University
School of Informatics, Xiamen University
The First Affiliated Hospital of Xiamen University, School of Medicine, Xiamen University
Aginome Scientific, Xiamen
• Contribute datasets and models
• Report bugs and feature requests
• Improve documentation
• Share research findings