PROBE Leaderboard: Protein Representation Model Evaluation
Welcome to the PROBE (Protein RepresentatiOn BEnchmark) leaderboard! This platform evaluates protein representation models based on their ability to capture functional properties of proteins through four key benchmarks:
- Semantic Similarity Inference
- Ontology-based Function Prediction
- Drug Target Protein Family Classification
- Protein–Protein Binding Affinity Estimation
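As an illustration of how such a benchmark can score a representation model, the sketch below computes a semantic-similarity-style evaluation: cosine similarities between protein embeddings are rank-correlated (Spearman) with gold-standard semantic similarity scores. This is a minimal, hypothetical sketch for intuition only; the function name, data shapes, and inputs are assumptions, not the PROBE implementation.

```python
import numpy as np
from scipy.stats import spearmanr

def semantic_similarity_score(embeddings, gold_similarities, pairs):
    """Illustrative sketch: correlate embedding cosine similarities with
    gold semantic similarities over a list of protein pairs.

    embeddings: dict mapping protein id -> 1-D numpy vector
    gold_similarities: list of gold similarity scores, one per pair
    pairs: list of (id_a, id_b) protein pairs
    """
    cosine_sims = []
    for a_id, b_id in pairs:
        a, b = embeddings[a_id], embeddings[b_id]
        # Cosine similarity between the two embedding vectors
        cosine_sims.append(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    # Spearman rank correlation: higher means the embedding space
    # better preserves the gold similarity ordering
    rho, _ = spearmanr(cosine_sims, gold_similarities)
    return rho
```

A representation that places functionally similar proteins close together in embedding space yields a higher correlation, which is the intuition behind the Semantic Similarity Inference benchmark.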
Submit your own representation models and compare their performance across these benchmarks. For details on how to participate, see the submission guidelines in the Submit Here! tab. For descriptions of each benchmark and its metrics, please refer to the About tab. To visualize results, visit the Visualization tab.
If you find PROBE Leaderboard useful, please consider citing our work:
Çevrim, E., Yiğit, M. G., Ulusoy, E., Yılmaz, A., & Doğan, T. (2025). A Benchmarking Platform for Assessing Protein Language Models on Function-related Prediction Tasks. In Protein Function Prediction: Methods and Protocols (pp. 241-268). New York, NY: Springer US.
Unsal, S., Atas, H., Albayrak, M., Turhan, K., Acar, A. C., & Doğan, T. (2022). Learning functional properties of proteins with language models. Nature Machine Intelligence, 4(3), 227-245.
Method-name colours
🟢 Classical representations
🔵 Small-scale Protein LMs
🔴 Large-scale Protein LMs
🟠 Multimodal Protein LMs
Metric-cell shading
1–5: top-five scores in each column (darker → better)
Example leaderboard row:
SaProt-35m-af2 | -0.0121 | 0.0912 | 0.2141 | 0.0978 | null | null | null | null | null | null | null | null | 0.3881 | 0.4097 | 0.4867 | 0.5989 | 0.1611 | 0.1872 | 0.3225 | 0.1976 | 0.2394 | 0.2322 | 0.2829 | 0.2819 | 0.2629 | 0.2764 | 0.5166 | 0.3075 | 0.6178 | 0.6013 | 0.4607 | 0.5459 | 0.5508 | 0.2398 | 0.5578 | 0.5505 | 0.4354 | 0.5868 | 0.5675 | 0.4167 | 18.478 | 105.931 | 0.4605 |