PROBE Leaderboard: Protein Representation Model Evaluation

Welcome to the PROBE (Protein RepresentatiOn BEnchmark) leaderboard! This platform evaluates protein representation models based on their ability to capture functional properties of proteins through four key benchmarks:

  • Semantic Similarity Inference
  • Ontology-based Function Prediction
  • Drug Target Protein Family Classification
  • Protein-Protein Binding Affinity Estimation

Submit your own representation models and compare their performance across these benchmarks. For details on how to participate, see the submission guidelines in the Submit Here! tab. For descriptions of each benchmark and its metrics, refer to the About tab. To visualize results, visit the Visualization tab.

If you find the PROBE Leaderboard useful, please consider citing our work:

Çevrim, E., Yiğit, M. G., Ulusoy, E., Yılmaz, A., & Doğan, T. (2025). A Benchmarking Platform for Assessing Protein Language Models on Function-related Prediction Tasks. In Protein Function Prediction: Methods and Protocols (pp. 241-268). New York, NY: Springer US.

Unsal, S., Atas, H., Albayrak, M., Turhan, K., Acar, A. C., & Doğan, T. (2022). Learning functional properties of proteins with language models. Nature Machine Intelligence, 4(3), 227-245.



Method-name colours

🟢  Classical representations
🔵  Small-scale Protein LMs
🔴  Large-scale Protein LMs
🟠  Multimodal Protein LMs

Metric-cell shading

Ranks 1 2 3 4 5: top-five scores (darker → better)
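The top-five shading described above can be illustrated with a minimal sketch. It assumes each metric column is ranked independently and that error metrics such as MSE and MAE are treated as lower-is-better; the function name `top_five_ranks` and the example scores are hypothetical and not taken from the leaderboard code.

```python
def top_five_ranks(scores, higher_is_better=True):
    """Return {method: rank} for the five best scores in one metric column.

    `scores` maps a method name to its value for a single metric.
    Set higher_is_better=False for error metrics (assumed here for MSE/MAE).
    """
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=higher_is_better)
    # Rank 1 is the best score; methods outside the top five get no rank.
    return {name: rank for rank, (name, _) in enumerate(ordered[:5], start=1)}


# Hypothetical values for one correlation-style column.
example = {"A": 0.41, "B": 0.55, "C": 0.12, "D": 0.48, "E": 0.30, "F": 0.05}
ranks = top_five_ranks(example)
```

In this sketch `ranks` holds an entry for every method except "F", which falls outside the top five and would therefore be left unshaded.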

Method
sim_sparse_MF_correlation, sim_sparse_CC_correlation, sim_sparse_BP_correlation, sim_sparse_Ave_correlation
sim_200_MF_correlation, sim_200_CC_correlation, sim_200_BP_correlation, sim_200_Ave_correlation
sim_500_MF_correlation, sim_500_CC_correlation, sim_500_BP_correlation, sim_500_Ave_correlation
func_MF_accuracy, func_MF_F1, func_MF_precision, func_MF_recall
func_BP_accuracy, func_BP_F1, func_BP_precision, func_BP_recall
func_CC_accuracy, func_CC_F1, func_CC_precision, func_CC_recall
func_Ave_accuracy, func_Ave_F1, func_Ave_precision, func_Ave_recall
fam_nc_f1_ave, fam_nc_accuracy_ave, fam_nc_mcc_ave
fam_uc30_f1_ave, fam_uc30_accuracy_ave, fam_uc30_mcc_ave
fam_uc50_f1_ave, fam_uc50_accuracy_ave, fam_uc50_mcc_ave
fam_mm15_f1_ave, fam_mm15_accuracy_ave, fam_mm15_mcc_ave
aff_mse_ave, aff_mae_ave, aff_corr_ave

SaProt-35m-af2
-0.0121, 0.0912, 0.2141, 0.0978, 0.3881, 0.4097, 0.4867, 0.5989, 0.1611, 0.1872, 0.3225, 0.1976, 0.2394, 0.2322, 0.2829, 0.2819, 0.2629, 0.2764, 0.5166, 0.3075, 0.6178, 0.6013, 0.4607, 0.5459, 0.5508, 0.2398, 0.5578, 0.5505, 0.4354, 0.5868, 0.5675, 0.4167, 18.478, 105.931, 0.4605
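The metric column names above each begin with a short prefix (sim, func, fam, aff). The sketch below groups column names by that prefix. The mapping from prefixes to the four benchmark names is an assumption inferred from the names themselves, and `BENCHMARK_BY_PREFIX` and `group_columns` are hypothetical helpers, not part of the leaderboard code.

```python
from collections import defaultdict

# Assumed mapping from column-name prefix to benchmark (inferred, not confirmed).
BENCHMARK_BY_PREFIX = {
    "sim": "Semantic Similarity Inference",
    "func": "Ontology-based Function Prediction",
    "fam": "Drug Target Protein Family Classification",
    "aff": "Protein-Protein Binding Affinity Estimation",
}


def group_columns(columns):
    """Group metric column names by the benchmark their prefix points to."""
    groups = defaultdict(list)
    for col in columns:
        prefix = col.split("_", 1)[0]  # text before the first underscore
        groups[BENCHMARK_BY_PREFIX.get(prefix, "unknown")].append(col)
    return dict(groups)


grouped = group_columns(["sim_200_MF_correlation", "func_BP_F1", "aff_mse_ave"])
```

Columns with an unrecognized prefix fall into an "unknown" group rather than raising an error, which keeps the sketch tolerant of new metrics.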

A method name ending with ^ indicates suspected data leakage in the similarity, function, or family benchmarks.
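When processing exported results, the ^ marker can be separated from the method name with a small helper. This is a sketch only: `split_leakage_flag` and the name "ExampleModel^" are hypothetical, and the only assumption is that the marker is a single trailing caret.

```python
def split_leakage_flag(method_name):
    """Return (clean_name, flagged), where flagged is True if the name ends with '^'."""
    flagged = method_name.endswith("^")
    clean_name = method_name[:-1] if flagged else method_name
    return clean_name, flagged


name, flagged = split_leakage_flag("ExampleModel^")  # hypothetical method name
```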