reproduce BRIGHT result

#3
by abdoelsayed - opened

Hi, thanks for releasing Reason-ModernColBERT.

I’m trying to reproduce the Biology BRIGHT result reported in the model card (NDCG@10 = 33.25).

I evaluated Biology in two ways:

  1. ANN candidate retrieval + ColBERT reranking
  2. exact full ColBERT scoring against all Biology documents (no ANN pruning), with pytrec_eval-style metrics
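
For reference, "exact full ColBERT scoring" here means brute-force late interaction (MaxSim) between the query and every document, with no candidate pruning. A minimal sketch of that idea, assuming token embeddings are already computed and L2-normalized; the function names and the dict-of-arrays layout are illustrative, not the code used for the run reported here:

```python
import numpy as np

def maxsim_score(query_emb, doc_emb):
    """MaxSim: for each query token, take the best-matching document token,
    then sum those maxima over all query tokens.

    query_emb: (num_query_tokens, dim), doc_emb: (num_doc_tokens, dim).
    """
    sim = query_emb @ doc_emb.T  # (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())

def exact_rank(query_emb, doc_embs, top_k=10):
    """Score the query against every document (no ANN) and return the top_k.

    doc_embs: hypothetical mapping doc_id -> token embedding matrix.
    """
    scores = {doc_id: maxsim_score(query_emb, emb) for doc_id, emb in doc_embs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

Because every document is scored, any gap that remains with this approach cannot be attributed to ANN recall.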

In both cases I get about NDCG@10 ~= 0.295 instead of 0.3325.

Exact run:

  • task: biology
  • split: examples
  • NDCG@10: 0.29534
  • MRR@10: 0.38032
  • MAP@100: 0.23691
  • Recall@100: 0.72108
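
To make the metric definition explicit, here is a small sketch of NDCG@10 as I understand the trec_eval convention (linear gain, `log2(rank + 1)` discount, ideal ranking built from all judged relevant documents). This is an assumption about the convention rather than the evaluator's actual code; `ranked_doc_ids` and `qrels` are illustrative names. Values are on a 0 to 1 scale, so 0.3325 corresponds to the 33.25 in the model card.

```python
import math

def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """NDCG@k for one query.

    ranked_doc_ids: doc ids ordered by descending score.
    qrels: mapping doc_id -> graded relevance (0 or absent = not relevant).
    """
    # Gains of the top-k retrieved documents, in ranked order.
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    # Ideal ranking: all relevant judgments sorted by descending relevance.
    ideal = sorted((r for r in qrels.values() if r > 0), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```

The reported figure would then be the mean of this value over all queries in the task.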

My environment currently shows:

  • checkpoint created with sentence-transformers 4.0.2
  • runtime has sentence-transformers 3.4.1
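
A quick way to flag this kind of mismatch before evaluating is sketched below. It assumes plain `major.minor.patch` version strings, and the helper names are hypothetical; the checkpoint version is passed in by hand rather than read from the model files.

```python
from importlib.metadata import PackageNotFoundError, version

def parse_version(v):
    """Turn '3.4.1' into (3, 4, 1); non-numeric parts are ignored."""
    return tuple(int(p) for p in v.split(".")[:3] if p.isdigit())

def runtime_is_older(runtime_version, checkpoint_version):
    """True if the installed library predates the one that saved the checkpoint."""
    return parse_version(runtime_version) < parse_version(checkpoint_version)

def installed_version(package="sentence-transformers"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None
```

For the versions listed above, `runtime_is_older("3.4.1", "4.0.2")` returns `True`.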

So I wanted to ask:

  1. What exact evaluation script/repo was used for the published BRIGHT numbers?
LightOn AI org
edited 4 days ago

Hello,

The reproduction setup and the exact script used can be found here.

I also recently reproduced the results and merged them into MTEB using the official MTEB implementation, see here for more information.
Some results differ slightly, but most of the difference is expected: back then we did not fix the random seed, so the indexes vary between runs; we also changed the index (StanfordNLP -> FastPlaid); and the dataset has changed versions since. Overall, the values are consistent.

Hope it helps!
