reproduce BRIGHT result
Hi, thanks for releasing Reason-ModernColBERT.
I’m trying to reproduce the Biology BRIGHT result reported in the model card (NDCG@10 = 33.25).
I evaluated Biology in two ways:
- ANN candidate retrieval + ColBERT reranking
- exact full ColBERT scoring against all Biology documents (no ANN pruning), with pytrec_eval-style metrics
In both cases I get about NDCG@10 ~= 0.295 instead of 0.3325.
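For clarity, "exact full ColBERT scoring" here means MaxSim late interaction computed against every document, with no candidate pruning. A minimal pure-Python sketch of that idea follows; the helper names (`dot`, `maxsim_score`, `exact_rank`) and the toy vectors are hypothetical and only illustrate the scoring rule, they are not the actual evaluation code.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two token embeddings of equal length."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """MaxSim: for each query token take its best-matching document token, then sum."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def exact_rank(
    query_tokens: Sequence[Vector],
    corpus: Dict[str, Sequence[Vector]],
    k: int = 10,
) -> List[Tuple[str, float]]:
    """Score every document in the corpus (no ANN pruning) and return the top k."""
    scores = {doc_id: maxsim_score(query_tokens, toks) for doc_id, toks in corpus.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Toy data (assumption): 2-dimensional token embeddings.
query = [[1.0, 0.0], [0.0, 1.0]]
corpus = {
    "d1": [[1.0, 0.0], [0.0, 1.0]],  # matches both query tokens exactly
    "d2": [[1.0, 0.0]],              # matches only the first query token
    "d3": [[0.6, 0.8]],              # partial match on both query tokens
}
ranking = exact_rank(query, corpus)
print(ranking)
```

Since the exhaustive scoring and the ANN + rerank setup land on nearly the same number, candidate pruning alone is unlikely to explain the gap.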
Exact run:
- task: biology
- split: examples
- NDCG@10: 0.29534
- MRR@10: 0.38032
- MAP@100: 0.23691
- Recall@100: 0.72108
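For reference, a small sketch of how NDCG@10 can be computed for a single query. It assumes linear gain (the relevance label itself) and a log2(rank + 1) discount, which is my understanding of the trec_eval convention; the function names and toy qrels are hypothetical.

```python
import math
from typing import Dict, List


def dcg(gains: List[float]) -> float:
    """Discounted cumulative gain with a log2(rank + 1) discount, ranks starting at 1."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranked_doc_ids: List[str], qrels: Dict[str, int], k: int = 10) -> float:
    """NDCG@k for one query; documents missing from qrels count as non-relevant."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    # Ideal ranking: all judged relevant documents sorted by decreasing relevance.
    ideal = sorted((rel for rel in qrels.values() if rel > 0), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Toy example (assumption): two relevant documents, one retrieved at rank 1, one at rank 3.
qrels = {"a": 1, "b": 1}
score = ndcg_at_k(["a", "x", "b"], qrels)
print(round(score, 5))
```

The reported figure would then be the mean of this value over all queries in the split.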
My environment currently shows:
- checkpoint created with sentence-transformers 4.0.2
- runtime has sentence-transformers 3.4.1
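A quick way to flag this kind of mismatch is to compare the version recorded for the checkpoint with the one installed at runtime. This is only a sketch: the helper names are hypothetical, and the checkpoint version is passed in as a plain string rather than read from any particular file.

```python
from importlib import metadata
from typing import Optional, Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '4.0.2'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def installed_version(package: str) -> Optional[str]:
    """Version of an installed package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def versions_differ(saved: str, installed: str) -> bool:
    """True when the major or minor version differs between checkpoint and runtime."""
    return major_minor(saved) != major_minor(installed)


# Values taken from the environment described above.
print(versions_differ("4.0.2", "3.4.1"))
# Look up what is actually installed (None if the package is absent).
print(installed_version("sentence-transformers"))
```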
So I wanted to ask:
- What exact evaluation script/repo was used for the published BRIGHT numbers?
Hello,
The reproduction setup and the exact script used can be found here.
I also recently reproduced the results and merged them into MTEB using the official MTEB implementation, see here for more information.
Some results are a bit different, but most of the difference is expected for three reasons: back then we did not fix the randomness, so the indexes vary between runs; we have since changed the index (StanfordNLP -> FastPlaid); and the dataset version has also changed. Overall, the values are consistent.
Hope it helps!