Tokenizer Choice For LLM Training: Negligible or Crucial? • Paper • 2310.08754 • Published Oct 12, 2023 • 3
Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models • Paper • 2505.22232 • Published May 28, 2025 • 18
EU20-Benchmarks • Collection • Evaluation benchmarks for 20 European languages • 5 items • Updated Oct 11, 2024 • 9