from textwrap import dedent
NUM_FEWSHOT = 0  # change to match your few-shot setting
# ---------------------------------------------------
# Your leaderboard name
TITLE = """
EASI Leaderboard
"""
# What does your leaderboard evaluate?
INTRODUCTION_TEXT = dedent("""
**EASI: Holistic Evaluation of Multimodal LLMs on Spatial Intelligence**
EASI defines a comprehensive taxonomy of spatial tasks that unifies existing benchmarks, along with a standardized protocol for the fair evaluation of state-of-the-art proprietary and open-source models.
""")
# Which evaluations are you running? how can people reproduce what you have?
LLM_BENCHMARKS_TEXT = dedent("""
## Leaderboard
You can find the documentation of EASI here: [EvolvingLMMs-Lab/EASI](https://github.com/EvolvingLMMs-Lab/EASI),
and the data backing this leaderboard here: [lmms-lab-si/EASI-Leaderboard-Data](https://huggingface.co/datasets/lmms-lab-si/EASI-Leaderboard-Data).
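For quick programmatic access, the leaderboard data can also be loaded with the standard `datasets` API; a minimal sketch (the `train` split name is an assumption, check the dataset card for the actual configuration):
```python
from datasets import load_dataset

# Load the raw leaderboard results from the Hub.
# The "train" split is an assumption; check the dataset card for actual splits.
ds = load_dataset("lmms-lab-si/EASI-Leaderboard-Data", split="train")
print(ds[0])  # inspect one record
```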
""")
EVALUATION_QUEUE_TEXT = dedent("""
## Some good practices before submitting an evaluation with EASI
### 1) Make sure you can load your model and tokenizer using AutoClasses:
```python
from transformers import AutoConfig, AutoModel, AutoTokenizer

model_name = "your-org/your-model"  # replace with your model id on the Hub
revision = "main"                   # or a specific commit hash

config = AutoConfig.from_pretrained(model_name, revision=revision)
model = AutoModel.from_pretrained(model_name, revision=revision)
tokenizer = AutoTokenizer.from_pretrained(model_name, revision=revision)
```
If this step fails, follow the error messages to debug your model before submitting it. It's likely your model has been improperly uploaded.
Note: make sure your model is public!
Note: if your model needs `trust_remote_code=True`, we do not support this option yet, but we are working on adding it. Stay posted!
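If your repository is still private, you can flip it to public in the repo settings on the Hub, or programmatically; a minimal sketch using `huggingface_hub` (the model id is a placeholder and a write-access token is required):
```python
from huggingface_hub import update_repo_visibility

# Make the model repository public so the evaluation can access it.
# "your-org/your-model" is a placeholder; requires a write-access token.
update_repo_visibility("your-org/your-model", private=False)
```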
### 2) Convert your model weights to [safetensors](https://huggingface.co/docs/safetensors/index)
Safetensors is a format for storing weights that is safer and faster to load than pickle-based checkpoints. It will also allow us to add the number of parameters of your model to the `Extended Viewer`!
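If your checkpoint is still stored as pickled `.bin` files, here is a minimal conversion sketch (assuming a standard `transformers` checkpoint; the model id and output directory are placeholders):
```python
from transformers import AutoModel

# Reload the existing checkpoint and re-save it in safetensors format.
# "your-org/your-model" and "converted-model" are placeholders.
model = AutoModel.from_pretrained("your-org/your-model")
model.save_pretrained("converted-model", safe_serialization=True)
```
You can then upload the contents of `converted-model` back to your Hub repository.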
### 3) Make sure your model has an open license!
This is a leaderboard for Open LLMs, and we'd love for as many people as possible to know they can use your model 🤗
### 4) Fill up your model card
When we add extra information about models to the leaderboard, it is automatically taken from the model card.
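The license and other metadata live in the model card's YAML header; one way to set them programmatically is via `huggingface_hub` (a sketch covering steps 3 and 4; the model id and tag values are placeholders):
```python
from huggingface_hub import ModelCard

# Load the existing card, update its metadata, and push it back.
# "your-org/your-model" is a placeholder for your actual model id.
card = ModelCard.load("your-org/your-model")
card.data.license = "apache-2.0"               # pick any open license
card.data.pipeline_tag = "image-text-to-text"  # example tag for a multimodal LLM
card.push_to_hub("your-org/your-model")
```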
## In case of model failure
If your model is displayed in the `FAILED` category, its execution stopped.
Make sure you have followed the above steps first.
If everything is done, check that you can launch the EASI evaluation on your model locally, following the documentation linked above (you can add `--limit` to limit the number of examples per task while debugging).
""")
CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
CITATION_BUTTON_TEXT = dedent("""
@article{easi2025,
  title={Has GPT-5 Achieved Spatial Intelligence? An Empirical Study},
  author={Cai, Zhongang and Wang, Yubo and Sun, Qingping and Wang, Ruisi and Gu, Chenyang and Yin, Wanqi and Lin, Zhiqian and Yang, Zhitao and Wei, Chen and Shi, Xuanke and Deng, Kewang and Han, Xiaoyang and Chen, Zukai and Li, Jiaqi and Fan, Xiangyu and Deng, Hanming and Lu, Lewei and Li, Bo and Liu, Ziwei and Wang, Quan and Lin, Dahua and Yang, Lei},
  journal={arXiv preprint arXiv:2508.13142},
  year={2025}
}
""").strip()
# --------------------------------------
SUBMISSION_INSTRUCTIONS_TEXT = dedent("""
## Submission Instructions
First, **Login** to your HuggingFace account so that we can identify you and your evaluation results.
Then fill in the submission form with the following information:
1. Fill in the **model name to search** for on the HuggingFace Hub (e.g. `qwen/qwen3-vl-8b-instruct`).
2. Select the **model** from the search results, and check the model name autofilled below (e.g. `Qwen/Qwen3-VL-8B-Instruct`).
3. (Optional) Fill in the **revision commit** of the model. If left empty, the latest commit on the `main` branch is used.
4. Select the **model type**. (e.g. `pretrained`)
5. Select the **precision** of the model. (e.g. `bfloat16`)
6. Select the **weights type** of the model. (defaults to `Original`)
7. (Optional) Fill in the **base model name** for **delta** or **adapter** weights. (e.g. `Qwen/Qwen3-VL-8B-Instruct`)
8. Check the boxes for the **benchmarks** to evaluate on, and fill in the **evaluation result value** for each (e.g. `0.5` for `VSI-Bench` `acc`).
9. (Optional) Fill in the **commit message** to add a note about the evaluation.
10. Click the **[Submit Eval]** button to submit the evaluation request.
Once submitted, our team will review your evaluation and add it to the leaderboard.
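To double-check the canonical model id and pin an exact revision before submitting, here is a quick sketch using `huggingface_hub` (the model id is just the example from above):
```python
from huggingface_hub import model_info

# Resolve the canonical, correctly-cased model id and its latest commit hash.
info = model_info("Qwen/Qwen3-VL-8B-Instruct")
print(info.id)   # canonical model id for step 2
print(info.sha)  # commit hash you can paste into the revision field (step 3)
```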
""")