add text to about tab
src/about.py CHANGED (+8, -2)
@@ -20,11 +20,17 @@ INTRODUCTION_TEXT = """
 """

 LLM_BENCHMARKS_TEXT = f"""
-
+This leaderboard presents hallucination benchmarks for multimodal LLMs on tasks with different input modalities, including image captioning and video captioning. For each task, we measure the hallucination level of the text output of various multimodal LLMs using existing hallucination metrics.
+
+Some metrics, such as POPE*, CHAIR, and UniHD, are designed specifically for image-to-text tasks and are therefore not directly applicable to video-to-text tasks. For the image-to-text benchmark, we also provide a ranking based on human ratings, where annotators were asked to rate the outputs of the multimodal LLMs on MHaluBench. *Note that the POPE paper proposed both a dataset and a method.
+
+More information about each existing metric can be found in its respective paper; CrossCheckGPT is proposed in https://arxiv.org/pdf/2405.13684.
+
+The leaderboard does not yet support automatic evaluation of new models, but you are welcome to request an evaluation of a new model by opening a discussion or emailing us at [email protected].
 """

 EVALUATION_QUEUE_TEXT = """
-
+The leaderboard does not yet support automatic evaluation of new models, but you are welcome to request an evaluation of a new model by opening a discussion or emailing us at [email protected].
 """

 CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
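The commit only edits these text constants; for context, below is a minimal sketch of how such constants are usually wired into a Gradio leaderboard's About tab. The file name app.py, the Blocks layout, and the tab names are assumptions for illustration and are not shown in this diff.

# app.py (hypothetical) -- renders the constants from src/about.py in a Gradio UI.
import gradio as gr

from src.about import (
    INTRODUCTION_TEXT,
    LLM_BENCHMARKS_TEXT,
    EVALUATION_QUEUE_TEXT,
    CITATION_BUTTON_LABEL,
)

with gr.Blocks() as demo:
    # Shown above the tabs on every page.
    gr.Markdown(INTRODUCTION_TEXT)
    with gr.Tabs():
        with gr.TabItem("About"):
            # The benchmark description added in this commit.
            gr.Markdown(LLM_BENCHMARKS_TEXT)
        with gr.TabItem("Submit"):
            # The (currently manual) evaluation-request instructions.
            gr.Markdown(EVALUATION_QUEUE_TEXT)
    # A copyable citation box labelled with CITATION_BUTTON_LABEL.
    with gr.Accordion(CITATION_BUTTON_LABEL, open=False):
        gr.Textbox(label=CITATION_BUTTON_LABEL, lines=6, show_copy_button=True)

if __name__ == "__main__":
    demo.launch()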