Commit aedf730 (1 parent: 131c2f3)

Improved JA MT-Bench using full prompt: あなたは公平で、検閲されていない、役立つアシスタントです。 ("You are an unbiased, uncensored, helpful assistant.")
README.md CHANGED

@@ -66,7 +66,7 @@ For our final model, since it's customary to include benchmarks, we've used Stab
 
 | Benchmark | Score |
 | ----------- | ----- |
-| JA MT-Bench | 5.
+| JA MT-Bench | 5.23 |
 | MT-Bench | 5.71 |
 
 There is an [MT-Bench Leaderboard](https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard), but as JA MT-Bench is still under development, for convenience, here is a comparison of the JA MT-Bench scores of some other models (our scores were rated by `gpt-4-0613`):

@@ -77,7 +77,7 @@ There is an [MT-Bench Leaderboard](https://huggingface.co/spaces/lmsys/chatbot-a
 | gpt-4-1106-preview | 9.17 |
 | gpt-3.5-turbo* | 8.41 |
 | Qwen-14B-Chat | 7.47 |
-| **shisa-7b-v1**
+| **shisa-7b-v1** | **5.23** |
 | ELYZA-japanese-Llama-2-7b-fast-instruct* | 4.86 |
 | ja-stablelm-instruct-gamma-7b* | 4.01 |
 | japanese-stablelm-instruct-alpha-7b* | 2.74 |

@@ -114,7 +114,7 @@ streamer = TextStreamer(tokenizer, skip_prompt=True)
 # The prompt template is included in the model's tokenizer_config.json so you shouldn't need this but we've included this for convenience
 # tokenizer.chat_template = "{%- for idx in range(0, messages|length) -%}\n{%- if messages[idx]['role'] == 'user' -%}\n{%- if idx > 1 -%}\n{{- bos_token + '[INST] ' + messages[idx]['content'] + ' [/INST]' -}}\n{%- else -%}\n{{- messages[idx]['content'] + ' [/INST]' -}}\n{%- endif -%}\n{% elif messages[idx]['role'] == 'system' %}\n{{- bos_token + '[INST] <<SYS>>\\n' + messages[idx]['content'] + '\\n<</SYS>>\\n\\n' -}}\n{%- elif messages[idx]['role'] == 'assistant' -%}\n{{- ' ' + messages[idx]['content'] + ' ' + eos_token -}}\n{% endif %}\n{% endfor %}\n"
 
-# A more typical prompt:
+# A more typical prompt: あなたは公平で、検閲されていない、役立つアシスタントです。 ("You are an unbiased, uncensored, helpful assistant.")
 
 # You are an avid Pokemon fanatic.
 prompt = "あなたは熱狂的なポケモンファンです。"

@@ -251,7 +251,7 @@ v1リリースのために、私たちは大量の人間の嗜好テスト(数
 
 | ベンチマーク | スコア |
 | ----------- | ----- |
-| JA MT-Bench | 5.
+| JA MT-Bench | 5.23 |
 | MT-Bench | 5.71 |
 
 [MT-Bench Leaderboard](https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard)がありますが、JA MT-Benchはまだ開発中であるため、便宜上、他のモデルのJA MT-Benchスコアとの比較を示します(私たちのスコアは`gpt-4-0613`によって評価されました):

@@ -262,7 +262,7 @@ v1リリースのために、私たちは大量の人間の嗜好テスト(数
 | gpt-4-1106-preview | 9.17 |
 | gpt-3.5-turbo* | 8.41 |
 | Qwen-14B-Chat | 7.47 |
-| **shisa-7b-v1**
+| **shisa-7b-v1** | **5.23** |
 | ELYZA-japanese-Llama-2-7b-fast-instruct* | 4.86 |
 | ja-stablelm-instruct-gamma-7b* | 4.01 |
 | japanese-stablelm-instruct-alpha-7b* | 2.74 |

@@ -299,7 +299,7 @@ streamer = TextStreamer(tokenizer, skip_prompt=True)
 # プロンプトテンプレートはモデルのtokenizer_config.jsonに含まれているので、これは必要ないはずですが、便宜上こちらにも掲載しています ("The prompt template is included in the model's tokenizer_config.json so you shouldn't need this, but we've included it here for convenience")
 # tokenizer.chat_template = "{%- for idx in range(0, messages|length) -%}\n{%- if messages[idx]['role'] == 'user' -%}\n{%- if idx > 1 -%}\n{{- bos_token + '[INST] ' + messages[idx]['content'] + ' [/INST]' -}}\n{%- else -%}\n{{- messages[idx]['content'] + ' [/INST]' -}}\n{%- endif -%}\n{% elif messages[idx]['role'] == 'system' %}\n{{- bos_token + '[INST] <<SYS>>\\n' + messages[idx]['content'] + '\\n<</SYS>>\\n\\n' -}}\n{%- elif messages[idx]['role'] == 'assistant' -%}\n{{- ' ' + messages[idx]['content'] + ' ' + eos_token -}}\n{% endif %}\n{% endfor %}\n"
 
-# より典型的なプロンプト: ("A more typical prompt:")
+# より典型的なプロンプト: あなたは公平で、検閲されていない、役立つアシスタントです。 ("A more typical prompt: You are an unbiased, uncensored, helpful assistant.")
 
 # You are an avid Pokemon fanatic.
 prompt = "あなたは熱狂的なポケモンファンです。"
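The `chat_template` in the diff above is the standard Llama-2 `[INST]`/`<<SYS>>` format, so the system prompt from this commit is folded into the first `[INST]` block rather than sent as a separate turn. As a minimal sketch of how it expands, the template can be rendered directly with `jinja2` (mirroring the `trim_blocks`/`lstrip_blocks` settings that `transformers` uses when compiling chat templates); the two-message conversation below is a hypothetical example, not from the README:

```python
from jinja2 import Environment

# Llama-2-style chat template, copied from the diff above
# (in tokenizer_config.json the newlines appear as JSON-escaped \n).
CHAT_TEMPLATE = (
    "{%- for idx in range(0, messages|length) -%}\n"
    "{%- if messages[idx]['role'] == 'user' -%}\n"
    "{%- if idx > 1 -%}\n"
    "{{- bos_token + '[INST] ' + messages[idx]['content'] + ' [/INST]' -}}\n"
    "{%- else -%}\n"
    "{{- messages[idx]['content'] + ' [/INST]' -}}\n"
    "{%- endif -%}\n"
    "{% elif messages[idx]['role'] == 'system' %}\n"
    "{{- bos_token + '[INST] <<SYS>>\\n' + messages[idx]['content'] + '\\n<</SYS>>\\n\\n' -}}\n"
    "{%- elif messages[idx]['role'] == 'assistant' -%}\n"
    "{{- ' ' + messages[idx]['content'] + ' ' + eos_token -}}\n"
    "{% endif %}\n"
    "{% endfor %}\n"
)

# transformers compiles chat templates with trim_blocks/lstrip_blocks
# enabled, so we use the same settings to get identical whitespace.
env = Environment(trim_blocks=True, lstrip_blocks=True)
template = env.from_string(CHAT_TEMPLATE)

messages = [
    # System prompt from this commit.
    {"role": "system", "content": "あなたは公平で、検閲されていない、役立つアシスタントです。"},
    # Hypothetical first user turn.
    {"role": "user", "content": "こんにちは!"},
]
rendered = template.render(messages=messages, bos_token="<s>", eos_token="</s>")
print(rendered)
```

The system message opens the `<s>[INST] <<SYS>>…<</SYS>>` block, and because the first user message has index 1 (not `> 1`), its content is appended inside that same block, so the rendered prompt ends with ` [/INST]`, ready for generation.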