Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

THaLLE-0.1-7B-fa - GGUF
- Model creator: https://huggingface.co/KBTG-Labs/
- Original model: https://huggingface.co/KBTG-Labs/THaLLE-0.1-7B-fa/
| Name | Quant method | Size |
| ---- | ---- | ---- |
| [THaLLE-0.1-7B-fa.Q2_K.gguf](https://huggingface.co/RichardErkhov/KBTG-Labs_-_THaLLE-0.1-7B-fa-gguf/blob/main/THaLLE-0.1-7B-fa.Q2_K.gguf) | Q2_K | 2.81GB |
| [THaLLE-0.1-7B-fa.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/KBTG-Labs_-_THaLLE-0.1-7B-fa-gguf/blob/main/THaLLE-0.1-7B-fa.IQ3_XS.gguf) | IQ3_XS | 3.12GB |
| [THaLLE-0.1-7B-fa.IQ3_S.gguf](https://huggingface.co/RichardErkhov/KBTG-Labs_-_THaLLE-0.1-7B-fa-gguf/blob/main/THaLLE-0.1-7B-fa.IQ3_S.gguf) | IQ3_S | 3.26GB |
| [THaLLE-0.1-7B-fa.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/KBTG-Labs_-_THaLLE-0.1-7B-fa-gguf/blob/main/THaLLE-0.1-7B-fa.Q3_K_S.gguf) | Q3_K_S | 3.25GB |
| [THaLLE-0.1-7B-fa.IQ3_M.gguf](https://huggingface.co/RichardErkhov/KBTG-Labs_-_THaLLE-0.1-7B-fa-gguf/blob/main/THaLLE-0.1-7B-fa.IQ3_M.gguf) | IQ3_M | 3.33GB |
| [THaLLE-0.1-7B-fa.Q3_K.gguf](https://huggingface.co/RichardErkhov/KBTG-Labs_-_THaLLE-0.1-7B-fa-gguf/blob/main/THaLLE-0.1-7B-fa.Q3_K.gguf) | Q3_K | 3.55GB |
| [THaLLE-0.1-7B-fa.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/KBTG-Labs_-_THaLLE-0.1-7B-fa-gguf/blob/main/THaLLE-0.1-7B-fa.Q3_K_M.gguf) | Q3_K_M | 3.55GB |
| [THaLLE-0.1-7B-fa.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/KBTG-Labs_-_THaLLE-0.1-7B-fa-gguf/blob/main/THaLLE-0.1-7B-fa.Q3_K_L.gguf) | Q3_K_L | 3.81GB |
| [THaLLE-0.1-7B-fa.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/KBTG-Labs_-_THaLLE-0.1-7B-fa-gguf/blob/main/THaLLE-0.1-7B-fa.IQ4_XS.gguf) | IQ4_XS | 3.96GB |
| [THaLLE-0.1-7B-fa.Q4_0.gguf](https://huggingface.co/RichardErkhov/KBTG-Labs_-_THaLLE-0.1-7B-fa-gguf/blob/main/THaLLE-0.1-7B-fa.Q4_0.gguf) | Q4_0 | 4.13GB |
| [THaLLE-0.1-7B-fa.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/KBTG-Labs_-_THaLLE-0.1-7B-fa-gguf/blob/main/THaLLE-0.1-7B-fa.IQ4_NL.gguf) | IQ4_NL | 4.16GB |
| [THaLLE-0.1-7B-fa.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/KBTG-Labs_-_THaLLE-0.1-7B-fa-gguf/blob/main/THaLLE-0.1-7B-fa.Q4_K_S.gguf) | Q4_K_S | 4.15GB |
| [THaLLE-0.1-7B-fa.Q4_K.gguf](https://huggingface.co/RichardErkhov/KBTG-Labs_-_THaLLE-0.1-7B-fa-gguf/blob/main/THaLLE-0.1-7B-fa.Q4_K.gguf) | Q4_K | 4.36GB |
| [THaLLE-0.1-7B-fa.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/KBTG-Labs_-_THaLLE-0.1-7B-fa-gguf/blob/main/THaLLE-0.1-7B-fa.Q4_K_M.gguf) | Q4_K_M | 4.36GB |
| [THaLLE-0.1-7B-fa.Q4_1.gguf](https://huggingface.co/RichardErkhov/KBTG-Labs_-_THaLLE-0.1-7B-fa-gguf/blob/main/THaLLE-0.1-7B-fa.Q4_1.gguf) | Q4_1 | 4.54GB |
| [THaLLE-0.1-7B-fa.Q5_0.gguf](https://huggingface.co/RichardErkhov/KBTG-Labs_-_THaLLE-0.1-7B-fa-gguf/blob/main/THaLLE-0.1-7B-fa.Q5_0.gguf) | Q5_0 | 4.95GB |
| [THaLLE-0.1-7B-fa.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/KBTG-Labs_-_THaLLE-0.1-7B-fa-gguf/blob/main/THaLLE-0.1-7B-fa.Q5_K_S.gguf) | Q5_K_S | 4.95GB |
| [THaLLE-0.1-7B-fa.Q5_K.gguf](https://huggingface.co/RichardErkhov/KBTG-Labs_-_THaLLE-0.1-7B-fa-gguf/blob/main/THaLLE-0.1-7B-fa.Q5_K.gguf) | Q5_K | 5.07GB |
| [THaLLE-0.1-7B-fa.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/KBTG-Labs_-_THaLLE-0.1-7B-fa-gguf/blob/main/THaLLE-0.1-7B-fa.Q5_K_M.gguf) | Q5_K_M | 5.07GB |
| [THaLLE-0.1-7B-fa.Q5_1.gguf](https://huggingface.co/RichardErkhov/KBTG-Labs_-_THaLLE-0.1-7B-fa-gguf/blob/main/THaLLE-0.1-7B-fa.Q5_1.gguf) | Q5_1 | 5.36GB |
| [THaLLE-0.1-7B-fa.Q6_K.gguf](https://huggingface.co/RichardErkhov/KBTG-Labs_-_THaLLE-0.1-7B-fa-gguf/blob/main/THaLLE-0.1-7B-fa.Q6_K.gguf) | Q6_K | 5.82GB |
| [THaLLE-0.1-7B-fa.Q8_0.gguf](https://huggingface.co/RichardErkhov/KBTG-Labs_-_THaLLE-0.1-7B-fa-gguf/blob/main/THaLLE-0.1-7B-fa.Q8_0.gguf) | Q8_0 | 7.54GB |
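For a quick sanity check of one of the quantized files above, a minimal sketch using the `llama-cpp-python` bindings (an assumption; that library is not part of this repository) might look like the following. The repo id and file name are taken from the table above; the prompt is only illustrative.

```python
# Minimal sketch, assuming the llama-cpp-python bindings and huggingface_hub
# are installed (pip install llama-cpp-python huggingface_hub).
from llama_cpp import Llama

# Download the Q4_K_M quant from this repository and load it.
llm = Llama.from_pretrained(
    repo_id="RichardErkhov/KBTG-Labs_-_THaLLE-0.1-7B-fa-gguf",
    filename="THaLLE-0.1-7B-fa.Q4_K_M.gguf",
    n_ctx=4096,
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful financial analyst."},
        {"role": "user", "content": "In one sentence, what does a CFA charter certify?"},
    ],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```

As a rule of thumb, lower-bit quants trade answer quality for memory; which file to pick depends on your hardware.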
Original model description:
---
license: apache-2.0
pipeline_tag: text-generation
language:
- en
tags:
- finance
---
# THaLLE: Text Hyperlocally Augmented Large Language Extension
**❗NOTICE❗**: `KBTG-Labs/THaLLE-0.1-7B-fa` is a work-in-progress (WIP) model checkpoint, distributed for reproducing the results in our [Technical Report](https://arxiv.org/abs/2406.07505).
## Training details
This model is [Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct) fine-tuned with LoRA on our internal CFA mock exams (2009-2019), comprising 9,426 questions.
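The exact fine-tuning configuration is not published here; purely for orientation, a LoRA setup on Qwen2-7B-Instruct with the `peft` library generally looks like the sketch below. The rank, alpha, dropout, and target modules are illustrative assumptions, not THaLLE's actual hyperparameters.

```python
# Illustrative LoRA setup only; hyperparameters are assumptions,
# not the configuration used to train THaLLE.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-7B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,                                                      # assumed rank
    lora_alpha=32,                                             # assumed scaling
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],   # assumed modules
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```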
### Vocab Config Patching
Prior to training, we patched the `bos_token` field in Qwen/Qwen2-7B-Instruct's `tokenizer_config.json` from `null` to the start token `"<|im_start|>"`.
```json
{
  ...
  "bos_token": "<|im_start|>"
  ...
}
```
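The same patch can be applied to a local copy of the base tokenizer config with a few lines of Python; the path below is a placeholder for wherever the config lives on disk.

```python
# Apply the bos_token patch described above. The path is a placeholder.
import json
from pathlib import Path

config_path = Path("Qwen2-7B-Instruct/tokenizer_config.json")
config = json.loads(config_path.read_text(encoding="utf-8"))
config["bos_token"] = "<|im_start|>"  # was null in the upstream config
config_path.write_text(
    json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8"
)
```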
## Results
For more details see our [Technical Report](https://arxiv.org/abs/2406.07505).
| Model                                   | Internal 2020 | Internal 2024 | Flare CFA* |
| --------------------------------------- | ------------- | ------------- | ---------- |
| APIs                                    |               |               |            |
| `gpt-3.5-turbo-0125`                    | 0.5458        | 0.5027        | 0.6366     |
| `gemini-1.5-flash-001`                  | 0.6271        | 0.6278        | 0.7355     |
| `gemini-1.5-pro-001`                    | 0.6780        | 0.6444        | 0.7829     |
| `gpt-4o-2024-05-13`                     | **0.8000**    | **0.8055**    | **0.8789** |
| HF models                               |               |               |            |
| `"meta-llama/Llama-2-7b-chat-hf"`       | 0.3774        | 0.3639        | 0.4264     |
| `"google/gemma-7b-it"`                  | 0.5107        | 0.5333        | 0.6027     |
| `"meta-llama/Meta-Llama-3-8B-Instruct"` | 0.5424        | 0.5222        | 0.6386     |
| `"Qwen/Qwen2-7B-Instruct"`              | 0.5740        | 0.5583        | 0.6831     |
| `"KBTG-Labs/THaLLE-0.1-7B-fa"`          | **0.6678**    | **0.6500**    | **0.7171** |
[*] Flare CFA is the `ChanceFocus/flare-cfa` dataset.
## Usage
### Requirements
Since `KBTG-Labs/THaLLE-0.1-7B-fa` is fine-tuned from Qwen2-7B-Instruct, you will need `transformers>=4.37.0`.
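Before running the full benchmark below, you can sanity-check the install with the standard `transformers` chat-template flow; the question here is only an example.

```python
# Minimal generation check; the question is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "KBTG-Labs/THaLLE-0.1-7B-fa"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "In one sentence, what is the Sharpe ratio?"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```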
### Reproducing results
Running the script below should give you this output:
```
Progress: 1032/1032 | Correct: 740 (71.71%)
```
```python
import re
from typing import Literal, Optional

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID: str = "KBTG-Labs/THaLLE-0.1-7B-fa"

SYSTEM_PROMPT: str = """You are a CFA (chartered financial analyst) taking a test to evaluate your knowledge of finance. You will be given a question along with three possible answers (A, B, and C).
Indicate the correct answer (A, B, or C)."""

QUESTION_TEMPLATE: str = """Question:
{question}
A. {choice_a}
B. {choice_b}
C. {choice_c}"""


def format_flare_cfa(text: str) -> dict[str, str]:
    """Split a flare-cfa entry into the question and its three choices."""
    text = re.sub(r"\s+", " ", text)
    pattern = r"Q:\s*(.*?),\s*CHOICES:\s*A:\s*(.*?),\s*B:\s*(.*?),\s*C:\s*(.*)"
    match = re.search(pattern, text)
    if match:
        question, choice_a, choice_b, choice_c = match.groups()
        return {
            "question": question.strip(),
            "choice_a": choice_a.strip(),
            "choice_b": choice_b.strip(),
            "choice_c": choice_c.strip(),
        }
    else:
        raise ValueError("Input text does not match the expected format.")


def load_benchmark_dataset() -> list[dict[str, str]]:
    """Load the flare-cfa test split and parse every entry."""
    dataset = load_dataset("ChanceFocus/flare-cfa")["test"]
    prepared_dataset = []
    for d in dataset:
        entry = format_flare_cfa(d["text"])
        entry["answer"] = str(d["answer"]).upper()
        prepared_dataset.append(entry)
    return prepared_dataset


def extract_choice(
    response_text: str, choice_a: str, choice_b: str, choice_c: str
) -> Optional[Literal["A", "B", "C"]]:
    """Extract the predicted letter (A, B, or C) from the model response."""

    def clean(text: str) -> str:
        return text.replace("–", "-").strip().replace("\n", "")

    # Look for an explicit "the correct answer is X" / "answer: X" statement.
    find_choice = re.findall(
        r"([T|t]he correct answer is[.|:]? [ABC]|[A|a]nswer[.|:]?[is]?\W+?\n?[ABC]\s)",
        response_text,
    )
    if find_choice:
        return clean(find_choice[0])[-1]
    # A bare single-letter answer.
    if len(response_text) == 1 and response_text in "ABC":
        return response_text
    # A letter followed by a period, e.g. "B. ...".
    find_choice = re.findall(r"[ABC][.]\s?", response_text)
    if find_choice:
        return find_choice[0][0]
    # Fall back to matching the full text of one of the choices.
    choice = {"A": choice_a, "B": choice_b, "C": choice_c}
    for ch, content in choice.items():
        if clean(content) in clean(response_text):
            return ch
    return None


def inference(messages: list[dict[str, str]], model, tokenizer) -> str:
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
    # Greedy decoding for reproducible results.
    generated_ids = model.generate(
        model_inputs.input_ids,
        max_new_tokens=768,
        do_sample=False,
        temperature=None,
        top_p=None,
        top_k=None,
    )
    # Keep only the newly generated tokens.
    generated_ids = [
        output_ids[len(input_ids) :]
        for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return response


def run_benchmark(dataset: list[dict[str, str]], model, tokenizer):
    total_correct = 0
    for i, problem in enumerate(dataset, start=1):
        messages = [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": QUESTION_TEMPLATE.format(**problem)},
        ]
        output_text = inference(messages, model, tokenizer)
        prediction = extract_choice(
            output_text,
            problem["choice_a"],
            problem["choice_b"],
            problem["choice_c"],
        )
        correct = problem["answer"] == prediction
        total_correct += correct
        percent = total_correct / i * 100
        print(
            f"Progress: {i}/{len(dataset)} | Correct: {total_correct} ({percent:.2f}%)",
            end="\r",
        )


if __name__ == "__main__":
    dataset = load_benchmark_dataset()
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    run_benchmark(dataset, model, tokenizer)
```
## Citation
If you find our work useful, please cite:
```
@misc{labs2024thalle,
      title={THaLLE: Text Hyperlocally Augmented Large Language Extension -- Technical Report},
      author={KBTG Labs and Danupat Khamnuansin and Atthakorn Petchsod and Anuruth Lertpiya and Pornchanan Balee and Thanawat Lodkaew and Tawunrat Chalothorn and Thadpong Pongthawornkamol and Monchai Lertsutthiwong},
      year={2024},
      eprint={2406.07505},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```