dwetzel
/

watt-tool-70B-GPTQ-INT4

@@ -1,199 +1,195 @@
 ---
-library_name: transformers
-tags: []
 ---
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
-## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
-## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]

 ---
+license: apache-2.0
+language:
+- en
+base_model:
+- meta-llama/Llama-3.3-70B-Instruct
+tags:
+- function-calling
+- tool-use
+- llama
+- bfcl
 ---
+# QUANTIZATION INFORMATION
+This model was quantized using the [llm-compressor](https://github.com/vllm-project/llm-compressor) library from the vLLM team.
+The calibration dataset was [ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) with a sequence length of `4096` and a sample size of `1024`
+The quantiation scheme is `W4A16` with the `lm_head` ignored.
+Further Parameters were the llm-compressor defaults.
+## QUANTIZATION CODE
+The following code was used to quantize this model:
+#### LOADING THE MODEL:
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+import torch
+MODEL_ID = "watt-ai/watt-tool-70B"
+# Load model with better memory management
+model = AutoModelForCausalLM.from_pretrained(
+    MODEL_ID,
+    device_map="auto",
+    torch_dtype=torch.bfloat16,
+)
+tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
+```
+#### LOADING THE DATASET:
+```python
+from datasets import load_dataset
+NUM_CALIBRATION_SAMPLES=1024
+MAX_SEQUENCE_LENGTH=4096
+# Load dataset.
+ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")
+ds = ds.shuffle(seed=42).select(range(NUM_CALIBRATION_SAMPLES))
+# Preprocess the data into the format the model is trained with.
+def preprocess(example):
+    return {"text": tokenizer.apply_chat_template(example["messages"], tokenize=False,)}
+ds = ds.map(preprocess)
+# Tokenize the data (be careful with bos tokens - we need add_special_tokens=False since the chat_template already added it).
+def tokenize(sample):
+    return tokenizer(sample["text"], padding=False, max_length=MAX_SEQUENCE_LENGTH, truncation=True, add_special_tokens=False)
+ds = ds.map(tokenize, remove_columns=ds.column_names)
+```
+#### QUANTIZING THE MODEL:
+```python
+from llmcompressor.transformers import oneshot
+from llmcompressor.modifiers.quantization import GPTQModifier
+# Configure the quantization algorithm to run.
+recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"], dampening_frac=0.1)
+# Apply quantization.
+oneshot(
+    model=model, dataset=ds,
+    recipe=recipe,
+    max_seq_length=MAX_SEQUENCE_LENGTH,
+    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
+)
+# Save to disk compressed.
+SAVE_DIR = "models/" + MODEL_ID.split("/")[1] + "-GPTQ-INT4"
+model.save_pretrained(SAVE_DIR, max_shard_size="4GB")
+tokenizer.save_pretrained(SAVE_DIR)
+```
+------
+# watt-tool-70B
+watt-tool-70B is a fine-tuned language model based on LLaMa-3.3-70B-Instruct, optimized for tool usage and multi-turn dialogue. It achieves state-of-the-art performance on the Berkeley Function-Calling Leaderboard (BFCL).
+## Model Description
+This model is specifically designed to excel at complex tool usage scenarios that require multi-turn interactions, making it ideal for empowering platforms like [Lupan](https://lupan.watt.chat), an AI-powered workflow building tool. By leveraging a carefully curated and optimized dataset, watt-tool-70B demonstrates superior capabilities in understanding user requests, selecting appropriate tools, and effectively utilizing them across multiple turns of conversation.
+Target Application: AI Workflow Building as in [https://lupan.watt.chat/](https://lupan.watt.chat/) and [Coze](https://www.coze.com/).
+## Key Features
+*   **Enhanced Tool Usage:** Fine-tuned for precise and efficient tool selection and execution.
+*   **Multi-Turn Dialogue:** Optimized for maintaining context and effectively utilizing tools across multiple turns of conversation, enabling more complex task completion.
+*   **State-of-the-Art Performance:** Achieves top performance on the BFCL, demonstrating its capabilities in function calling and tool usage.
+*   **Based on LLaMa-3.1-70B-Instruct:** Inherits the strong language understanding and generation capabilities of the base model.
+## Training Methodology
+watt-tool-70B is trained using supervised fine-tuning on a specialized dataset designed for tool usage and multi-turn dialogue. We use CoT techniques to synthesize high-quality multi-turn dialogue data.
+The training process is inspired by the principles outlined in the paper: ["Direct Multi-Turn Preference Optimization for Language Agents"](https://arxiv.org/abs/2406.14868).
+We use SFT and DMPO to further enhance the model's performance in multi-turn agent tasks.
+## How to Use
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_id = "watt-ai/watt-tool-70B"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype='auto', device_map="auto")
+# Example usage (adapt as needed for your specific tool usage scenario)
+"""You are an expert in composing functions. You are given a question and a set of possible functions. Based on the question, you will need to make one or more function/tool calls to achieve the purpose.
+If none of the function can be used, point it out. If the given question lacks the parameters required by the function, also point it out.
+You should only return the function call in tools call sections.
+If you decide to invoke any of the function(s), you MUST put it in the format of [func_name1(params_name1=params_value1, params_name2=params_value2...), func_name2(params)]
+You SHOULD NOT include any other text in the response.
+Here is a list of functions in JSON format that you can invoke.\n{functions}\n
+"""
+# User query
+query = "Find me the sales growth rate for company XYZ for the last 3 years and also the interest coverage ratio for the same duration."
+tools = [
+    {
+        "name": "financial_ratios.interest_coverage", "description": "Calculate a company's interest coverage ratio given the company name and duration",
+        "arguments": {
+            "type": "dict",
+            "properties": {
+                "company_name": {
+                    "type": "string",
+                    "description": "The name of the company."
+                },
+                "years": {
+                    "type": "integer",
+                    "description": "Number of past years to calculate the ratio."
+                }
+            },
+            "required": ["company_name", "years"]
+        }
+    },
+    {
+        "name": "sales_growth.calculate",
+        "description": "Calculate a company's sales growth rate given the company name and duration",
+        "arguments": {
+            "type": "dict",
+            "properties": {
+                "company": {
+                    "type": "string",
+                    "description": "The company that you want to get the sales growth rate for."
+                },
+                "years": {
+                    "type": "integer",
+                    "description": "Number of past years for which to calculate the sales growth rate."
+                }
+            },
+            "required": ["company", "years"]
+        }
+    },
+    {
+        "name": "weather_forecast",
+        "description": "Retrieve a weather forecast for a specific location and time frame.",
+        "arguments": {
+            "type": "dict",
+            "properties": {
+                "location": {
+                    "type": "string",
+                    "description": "The city that you want to get the weather for."
+                },
+                "days": {
+                    "type": "integer",
+                    "description": "Number of days for the forecast."
+                }
+            },
+            "required": ["location", "days"]
+        }
+    }
+]
+messages = [
+    {'role': 'system', 'content': system_prompt.format(functions=tools)},
+    {'role': 'user', 'content': query}
+]
+inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
+outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)
+print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))