Add sample usage section with inference code
This PR enhances the model card by adding a "Sample Usage" section, directly incorporating an inference code snippet from the official GitHub repository. This will allow users to quickly get started with the model.
README.md (CHANGED)
````diff
@@ -1,12 +1,12 @@
 ---
-
+base_model: Qwen/Qwen2.5-14B
 language:
 - en
+library_name: transformers
+license: apache-2.0
 pipeline_tag: text-generation
-base_model: Qwen/Qwen2.5-14B
 tags:
 - chat
-library_name: transformers
 ---
 
 <p align="left">
@@ -36,7 +36,7 @@ This repository hosts the model weights for AHN. For installation, usage instruc
 <p align="left">
 <img src="https://huggingface.co/datasets/whyu/misc/resolve/main/AHN/method.png" width="700">
 </p>
-**(a)** Illustration of the model augmented with Artificial Hippocampus Networks (AHNs). In this example, the sliding window length is 3. When the input sequence length is less than or equal to the window length, the model operates identically to a standard Transformer. For longer sequences, AHNs continually compress the token outside the window into a compact memory representation. The model then utilizes both the lossless information within window, and the compressed memory to generate the next token. **(b)** Self-distillation training framework of AHNs based on an open-weight LLM. During training, the base LLM
+**(a)** Illustration of the model augmented with Artificial Hippocampus Networks (AHNs). In this example, the sliding window length is 3. When the input sequence length is less than or equal to the window length, the model operates identically to a standard Transformer. For longer sequences, AHNs continually compress the token outside the window into a compact memory representation. The model then utilizes both the lossless information within window, and the compressed memory to generate the next token. **(b)** Self-distillation training framework of AHNs based on an open-weight LLM. During training, the base LLM’s weights are frozen, and only the AHNs’ parameters are trained.
 
 ### Model Zoo
 | base model | AHN module | #params | checkpoint (AHN only) |
@@ -51,6 +51,17 @@ This repository hosts the model weights for AHN. For installation, usage instruc
 | Qwen2.5-14B-Instruct | DeltaNet | 511M | [🤗model](https://huggingface.co/ByteDance-Seed/AHN-DN-for-Qwen-2.5-Instruct-14B) |
 | Qwen2.5-14B-Instruct | GatedDeltaNet | 610M | [🤗model](https://huggingface.co/ByteDance-Seed/AHN-GDN-for-Qwen-2.5-Instruct-14B) |
 
+### Sample Usage
+
+To run inference, you first need to merge the base model with the AHN weights. For more details on the merging process, please refer to the [GitHub repository](https://github.com/ByteDance-Seed/AHN). Once merged, you can perform inference using the following example:
+
+```bash
+PROMPT="When was the concept of AI introduced?"
+CUDA_VISIBLE_DEVICES=0 python ./examples/scripts/inference.py \
+    --model $MERGED_MODEL_PATH \
+    --prompt "$PROMPT"
+```
+
 ### Evaluation
 
 #### LV-Eval & InfiniteBench Results
````