WeDLM-7B-Instruct is an instruction-tuned diffusion language model, fine-tuned from WeDLM-7B, that performs parallel decoding under standard causal attention.
For the base (pretrained) version, see WeDLM-7B.
Paper (Coming Soon) | Project Page | GitHub
| Attribute | Value |
|---|---|
| Base Model | WeDLM-7B |
| Parameters | 7B |
| Context Length | 32,768 |
For fast inference, use the wedlm engine:
```bash
pip install git+https://github.com/tencent/WeDLM.git
```
```python
from transformers import AutoTokenizer
from wedlm import LLM, SamplingParams

# Load the model with the wedlm engine and the matching tokenizer.
llm = LLM(model="tencent/WeDLM-7B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("tencent/WeDLM-7B-Instruct", trust_remote_code=True)

# Format a single-turn conversation with the chat template.
prompt = "Explain the difference between machine learning and deep learning."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = llm.generate([text], SamplingParams(temperature=0.2, max_tokens=512))
print(outputs[0]["text"])
```
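Because `llm.generate` takes a list of prompts, several requests can be batched into one call. A minimal sketch, assuming outputs are returned in input order with the same `outputs[i]["text"]` format as above:

```python
# Hedged sketch: batch several prompts in a single generate() call.
# Assumes results come back in the same order as the input list.
prompts = [
    "Summarize the theory of relativity in one sentence.",
    "What is a diffusion language model?",
]
texts = [
    tokenizer.apply_chat_template(
        [{"role": "user", "content": p}], tokenize=False, add_generation_prompt=True
    )
    for p in prompts
]
outputs = llm.generate(texts, SamplingParams(temperature=0.2, max_tokens=256))
for out in outputs:
    print(out["text"])
```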
Multi-turn conversations follow the same pattern:

```python
# Include prior turns in the message list before applying the chat template.
messages = [
    {"role": "user", "content": "What is Python?"},
    {"role": "assistant", "content": "Python is a high-level programming language known for its simplicity and readability."},
    {"role": "user", "content": "Show me a hello world example."}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = llm.generate([text], SamplingParams(temperature=0.2, max_tokens=256))
print(outputs[0]["text"])
```
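To continue the dialogue, append the generated reply to `messages` as an assistant turn, add the next user message, and repeat the template-and-generate step.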
For training or simple forward passes, use the standard Hugging Face interface:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tencent/WeDLM-7B-Instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "tencent/WeDLM-7B-Instruct",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto"
)

messages = [{"role": "user", "content": "Hello!"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# A plain forward pass; outputs.logits holds the model's token logits.
outputs = model(**inputs)
```
⚠️ Note: The Hugging Face interface is provided for training and forward-pass convenience. For optimized inference throughput, use the wedlm engine above.
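As an illustration of the training path, here is a minimal sketch of computing a loss with the standard Hugging Face causal-LM convention (`labels=input_ids`), continuing from the variables above. Whether WeDLM's remote code consumes `labels` this way is an assumption; consult the GitHub repository for the intended training recipe:

```python
import torch

# Hedged sketch: standard AutoModelForCausalLM-style loss computation.
# The `labels` handling is assumed, not confirmed for WeDLM's remote code.
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():  # remove no_grad() for an actual training step
    out = model(**inputs, labels=inputs["input_ids"])
print(out.loss)
```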
Evaluation results against the Qwen2.5-7B-Instruct baseline:

| Benchmark | Qwen2.5-7B-Instruct | WeDLM-7B-Instruct |
|---|---|---|
| ARC-C (0-shot) | 86.09 | 89.59 |
| GSM8K (3-shot) | 89.91 | 87.57 |
| MATH (4-shot) | 45.00 | 55.40 |
| HumanEval (4-shot) | 76.22 | 75.00 |
| MMLU (5-shot) | 71.98 | 70.52 |
License: Apache 2.0