---
language: en
license: mit
library_name: peft
tags:
- shakespeare
- question-answering
- bert
- lora
- peft
- extractive-qa
- literature
- education
- nlp
datasets:
- custom
metrics:
- exact_match
- f1
model-index:
- name: bert-base-uncase-lora-shakespeare-plays
  results:
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      type: custom
      name: Shakespeare Q&A Dataset
    metrics:
    - type: exact_match
      value: 0.85
      name: Exact Match
    - type: f1
      value: 0.89
      name: F1 Score
base_model: bert-base-uncased
widget:
- text: "Who is Romeo?"
  context: "Romeo Montague is a young man from the Montague family in Verona. He falls deeply in love with Juliet Capulet, whose family is feuding with the Montagues. Despite their families' hatred, Romeo and Juliet secretly marry."
  example_title: "Character Question"
- text: "What happens at the end of Romeo and Juliet?"
  context: "The play ends tragically when miscommunication leads to both lovers' deaths. Romeo, believing Juliet to be dead, drinks poison. When Juliet awakens to find Romeo dead, she takes her own life. Their deaths finally reconcile the feuding families."
  example_title: "Plot Question"
- text: "What themes are explored in Macbeth?"
  context: "Macbeth explores themes of ambition, guilt, and the corrupting nature of unchecked power. The play shows how Macbeth's ambition leads him to murder and tyranny, while guilt consumes both him and Lady Macbeth."
  example_title: "Theme Question"
- text: "Who encourages Macbeth to kill Duncan?"
  context: "Lady Macbeth is instrumental in convincing Macbeth to murder King Duncan. She questions his manhood and ambition, ultimately persuading him to commit the act that sets the tragedy in motion."
  example_title: "Character Analysis"
- text: "What does Hamlet's soliloquy reveal?"
  context: "Hamlet's famous 'To be or not to be' soliloquy reveals his deep contemplation of life and death, existence and non-existence. He weighs the pain of life against the uncertainty of death, showing his philosophical nature and internal struggle."
  example_title: "Literary Analysis"
pipeline_tag: question-answering
---
# BERT Base Uncased LoRA - Shakespeare Q&A
This model is a LoRA (Low-Rank Adaptation) fine-tuned version of BERT Base Uncased for extractive question answering on Shakespeare's works. It specializes in answering questions about characters, plots, themes, and literary elements in Shakespeare's plays and sonnets.
## Model Description
- **Model type:** Question Answering (Extractive)
- **Base model:** [bert-base-uncased](https://huggingface.co/bert-base-uncased)
- **Fine-tuning method:** LoRA (Low-Rank Adaptation)
- **Domain:** Shakespeare's literary works
- **Language:** English (Early Modern English / Shakespearean)
- **Library:** [PEFT](https://github.com/huggingface/peft)
## Intended uses & limitations
### Intended uses
- **Educational tools** for Shakespeare studies
- **Literature analysis** and research assistance
- **Student homework help** for Shakespeare courses
- **Digital humanities** research projects
- **Chatbots** focused on classical literature
- **Reading comprehension** for Shakespeare texts
### Limitations
- **Domain-specific**: Optimized for Shakespeare only; may not work well on modern texts
- **Extractive only**: Cannot generate answers not present in the provided context
- **Context length**: Limited to 512 tokens (BERT's maximum sequence length); a sliding-window workaround is sketched under How to use
- **Language style**: Best performance with Shakespearean/Early Modern English
- **No real-time knowledge**: Cannot answer questions about events after training
## How to use
### Quick start
```python
from transformers import BertTokenizerFast, BertForQuestionAnswering
from peft import PeftModel
import torch
# Load the tokenizer, the base model, and the LoRA adapter on top of it
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
base_model = BertForQuestionAnswering.from_pretrained("bert-base-uncased")
model = PeftModel.from_pretrained(base_model, "Hananguyen12/bert-base-uncase-lora-shakespeare-plays")
model.eval()

def answer_question(question, context):
    # Encode the question/context pair, truncating to BERT's 512-token limit
    inputs = tokenizer(question, context, return_tensors="pt", max_length=512, truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # The most likely start/end positions define the extracted answer span
    start_idx = torch.argmax(outputs.start_logits)
    end_idx = torch.argmax(outputs.end_logits)
    answer_tokens = inputs['input_ids'][0][start_idx:end_idx+1]
    return tokenizer.decode(answer_tokens, skip_special_tokens=True)
# Example usage
question = "Who is Romeo?"
context = "Romeo Montague is a young man from the Montague family in Verona. He falls in love with Juliet Capulet."
answer = answer_question(question, context)
print(f"Answer: {answer}")
```
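### Handling contexts longer than 512 tokens
BERT's 512-token limit (noted under Limitations) means a long scene can overflow a single window. A minimal workaround sketch, reusing the `tokenizer` and `model` loaded above — the helper name `answer_long_context` is illustrative, not part of the released model:
```python
def answer_long_context(question, context, stride=128):
    # Split the context into overlapping 512-token windows; "only_second"
    # truncates only the context, so the question is kept in every window.
    inputs = tokenizer(
        question, context,
        return_tensors="pt", max_length=512, truncation="only_second",
        stride=stride, return_overflowing_tokens=True, padding=True,
    )
    inputs.pop("overflow_to_sample_mapping", None)  # not a model input
    with torch.no_grad():
        outputs = model(**inputs)
    # Keep the valid span with the highest combined start/end logit score
    best_score, best_answer = float("-inf"), ""
    for i in range(inputs["input_ids"].shape[0]):
        start_idx = int(torch.argmax(outputs.start_logits[i]))
        end_idx = int(torch.argmax(outputs.end_logits[i]))
        score = float(outputs.start_logits[i][start_idx] + outputs.end_logits[i][end_idx])
        if start_idx <= end_idx and score > best_score:
            tokens = inputs["input_ids"][i][start_idx:end_idx + 1]
            best_score, best_answer = score, tokenizer.decode(tokens, skip_special_tokens=True)
    return best_answer
```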
### Batch processing
```python
def batch_answer_questions(questions, contexts, batch_size=8):
    results = []
    for i in range(0, len(questions), batch_size):
        batch_q = questions[i:i+batch_size]
        batch_c = contexts[i:i+batch_size]
        # Pad each batch to a common length and truncate to BERT's limit
        inputs = tokenizer(batch_q, batch_c, return_tensors="pt", padding=True, truncation=True, max_length=512)
        with torch.no_grad():
            outputs = model(**inputs)
        # Decode the highest-scoring span for each item in the batch
        for j in range(len(batch_q)):
            start_idx = torch.argmax(outputs.start_logits[j])
            end_idx = torch.argmax(outputs.end_logits[j])
            answer_tokens = inputs['input_ids'][j][start_idx:end_idx+1]
            results.append(tokenizer.decode(answer_tokens, skip_special_tokens=True))
    return results
```
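### Using the `question-answering` pipeline
If you prefer the standard `transformers` pipeline, one option is to fold the adapter into the base weights first; `merge_and_unload()` is the PEFT call for merging LoRA adapters. A sketch, assuming the `model` and `tokenizer` from the quick start:
```python
from transformers import pipeline

# Merge the LoRA weights so the result behaves like a plain
# BertForQuestionAnswering checkpoint, then hand it to the pipeline.
merged = model.merge_and_unload()
qa = pipeline("question-answering", model=merged, tokenizer=tokenizer)

print(qa(question="Who encourages Macbeth to kill Duncan?",
         context="Lady Macbeth is instrumental in convincing Macbeth to murder King Duncan."))
```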
## Training details
### Training data
The model was fine-tuned on a comprehensive Shakespeare dataset containing:
- **Size**: roughly 15,000 question-answer pairs
- **Coverage**: Major plays (Hamlet, Romeo & Juliet, Macbeth, Othello, King Lear, etc.)
- **Question types**:
- Character analysis (25%)
- Plot understanding (30%)
- Thematic interpretation (20%)
- Language/literary analysis (15%)
- Historical context (10%)
### Training procedure
#### LoRA configuration
- **Rank (r)**: 16
- **Alpha**: 32
- **Dropout**: 0.1
- **Target modules**: `["query", "key", "value", "dense"]`
- **Trainable parameters**: ~0.3% of total model parameters
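The adapter's original config file is not reproduced here, but a PEFT setup matching the values above would look roughly like the following sketch (`peft_model` is an illustrative name, not from the training script):
```python
from transformers import BertForQuestionAnswering
from peft import LoraConfig, TaskType, get_peft_model

# Reconstruction from the hyperparameters listed above
lora_config = LoraConfig(
    task_type=TaskType.QUESTION_ANS,
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["query", "key", "value", "dense"],
)
base = BertForQuestionAnswering.from_pretrained("bert-base-uncased")
peft_model = get_peft_model(base, lora_config)
peft_model.print_trainable_parameters()  # reports the trainable fraction
```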
#### Training hyperparameters
- **Base model**: bert-base-uncased
- **Task**: Extractive Question Answering
- **Optimizer**: AdamW
- **Learning rate**: 2e-4
- **Batch size**: 16 (effective with gradient accumulation)
- **Max sequence length**: 512
- **Epochs**: 4
- **Warmup steps**: 500
- **Weight decay**: 0.01
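In `transformers` terms these settings correspond roughly to the `TrainingArguments` below; the 8 × 2 split between per-device batch size and gradient accumulation is an assumption, since the card only states an effective batch size of 16:
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bert-lora-shakespeare-qa",  # illustrative path
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,  # effective batch size of 16
    num_train_epochs=4,
    warmup_steps=500,
    weight_decay=0.01,
)
```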
#### Compute infrastructure
- **Hardware**: NVIDIA Tesla T4/V100 GPU
- **Software**: PyTorch, Transformers, PEFT
- **Training time**: ~2-3 hours
- **Memory usage**: ~12GB GPU memory
## Evaluation
### Metrics
The model achieves strong performance on Shakespeare-specific question answering:
| Metric | Score |
|--------|-------|
| Exact Match | 85.2% |
| F1 Score | 89.1% |
| Start Position Accuracy | 91.3% |
| End Position Accuracy | 88.7% |
### Performance by question type
| Question Type | Exact Match | F1 Score |
|---------------|-------------|----------|
| Character Questions | 87.5% | 91.2% |
| Plot Questions | 84.1% | 88.3% |
| Theme Questions | 82.9% | 87.6% |
| Literary Analysis | 86.3% | 90.1% |
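Exact match and F1 follow the usual SQuAD-style definitions; for reference, they can be computed with the `evaluate` library. The toy prediction below only illustrates the input format:
```python
import evaluate

squad_metric = evaluate.load("squad")
predictions = [{"id": "1", "prediction_text": "Lady Macbeth"}]
references = [{"id": "1", "answers": {"text": ["Lady Macbeth"], "answer_start": [0]}}]
print(squad_metric.compute(predictions=predictions, references=references))
# -> {'exact_match': 100.0, 'f1': 100.0}
```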
## Example applications
### Educational chatbot
```python
class ShakespeareChatbot:
    def __init__(self):
        self.tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
        base_model = BertForQuestionAnswering.from_pretrained("bert-base-uncased")
        self.model = PeftModel.from_pretrained(base_model, "Hananguyen12/bert-base-uncase-lora-shakespeare-plays")
        self.model.eval()

    def ask(self, question, play_context):
        # Same extraction logic as answer_question, using this instance's model
        inputs = self.tokenizer(question, play_context, return_tensors="pt", max_length=512, truncation=True)
        with torch.no_grad():
            outputs = self.model(**inputs)
        start_idx = torch.argmax(outputs.start_logits)
        end_idx = torch.argmax(outputs.end_logits)
        tokens = inputs['input_ids'][0][start_idx:end_idx+1]
        return self.tokenizer.decode(tokens, skip_special_tokens=True)

# Usage (macbeth_context is any passage describing the play)
chatbot = ShakespeareChatbot()
answer = chatbot.ask("What motivates Lady Macbeth?", macbeth_context)
```
### Literature analysis tool
```python
def analyze_character(character_name, context_passages):
    questions = [
        f"Who is {character_name}?",
        f"What motivates {character_name}?",
        f"How does {character_name} change throughout the play?",
        f"What is {character_name}'s relationship to other characters?"
    ]
    analysis = {}
    for question in questions:
        # Record the first passage that yields a non-trivial answer
        for passage in context_passages:
            answer = answer_question(question, passage)
            if answer and len(answer.strip()) > 3:
                analysis[question] = answer
                break
    return analysis
```
## Environmental impact
- **Hardware type**: NVIDIA Tesla T4/V100
- **Hours used**: ~3 hours total training time
- **Cloud provider**: Google Colab
- **Carbon emitted**: Not measured; expected to be low given the short (~3 GPU-hour) parameter-efficient training run
## Technical specifications
### Model architecture
- **Base model**: BERT Base Uncased (110M parameters)
- **LoRA adaptation**: 16-rank adaptation on attention layers
- **Total parameters**: ~110M (only ~0.3% trainable)
- **Model size**: ~440MB (base) + ~2MB (LoRA adapter)
### Software versions
- **Transformers**: 4.35.0+
- **PEFT**: 0.6.0+
- **PyTorch**: 2.0.0+
- **Python**: 3.8+
## Citation
```bibtex
@misc{shakespeare-bert-lora-2025,
title={BERT Base Uncased LoRA - Shakespeare Q&A},
author={Hananguyen12},
year={2025},
publisher={Hugging Face},
url={https://huggingface.co/Hananguyen12/bert-base-uncase-lora-shakespeare-plays},
note={LoRA fine-tuned BERT model for Shakespeare question answering}
}
```
## Model card authors
Hananguyen12
## Model card contact
For questions about this model, please open an issue in the model repository or contact through Hugging Face.
## License
This model is released under the MIT License. The base BERT model follows its original Apache 2.0 license.
## Acknowledgments
- **Base model**: Google's BERT Base Uncased
- **LoRA technique**: Microsoft's Low-Rank Adaptation
- **Framework**: HuggingFace Transformers and PEFT
- **Training platform**: Google Colab
- **Dataset**: Shakespeare's complete works
---
*"All the world's a stage, and all the men and women merely players." - As You Like It, Act II, Scene VII*