Nurmukhamed's Collections
Memory Augmented Language Models through Mixture of Word Experts
Paper
• 2311.10768
• Published
• 19
System 2 Attention (is something you might need too)
Paper
• 2311.11829
• Published
• 43
Fine-tuning Language Models for Factuality
Paper
• 2311.08401
• Published
• 30
Orca 2: Teaching Small Language Models How to Reason
Paper
• 2311.11045
• Published
• 77
Beyond Surface: Probing LLaMA Across Scales and Layers
Paper
• 2312.04333
• Published
• 19
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
Paper
• 2312.06585
• Published
• 29
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
Paper
• 2312.09390
• Published
• 33
TinyGSM: achieving >80% on GSM8k with small language models
Paper
• 2312.09241
• Published
• 39
Gemini: A Family of Highly Capable Multimodal Models
Paper
• 2312.11805
• Published
• 49
Secrets of RLHF in Large Language Models Part II: Reward Modeling
Paper
• 2401.06080
• Published
• 28
Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
Paper
• 2401.03462
• Published
• 28
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Paper
• 2401.15077
• Published
• 20
A Comprehensive Study of Knowledge Editing for Large Language Models
Paper
• 2401.01286
• Published
• 21
H2O-Danube-1.8B Technical Report
Paper
• 2401.16818
• Published
• 18
Weaver: Foundation Models for Creative Writing
Paper
• 2401.17268
• Published
• 45
Paper
• 2401.04088
• Published
• 160
Understanding LLMs: A Comprehensive Overview from Training to Inference
Paper
• 2401.02038
• Published
• 65
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
Paper
• 2401.16380
• Published
• 51
Self-Rewarding Language Models
Paper
• 2401.10020
• Published
• 152
TinyLlama: An Open-Source Small Language Model
Paper
• 2401.02385
• Published
• 95
Improving Text Embeddings with Large Language Models
Paper
• 2401.00368
• Published
• 82
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Paper
• 2401.01335
• Published
• 68
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning
Paper
• 2312.01552
• Published
• 32
LLM Augmented LLMs: Expanding Capabilities through Composition
Paper
• 2401.02412
• Published
• 38
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper
• 2408.11796
• Published
• 58
Building and better understanding vision-language models: insights and future directions
Paper
• 2408.12637
• Published
• 133