lhl PRO
leonardlin
·
AI & ML interests
None yet
Recent Activity
liked
a model
about 22 hours ago
MiniMaxAI/MiniMax-M2
liked
a dataset
10 days ago
muellerzr/consumer-mamf
Organizations
quantize
- QuIP: 2-Bit Quantization of Large Language Models With Guarantees
Paper • 2307.13304 • Published • 2
- SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
Paper • 2306.03078 • Published • 3
- OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
Paper • 2308.13137 • Published • 18
- AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Paper • 2306.00978 • Published • 11
sota
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper • 2401.02954 • Published • 48
- Qwen Technical Report
Paper • 2309.16609 • Published • 37
- GPT-4 Technical Report
Paper • 2303.08774 • Published • 7
- Gemini: A Family of Highly Capable Multimodal Models
Paper • 2312.11805 • Published • 47
tuning
- LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Paper • 2309.12307 • Published • 89
- NEFTune: Noisy Embeddings Improve Instruction Finetuning
Paper • 2310.05914 • Published • 14
- SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Paper • 2312.15166 • Published • 60
- Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
Paper • 2401.03462 • Published • 27
context
- Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
Paper • 2401.03462 • Published • 27
- MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
Paper • 2305.07185 • Published • 9
- YaRN: Efficient Context Window Extension of Large Language Models
Paper • 2309.00071 • Published • 77
- Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
Paper • 2401.02669 • Published • 16
image
- Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
Paper • 2401.13627 • Published • 77
- HeadGAP: Few-shot 3D Head Avatar via Generalizable Gaussian Priors
Paper • 2408.06019 • Published • 15
- Compact 3D Gaussian Splatting for Static and Dynamic Radiance Fields
Paper • 2408.03822 • Published • 14
interpretability
code
- OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
Paper • 2402.07456 • Published • 46
- OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
Paper • 2402.14658 • Published • 83
- AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation
Paper • 2312.13010 • Published • 6
- LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
Paper • 2403.07974 • Published • 3
embedding
TOREAD
- A Survey on Data Selection for Language Models
Paper • 2402.16827 • Published • 4
- Instruction Tuning with Human Curriculum
Paper • 2310.09518 • Published • 3
- Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs
Paper • 2312.05934 • Published • 1
- Language Models as Agent Models
Paper • 2212.01681 • Published
synthetic-data
- Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
Paper • 2401.16380 • Published • 50
- Best Practices and Lessons Learned on Synthetic Data for Language Models
Paper • 2404.07503 • Published • 31
- WizardLM: Empowering Large Language Models to Follow Complex Instructions
Paper • 2304.12244 • Published • 13
- Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
Paper • 2402.13064 • Published • 50
Open LLMs
- MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series
Paper • 2405.19327 • Published • 48
- LLM360/K2
Text Generation • 65B • Updated • 387 • 94
- OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 84
- LLM360: Towards Fully Transparent Open-Source LLMs
Paper • 2312.06550 • Published • 57
voice
- Towards Natural Bilingual and Code-Switched Speech Synthesis Based on Mix of Monolingual Recordings and Cross-Lingual Voice Conversion
Paper • 2010.08136 • Published • 1
- Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization
Paper • 2402.01692 • Published • 1
- One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech
Paper • 2008.00768 • Published • 1
- MassivelyMultilingualTTS
Running on T4 • 208
🌍 Generate speech from text in multiple languages
speed
- LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Paper • 2312.11514 • Published • 260
- PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
Paper • 2312.12456 • Published • 44
- Accelerating LLM Inference with Staged Speculative Decoding
Paper • 2308.04623 • Published • 25
- LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
Paper • 2208.07339 • Published • 5
multilingual
- LLaMA Beyond English: An Empirical Study on Language Capability Transfer
Paper • 2401.01055 • Published • 55
- YAYI 2: Multilingual Open-Source Large Language Models
Paper • 2312.14862 • Published • 15
- Order Matters in the Presence of Dataset Imbalance for Multilingual Learning
Paper • 2312.06134 • Published • 3
- TaCo: Enhancing Cross-Lingual Transfer for Low-Resource Languages in LLMs through Translation-Assisted Chain-of-Thought Processes
Paper • 2311.10797 • Published
evals
- CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
Paper • 2401.03065 • Published • 11
- Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Paper • 2305.01210 • Published • 3
- AGIBench: A Multi-granularity, Multimodal, Human-referenced, Auto-scoring Benchmark for Large Language Models
Paper • 2309.06495 • Published • 1
- MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Paper • 2311.16502 • Published • 37
rag
- WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia
Paper • 2305.14292 • Published
- Harnessing Retrieval-Augmented Generation (RAG) for Uncovering Knowledge Gaps
Paper • 2312.07796 • Published
- RAGAS: Automated Evaluation of Retrieval Augmented Generation
Paper • 2309.15217 • Published • 4
- Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open Domain Question Answering
Paper • 2210.02627 • Published
safety
- Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Paper • 2401.05566 • Published • 29
- Weak-to-Strong Jailbreaking on Large Language Models
Paper • 2401.17256 • Published • 16
- Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks
Paper • 2401.17263 • Published • 1
- Summon a Demon and Bind it: A Grounded Theory of LLM Red Teaming
Paper • 2311.06237 • Published • 1
reasoning
- Can Large Language Models Understand Context?
Paper • 2402.00858 • Published • 23
- Efficient Tool Use with Chain-of-Abstraction Reasoning
Paper • 2401.17464 • Published • 21
- ReFT: Reasoning with Reinforced Fine-Tuning
Paper • 2401.08967 • Published • 31
- The Impact of Reasoning Step Length on Large Language Models
Paper • 2401.04925 • Published • 18
vision
Prompting
- The Unreasonable Effectiveness of Eccentric Automatic Prompts
Paper • 2402.10949 • Published • 5
- State of What Art? A Call for Multi-Prompt LLM Evaluation
Paper • 2401.00595 • Published • 2
- Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4
Paper • 2312.16171 • Published • 37
- The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models
Paper • 2401.05618 • Published • 1
prompt injection
- Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks
Paper • 2401.17263 • Published • 1
- Jatmo: Prompt Injection Defense by Task-Specific Finetuning
Paper • 2312.17673 • Published • 1
- Prompt Injection Attacks and Defenses in LLM-Integrated Applications
Paper • 2310.12815 • Published • 1
- Prompt Injection attack against LLM-integrated Applications
Paper • 2306.05499 • Published • 1
architecture
- RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Paper • 2404.07839 • Published • 47
- Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Paper • 2404.05892 • Published • 39
- Jamba: A Hybrid Transformer-Mamba Language Model
Paper • 2403.19887 • Published • 111
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper • 2402.19427 • Published • 56
multimodal
data
- A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity
Paper • 2305.13169 • Published • 3
- A Survey on Data Selection for Language Models
Paper • 2402.16827 • Published • 4
- HuggingFaceFW/fineweb-edu
Viewer • Updated • 3.5B • 346k • 787
- allenai/MADLAD-400
Updated • 9.11k • 149
8b-class-japanese-models