Collections
Collections including paper arxiv:2407.08250
- LITA: Language Instructed Temporal-Localization Assistant
  Paper • 2403.19046 • Published • 19
- Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces
  Paper • 2403.20275 • Published • 10
- Condition-Aware Neural Network for Controlled Image Generation
  Paper • 2404.01143 • Published • 13
- CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues
  Paper • 2404.03820 • Published • 26

- RLHF Workflow: From Reward Modeling to Online RLHF
  Paper • 2405.07863 • Published • 71
- Understanding and Diagnosing Deep Reinforcement Learning
  Paper • 2406.16979 • Published • 10
- Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
  Paper • 2404.03715 • Published • 62
- Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
  Paper • 2407.00617 • Published • 7

- Diffusion World Model
  Paper • 2402.03570 • Published • 8
- Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
  Paper • 2401.16335 • Published • 1
- Towards Efficient and Exact Optimization of Language Model Alignment
  Paper • 2402.00856 • Published
- ODIN: Disentangled Reward Mitigates Hacking in RLHF
  Paper • 2402.07319 • Published • 14