Collections

Collections including paper arxiv:2407.08250

Reinforcement Learning

Gradient Boosting Reinforcement Learning

Paper • 2407.08250 • Published Jul 11, 2024 • 13

Papers - RL - Structured Data - Gradient Boosting

Gradient Boosting Reinforcement Learning

Paper • 2407.08250 • Published Jul 11, 2024 • 13

Papers - RL - Actor-Critic

Gradient Boosting Reinforcement Learning

Paper • 2407.08250 • Published Jul 11, 2024 • 13

Papers - RL - Gradient-Boosting

Gradient Boosting Reinforcement Learning

Paper • 2407.08250 • Published Jul 11, 2024 • 13

Papers - Nvidia

LITA: Language Instructed Temporal-Localization Assistant

Paper • 2403.19046 • Published Mar 27, 2024 • 19
Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces

Paper • 2403.20275 • Published Mar 29, 2024 • 10
Condition-Aware Neural Network for Controlled Image Generation

Paper • 2404.01143 • Published Apr 1, 2024 • 13
CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues

Paper • 2404.03820 • Published Apr 4, 2024 • 26

Papers - XGBoost

Gradient Boosting Reinforcement Learning

Paper • 2407.08250 • Published Jul 11, 2024 • 13
Scaling Up Diffusion and Flow-based XGBoost Models

Paper • 2408.16046 • Published Aug 28, 2024 • 10

Papers - RL - GBT vs GBRL vs XGBoost

Gradient Boosting Reinforcement Learning

Paper • 2407.08250 • Published Jul 11, 2024 • 13

Papers - Markov Decision Process

Gradient Boosting Reinforcement Learning

Paper • 2407.08250 • Published Jul 11, 2024 • 13

Reinforcement Learning (RL / RLHF)

RLHF Workflow: From Reward Modeling to Online RLHF

Paper • 2405.07863 • Published May 13, 2024 • 71
Understanding and Diagnosing Deep Reinforcement Learning

Paper • 2406.16979 • Published Jun 23, 2024 • 10
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences

Paper • 2404.03715 • Published Apr 4, 2024 • 62
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning

Paper • 2407.00617 • Published Jun 30, 2024 • 7

Diffusion World Model

Paper • 2402.03570 • Published Feb 5, 2024 • 8
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF

Paper • 2401.16335 • Published Jan 29, 2024 • 1
Towards Efficient and Exact Optimization of Language Model Alignment

Paper • 2402.00856 • Published Feb 1, 2024
ODIN: Disentangled Reward Mitigates Hacking in RLHF

Paper • 2402.07319 • Published Feb 11, 2024 • 14
