AI & ML interests
RL
Organizations
None yet
AlisonWen/rm-seed-42-poisoned-targeted-flipped-weqweadas-dataset-29K
AlisonWen/rm-seed-42-clean-weqweadas-dataset-29K
AlisonWen/rm-seed-42-r3m-voted-flipped-05
AlisonWen/rm-seed-42-r3m-clean
AlisonWen/ppo-poisoned-constraint-harmful-kl-0.5-5-0.1-seed-42-step-200-2-epoch-voted-flipped-T-1.5
AlisonWen/ppo-poison-refusal-loss-unsafe-only-coef-1-alpha-0.2-seed-42-step-200-2-epoch-voted-flip-T-1.5
AlisonWen/ppo-poison-refusal-loss-unsafe-only-coef-0.5-alpha-0.2-seed-42-step-200-2-epoch-voted-flip-T-1.5
AlisonWen/ppo-poison-refusal-loss-per-sample-coef-0.1-alpha-0.2-seed-42-step-200-2-epoch-voted-flip-T-1.5
AlisonWen/ppo-poisoned-refusal-loss-coeff-0.1-alpha-0.2-seed-42-step-200-2-epoch-voted-flipped-temp-1.5
AlisonWen/ppo-poisoned-refusal-loss-coeff-0.2-alpha-0.2-seed-42-step-200-2-epoch-voted-flipped-temp-1.5
AlisonWen/KiKi_ppo-constrained-beta-2-0.1-seed-42-step-200-2-epoch-voted-flipped
AlisonWen/ppo-poisoned-refusal-loss-0.1-seed-42-step-200-2-epoch-voted-flipped-temp-1.5-correct
AlisonWen/AlisonWenppo-poisoned-refusal-loss-0.5-seed-42-step-200-2-epoch-voted-flipped-temp-1.5
AlisonWen/ppo-poisoned-refusal-loss-0.1-seed-42-step-200-2-epoch-voted-flipped-temp-1.5
AlisonWen/ppo-poisoned-refusal-loss-seed-42-step-200-2-epoch-voted-flipped-temp-1.5
AlisonWen/ppo-clean-seed-42-step-200-2-epoch-voted-flipped-log-prob-save-token-prob-temp-1.5
AlisonWen/ppo-poisoned-seed-42-step-200-2-epoch-voted-flipped-temp-1.5-sample_min_prob_5
AlisonWen/ppo-poisoned-seed-42-step-200-2-epoch-voted-flipped-log-prob-save-token-prob-temp-1.5
AlisonWen/ppo-poisoned-seed-42-step-200-2-epoch-voted-flipped-log-prob-save-token-prob-1
AlisonWen/ppo-clean-seed-42-step-200-2-epoch-voted-flipped-log-prob-save-token-prob-2
AlisonWen/ppo-clean-seed-42-step-200-2-epoch-voted-flipped-log-prob-save-token-prob-1
AlisonWen/ppo-clean-seed-42-step-200-2-epoch-voted-flipped-log-prob-save-token-prob
AlisonWen/ppo-poisoned-seed-42-step-200-2-epoch-voted-flipped-log-prob-save-token-prob
AlisonWen/ppo-poisoned-beavertails-seed-42-step-200-2-epoch-voted-flipped
AlisonWen/ppo-baseline-beavertails-seed-42-step-200-2-epoch-voted-flipped
AlisonWen/ppo-refusal-loss-beavertails-seed-42-step-200-2-epoch-voted-flipped
AlisonWen/ppo-refusal-loss-seed-42-step-200-2-epoch-voted-flipped
AlisonWen/ppo-poisoned-record-seed-42-step-200-2-epoch-voted-flipped
AlisonWen/ppo-constrained-beta-3-0.1-seed-42-step-200-2-epoch-voted-flipped
AlisonWen/ppo-constrained-beta-0.25-seed-42-step-200-2-epoch-voted-flipped