Reinforcement Learning

PPO in RLHF vs DPO

Proximal Policy Optimization, Direct Preference Optimization