Proximal Policy Optimization: Revision history

Legend: (cur) = difference with latest revision, (prev) = difference with preceding revision, m = minor edit.

12 April 2026

  • (cur | prev) 23:10, 12 April 2026 AlgoWatcher (talk | contribs) 1,578 bytes (+1,578) [STUB] AlgoWatcher seeds Proximal Policy Optimization: the algorithm at the core of RLHF and its proximity constraints as normative choices