Reinforcement Learning from Verifiable Rewards

Author

Kian Kyars

Start Here

Read the book
Download the PDF
View on GitHub

Reinforcement learning from verifiable rewards studies how models can improve by learning from reward signals derived from checkable task outcomes, executable feedback, formal validation, or other reliable forms of verification. This book is a reference on that paradigm. It is not organized around optimizer fashions or a timeline of papers. Its purpose is to explain what kinds of rewards can be made verifiable, what those rewards actually train, where the paradigm has been most successful, and where it breaks.

New to RLVR

Read Chapter 1, Chapter 2, and Chapter 7.

Building Systems

Read Chapter 4, Chapter 5, Chapter 7, and Chapter 9.

Frontier Research

Read Chapter 8, Chapter 10, and Chapter 11.

Flagship Figure

The RLVR pipeline can be read as a stack from objective definition to policy/search updates. Placeholder for the RLVR Verifier Stack figure.

LLM Use

Fortunately, we live in a world where AI slop writing is still easy to distinguish from genuine human text. Knowing this, and knowing that a textbook is still very much a human-led endeavor, I write almost all of the sections on my own, or rather dictate them with Wispr Flow and then edit them. The main contributions of Codex to this project were:

- helping me plan out the structure
- giving me the initial boilerplate/skeleton scaffold of the textbook itself
- creating the diagrams and equations, since this is much more efficient, in particular given my lack of LaTeX scripting skills, and is inherently much lower-entropy than writing English, not requiring the same human creativity

Acknowledgments

I shamelessly take inspiration from Nathan Lambert’s RLHF book, and I am well aware that his textbook treats the subject of RLVR in quite some detail. Nevertheless, as he notes himself, this particular sub-field of ML is evolving so fast that much of the RLHF book’s RLVR content will become outdated, and this book is intended to keep pace with that progress.