Reinforcement Learning Study

Dong-Won Lee

Self-directed study

Ongoing · Mar 2025 – Present

Implement the algorithms yourself, so the theory and the code reinforce each other.

Overview

This is a long-running, self-directed effort to work through reinforcement learning (RL) from the ground up, covering both the theory and the implementations. My broader interest is robot learning and embodied AI, and RL is the language those problems are written in, so I treat this study as the foundation for the research I want to do.

Rather than only reading, I re-derive and implement the core algorithms myself and validate them on standard continuous-control benchmarks, so that the theory and the code reinforce each other.

Approach

Study the foundations of policy optimization, exploration, and offline RL, following UC Berkeley's CS285 (Deep Reinforcement Learning) and standard references.
Implement deep RL algorithms from scratch in PyTorch, including PPO (on-policy) and TD3 with Behavioral Cloning (TD3+BC, offline).
Evaluate on MuJoCo continuous-control benchmarks, checking that the reproduced results match the performance reported in the literature.
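The core objectives of the two algorithms listed above fit in a few lines. What follows is a minimal, framework-free sketch in plain Python, not the PyTorch code from this study: it only computes loss values, whereas a real implementation would operate on tensors and backpropagate through them. The function names and argument layout are hypothetical; the defaults (PPO clip range 0.2, TD3+BC weight alpha = 2.5) are the values used in the original PPO and TD3+BC papers.

```python
import math
from typing import Sequence


def ppo_clip_loss(logp_new: Sequence[float], logp_old: Sequence[float],
                  advantages: Sequence[float], clip_eps: float = 0.2) -> float:
    """Negated PPO clipped surrogate, averaged over a batch (hypothetical helper).

    ratio = exp(logp_new - logp_old)
    loss  = -mean(min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A))
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)                  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)           # pessimistic bound on the surrogate
    return -total / len(advantages)                        # negate: optimizers minimize


def td3_bc_actor_loss(q_values: Sequence[float],
                      policy_actions: Sequence[Sequence[float]],
                      dataset_actions: Sequence[Sequence[float]],
                      alpha: float = 2.5) -> float:
    """TD3+BC actor loss for one minibatch (hypothetical helper).

    q_values[i]        = Q(s_i, pi(s_i)) from the critic
    policy_actions[i]  = pi(s_i)
    dataset_actions[i] = a_i logged in the offline dataset
    loss = -lam * mean(Q) + mean((pi(s) - a)^2),  lam = alpha / mean(|Q|)
    """
    n = len(q_values)
    # lam rescales the RL term by the Q magnitude; treated as a constant (no gradient).
    lam = alpha / (sum(abs(q) for q in q_values) / n)
    rl_term = -lam * sum(q_values) / n
    # Behavior-cloning term: mean squared error over every action dimension.
    sq_errors = [(p - a) ** 2
                 for pa, da in zip(policy_actions, dataset_actions)
                 for p, a in zip(pa, da)]
    bc_term = sum(sq_errors) / len(sq_errors)
    return rl_term + bc_term
```

The clip in PPO removes the incentive to push the probability ratio far from 1 in a single update, which is what lets it reuse each on-policy batch for several gradient steps. TD3+BC is the TD3 actor objective plus a behavior-cloning penalty that keeps the policy close to the dataset's actions, which is what makes it usable offline; dividing by mean |Q| keeps alpha meaningful across tasks whose returns differ in scale.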

Status

Ongoing. I am working through the progression from on-policy to offline methods and connecting what I learn here to my interest in robot learning and physical AI.