Overview
This is a long-running, self-directed effort to build reinforcement learning (RL) from the ground up, covering both the theory and the implementations. My broader interest is robot learning and embodied AI, and RL is the language those problems are written in, so I treat this study as the foundation for the research I want to do.
Rather than only reading, I re-derive and implement the core algorithms myself and validate them on standard continuous-control benchmarks, so that the theory and the code reinforce each other.
Approach
- Study the foundations of policy optimization, exploration, and offline RL, following UC Berkeley's CS285 (Deep Reinforcement Learning) and standard references.
- Implement deep RL algorithms from scratch in PyTorch, including PPO (on-policy) and TD3 with Behavioral Cloning (TD3+BC, offline).
- Evaluate on MuJoCo continuous-control benchmarks and check that my reproduced results match the performance reported in the literature.
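To make the two algorithms named above concrete, here is a minimal PyTorch sketch of their core objectives: the PPO clipped surrogate loss and the TD3+BC actor loss. It is an illustration under stated assumptions, not code taken from this project. The function names, argument names, and default values (`clip_eps=0.2`, `alpha=2.5`, the commonly cited defaults) are hypothetical choices, and everything around the losses (networks, advantage estimation, critic updates, replay data) is omitted.

```python
import torch


def ppo_clip_loss(new_log_prob, old_log_prob, advantage, clip_eps=0.2):
    """PPO clipped surrogate loss (to be minimized).

    All inputs are 1-D tensors over a batch of (state, action) samples.
    old_log_prob and advantage are treated as constants by the caller.
    """
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = torch.exp(new_log_prob - old_log_prob)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    # Take the pessimistic (smaller) objective, then negate to get a loss.
    return -torch.min(unclipped, clipped).mean()


def td3_bc_actor_loss(q_value, policy_action, data_action, alpha=2.5):
    """TD3+BC actor loss: maximize Q while staying close to dataset actions.

    q_value is Q(s, pi(s)) for the batch; policy_action is pi(s);
    data_action is the action stored in the offline dataset.
    """
    # Scale the Q term by the batch's mean |Q| so alpha is roughly
    # scale-free; detach so the normalizer carries no gradient.
    lam = alpha / q_value.abs().mean().detach()
    bc_term = ((policy_action - data_action) ** 2).mean()
    return -lam * q_value.mean() + bc_term
```

When the new and old policies agree, the PPO ratio is 1 and the loss reduces to the negative mean advantage; with a positive advantage, any ratio above `1 + clip_eps` contributes no more than the clipped value. In the TD3+BC loss, a larger `alpha` weights the Q term more heavily relative to the behavioral-cloning term.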
Status
Ongoing. I am working through the progression from on-policy to offline methods and connecting what I learn here to my interest in robot learning and physical AI.