GPU Computing
GPU Execution Model: Threads, Blocks, and Warps
A practical map of CUDA's execution hierarchy and the implications of warp-synchronous execution.
Study Notes
Compact technical notes on AI systems, GPU computing, memory architecture, and computer systems.
Memory Architecture
Why bandwidth, locality, and tensor movement often dominate accelerator performance.