GPU execution
Threads, blocks, and warps: the execution model behind CUDA performance.
Static Research Blog
Exploring AI systems through GPU computing, memory architecture, and computer systems research.
This site collects short study notes, paper reviews, and implementation writeups. The focus is on how workloads map onto hardware, where the time goes, and which tradeoffs matter in practice.
Bandwidth, locality, reuse, and data movement across the hierarchy that constrains AI workloads.
Paper reading and project notes that connect implementation choices to observable runtime cost.
Study Note
How CUDA execution maps onto warps, divergence, and memory access patterns.
Study Note
Why bandwidth and data movement often dominate AI accelerator performance.
Paper Review
A short review of how near-memory computation changes the cost of machine learning systems.
Project
Implementation notes from a language-model assignment with attention to shape discipline and measurement.