1. Demystifying Attention: Building Core Mechanisms of Transformers in PyTorch
NLP • Deep Learning • Interpretability • PyTorch 🔗 GitHub Repo
Implemented self-attention, masked (causal) attention, and multi-head attention from scratch to demystify the mathematical core of transformer architectures; a sketch of these mechanisms follows below.
Built a working encoder–decoder pipeline following Attention Is All You Need (Vaswani et al., 2017) to strengthen the low-level architectural understanding that alignment and safety research builds on; see the cross-attention example below.
Designed clean, interpretable code that exposes the matrix operations at each computation step.
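A minimal sketch of how these pieces fit together, assuming standard PyTorch. The names (`scaled_dot_product_attention`, `MultiHeadAttention`) and dimensions here are illustrative, not the repo's actual API; shapes are annotated inline to keep the matrix operations visible.

```python
import math
import torch
import torch.nn as nn

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, d_head)
    d_head = q.size(-1)
    # attention scores, scaled by sqrt(d_head): (batch, heads, seq_len, seq_len)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_head)
    if mask is not None:
        # positions where mask == 0 are forbidden; -inf zeroes them after softmax
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model, n_heads):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, query, key, value, mask=None):
        b = query.size(0)
        # project, then split into heads: (batch, heads, seq_len, d_head)
        q = self.w_q(query).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        k = self.w_k(key).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        v = self.w_v(value).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        out, _ = scaled_dot_product_attention(q, k, v, mask)
        # merge heads back: (batch, seq_len, d_model), then final projection
        out = out.transpose(1, 2).contiguous().view(b, -1, self.n_heads * self.d_head)
        return self.w_o(out)

# Self-attention with a causal mask: each position may only attend to itself
# and earlier positions. The lower-triangular mask broadcasts over batch and heads.
seq_len = 8
mask = torch.tril(torch.ones(seq_len, seq_len))
x = torch.randn(2, seq_len, 64)
mha = MultiHeadAttention(d_model=64, n_heads=4)
out = mha(x, x, x, mask=mask)  # self-attention: query == key == value
```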

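In the encoder–decoder setting the same module does double duty: the decoder's self-attention uses the causal mask above, while cross-attention takes its queries from the decoder and its keys and values from the encoder output. A hedged illustration, reusing the hypothetical `MultiHeadAttention` from the previous sketch:

```python
# Hypothetical shapes; the repo's actual pipeline may differ.
enc_out = torch.randn(2, seq_len, 64)  # encoder output: (batch, src_len, d_model)
dec_x = torch.randn(2, seq_len, 64)    # decoder states after masked self-attention
cross_attn = MultiHeadAttention(d_model=64, n_heads=4)
# queries come from the decoder; keys/values from the encoder, with no mask,
# so every target position may attend to every source position
out = cross_attn(dec_x, enc_out, enc_out)
```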