1. Demystifying Attention: Building Core Mechanisms of Transformers in PyTorch
NLP • Deep Learning • Interpretability • PyTorch 🔗 GitHub Repo
Implemented self-attention, masked (causal) attention, and multi-head attention from scratch to demystify the mathematical core of transformer architectures; a sketch of these mechanisms follows below.
Built a working encoder–decoder pipeline following Attention Is All You Need (Vaswani et al., 2017) to strengthen the low-level architectural understanding that alignment and safety research builds on; see the cross-attention example below.
Designed clean, interpretable code that exposes the matrix operations at each computation step.
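A minimal sketch of how these pieces fit together, assuming standard PyTorch. The names (`scaled_dot_product_attention`, `MultiHeadAttention`) and dimensions here are illustrative, not the repo's actual API; shapes are annotated inline to keep the matrix operations visible.

```python
import math
import torch
import torch.nn as nn

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, d_head)
    d_head = q.size(-1)
    # attention scores, scaled by sqrt(d_head): (batch, heads, seq_len, seq_len)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_head)
    if mask is not None:
        # positions where mask == 0 are forbidden; -inf zeroes them after softmax
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model, n_heads):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, query, key, value, mask=None):
        b = query.size(0)
        # project, then split into heads: (batch, heads, seq_len, d_head)
        q = self.w_q(query).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        k = self.w_k(key).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        v = self.w_v(value).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        out, _ = scaled_dot_product_attention(q, k, v, mask)
        # merge heads back: (batch, seq_len, d_model), then final projection
        out = out.transpose(1, 2).contiguous().view(b, -1, self.n_heads * self.d_head)
        return self.w_o(out)

# Self-attention with a causal mask: each position may only attend to itself
# and earlier positions. The lower-triangular mask broadcasts over batch and heads.
seq_len = 8
mask = torch.tril(torch.ones(seq_len, seq_len))
x = torch.randn(2, seq_len, 64)
mha = MultiHeadAttention(d_model=64, n_heads=4)
out = mha(x, x, x, mask=mask)  # self-attention: query == key == value
```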

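In the encoder–decoder setting the same module does double duty: the decoder's self-attention uses the causal mask above, while cross-attention takes its queries from the decoder and its keys and values from the encoder output. A hedged illustration, reusing the hypothetical `MultiHeadAttention` from the previous sketch:

```python
# Hypothetical shapes; the repo's actual pipeline may differ.
enc_out = torch.randn(2, seq_len, 64)  # encoder output: (batch, src_len, d_model)
dec_x = torch.randn(2, seq_len, 64)    # decoder states after masked self-attention
cross_attn = MultiHeadAttention(d_model=64, n_heads=4)
# queries come from the decoder; keys/values from the encoder, with no mask,
# so every target position may attend to every source position
out = cross_attn(dec_x, enc_out, enc_out)
```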