A brief introduction to the seminal Transformer paper.
A reading of the GPT-1 paper.
A groundbreaking and efficient training approach from DeepSeek.
How reasoning is evoked in large language models.
A neural network optimizer.
Paper "Attention Is All You Need"
Paper "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context"
Paper "RoFormer: Enhanced Transformer with Rotary Position Embedding"
Paper "Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation"