A brief introduction to the seminal Transformer paper.
A reading of the GPT-1 paper.
A groundbreaking and efficient training approach from DeepSeek.
How reasoning is evoked in large language models.
A neural network optimizer.
Paper "Attention Is All You Need"
Paper "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context"
Paper "RoFormer: Enhanced Transformer with Rotary Position Embedding"
Paper "Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation"