I study deep neural architectures and their underlying mathematics.
A brief introduction to the seminal Transformer paper.
A groundbreaking training approach from DeepSeek.
How reasoning emerges in large language models.
A neural network optimizer.
A new learning paradigm.
The architecture behind Nested Learning.