Interactive visualizations of MLA and DSA attention mechanisms
DeepSeek V3.2-Exp introduces DeepSeek Sparse Attention (DSA), which builds on the Multi-Head Latent Attention (MLA) used in V3.1 by restricting each query to a small top-k subset of tokens chosen by a lightweight indexer. Explore four interactive visualizations: attention heatmaps, computation-flow diagrams, the indexer mechanism with a code walkthrough, and a bonus on the Hadamard transformation for outlier removal.
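To make the "outlier removal" bonus concrete before diving into the visualizations, here is a minimal, self-contained sketch (not taken from the DeepSeek code) of why an orthonormal Hadamard transform helps ahead of low-precision quantization: it spreads a single extreme coordinate evenly across all dimensions, shrinking the maximum magnitude while preserving the vector's norm. The dimension of 128 and the helper name `hadamard_matrix` are illustrative choices.

```python
import torch

def hadamard_matrix(d: int) -> torch.Tensor:
    """Build the d x d orthonormal Walsh-Hadamard matrix (d must be a power of two)."""
    H = torch.ones(1, 1)
    while H.shape[0] < d:
        H = torch.cat([torch.cat([H, H], dim=1),
                       torch.cat([H, -H], dim=1)], dim=0)
    return H / d ** 0.5

d = 128
H = hadamard_matrix(d)

x = torch.zeros(d)
x[3] = 100.0                     # a single extreme outlier coordinate
x_rot = H @ x                    # rotate before quantizing

print(x.abs().max().item(), x_rot.abs().max().item())  # 100.0 vs ~8.84: outlier spread out
print(x.norm().item(), x_rot.norm().item())            # both 100.0: norm preserved (orthonormal)
```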
[Animation: cycles through k values 8 → 12 → 16 → 20 → 24 → 28, 2 seconds per step, then loops.]
These visualizations are based on the DeepSeek V3.2-Exp architecture. The actual PyTorch implementation pairs MLA with an indexer that performs top-k token selection for DSA; see the official repository for full details.
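As a rough illustration of that top-k selection, the idea is to score all key positions with a cheap indexer, keep only the top-k per query, and run standard attention over that subset. The following is a simplified sketch, not the official DSA implementation (the real indexer uses its own learned projections, multiple heads, and FP8 kernels); all names and shapes below (`idx_q`, `idx_k`, `top_k`, the 64-token example) are illustrative assumptions.

```python
import torch

def sparse_attention_topk(q, k, v, idx_q, idx_k, top_k: int = 16):
    """q, k, v: [seq, d] per-head tensors; idx_q, idx_k: [seq, d_idx] cheap indexer projections."""
    seq, d = q.shape

    # 1) Cheap indexer scores over all positions, with a causal mask.
    scores = idx_q @ idx_k.T                                   # [seq, seq]
    causal = torch.tril(torch.ones(seq, seq, dtype=torch.bool))
    scores = scores.masked_fill(~causal, float("-inf"))

    # 2) Keep only the top-k key positions per query.
    top_idx = scores.topk(min(top_k, seq), dim=-1).indices     # [seq, top_k]

    # 3) Attention restricted to the selected positions.
    k_sel, v_sel = k[top_idx], v[top_idx]                      # [seq, top_k, d]
    logits = torch.einsum("sd,std->st", q, k_sel) / d ** 0.5
    valid = top_idx <= torch.arange(seq).unsqueeze(-1)         # re-apply causality to the gathered set
    logits = logits.masked_fill(~valid, float("-inf"))
    return torch.einsum("st,std->sd", logits.softmax(dim=-1), v_sel)

# Example: 64 tokens, 32-dim heads, 8-dim indexer, each query attends to 16 tokens.
q, k, v = (torch.randn(64, 32) for _ in range(3))
idx_q, idx_k = (torch.randn(64, 8) for _ in range(2))
out = sparse_attention_topk(q, k, v, idx_q, idx_k, top_k=16)
print(out.shape)   # torch.Size([64, 32])
```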