
DeepSeek V3.2 Architecture

Interactive visualizations of MLA and DSA attention mechanisms

Powered by Fireworks AI

Understanding DeepSeek V3.2-Exp: From MLA to DSA

DeepSeek V3.2-Exp introduces DeepSeek Sparse Attention (DSA), an evolution of the Multi-Head Latent Attention (MLA) used in V3.1. Explore four interactive visualizations: attention heatmaps, computation flow diagrams, the indexer mechanism with a code walkthrough, and a bonus section on the Hadamard transform for outlier removal.
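As a taste of the bonus topic, here is a minimal sketch of a normalized (fast Walsh-)Hadamard transform applied to an activation vector: rotating a peaky vector spreads an outlier's energy across all coordinates, which is what makes the activations friendlier to low-bit quantization. The function name and the power-of-two dimension are illustrative assumptions, not taken from the DeepSeek code.

```python
import torch

def hadamard_transform(x: torch.Tensor) -> torch.Tensor:
    """Fast Walsh-Hadamard transform along the last dim (length must be a power of two),
    normalized by 1/sqrt(d) so the rotation preserves vector norms."""
    d = x.shape[-1]
    assert (d & (d - 1)) == 0 and d > 0, "last dimension must be a power of two"
    y = x.clone()
    h = 1
    while h < d:
        y = y.reshape(*x.shape[:-1], d // (2 * h), 2, h)
        a, b = y[..., 0, :], y[..., 1, :]
        y = torch.stack((a + b, a - b), dim=-2)   # standard butterfly step
        h *= 2
    return y.reshape(x.shape) / d**0.5

# A vector with a single large outlier: after the transform its energy is spread evenly,
# shrinking the max/mean ratio that makes quantization painful.
x = torch.zeros(128)
x[7] = 100.0
print((x.abs().max() / x.abs().mean()).item())    # 128.0, very peaky
hx = hadamard_transform(x)
print((hx.abs().max() / hx.abs().mean()).item())  # ~1.0, flat
```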

Auto-play cycles through k values 8 → 12 → 16 → 20 → 24 → 28, then loops (2 seconds each).

Manual Controls
Hover over a cell to see the q → kv attention weight. The causal mask prevents attention to future positions.
MLA (dense): Complexity ≈ O(n²) | Ops ≈ 1.18k | Cache cells ≈ 2.30k
DSA (sparse): Complexity ≈ O(n·k) | Ops ≈ 384 | Cache cells ≈ 384
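The counters above are consistent with a toy setup of n = 48 tokens and k = 8 selected keys per query; those exact values are an assumption used here to reproduce the displayed numbers, not read from the demo's source.

```python
# Reproduce the displayed counters, assuming n = 48 tokens and k = 8 selected keys per query
# (values inferred from the numbers shown, not read from the demo's internals).
n, k = 48, 8

dense_ops = n * (n + 1) // 2   # causal mask: query i attends to i + 1 keys -> 1176 ≈ 1.18k
dense_cells = n * n            # full n × n heatmap -> 2304 ≈ 2.30k

sparse_ops = n * k             # k scored keys budgeted per query -> 384
sparse_cells = n * k           # only the selected entries are materialized -> 384

print(dense_ops, dense_cells, sparse_ops, sparse_cells)  # 1176 2304 384 384
```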

This is a pedagogical visualization inspired by the style of interactive LLM explainers. The MLA panel shows dense attention; the DSA panel shows per-row top‑k sparsity under a causal mask. Based on DeepSeek-V3.2-Exp, but not an exact reproduction of internals.
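For readers who prefer code, the following sketch captures what the DSA panel is showing: per-row top-k selection of causal attention scores, with everything outside the selected set masked out before the softmax. It is a toy in the same spirit as the visualization (single head, no batching, arbitrary sizes), not the DeepSeek-V3.2-Exp kernel.

```python
import torch
import torch.nn.functional as F

def topk_causal_attention(q, k, v, top_k=8):
    """Toy per-row top-k sparse attention under a causal mask.
    q, k, v: (seq_len, dim). Returns the outputs and the sparse weight matrix."""
    n, d = q.shape
    scores = (q @ k.t()) / d**0.5                      # (n, n) attention logits
    causal = torch.tril(torch.ones(n, n, dtype=torch.bool))
    scores = scores.masked_fill(~causal, float("-inf"))

    # Keep only the top-k logits per row; everything else stays masked before the softmax.
    vals, idx = scores.topk(min(top_k, n), dim=-1)
    sparse = torch.full_like(scores, float("-inf")).scatter_(-1, idx, vals)

    weights = F.softmax(sparse, dim=-1)                # rows with < k causal candidates still sum to 1
    return weights @ v, weights

# Toy usage: 48 tokens, a 64-dim single head, k = 8 selected keys per query (hypothetical sizes).
q, k, v = (torch.randn(48, 64) for _ in range(3))
out, w = topk_causal_attention(q, k, v, top_k=8)
print(out.shape, int((w > 0).sum(dim=-1).max()))       # torch.Size([48, 64]) 8
```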

Implementation Reference

These visualizations are based on the DeepSeek V3.2-Exp architecture. The actual PyTorch implementation combines MLA (Multi-Head Latent Attention) with an indexer that performs top-k selection for DeepSeek Sparse Attention (DSA). See the official repository for full details.
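As a rough, hypothetical sketch of the indexer idea (module names, projection sizes, and the scoring rule below are illustrative assumptions, not the official code): a lightweight scorer rates every past position for each query, and only the top-k indices are handed to the attention kernel.

```python
import torch
import torch.nn as nn

class ToyIndexer(nn.Module):
    """Illustrative indexer: cheap low-dimensional projections score every (query, key)
    pair, and the k highest-scoring past positions are selected for each query."""
    def __init__(self, dim: int, index_dim: int = 64, top_k: int = 8):
        super().__init__()
        self.q_proj = nn.Linear(dim, index_dim, bias=False)
        self.k_proj = nn.Linear(dim, index_dim, bias=False)
        self.top_k = top_k

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (seq_len, dim) -> (seq_len, top_k) indices of selected key positions
        n = hidden.shape[0]
        scores = self.q_proj(hidden) @ self.k_proj(hidden).t()   # (n, n) indexer logits
        causal = torch.tril(torch.ones(n, n, dtype=torch.bool))
        scores = scores.masked_fill(~causal, float("-inf"))
        # Note: in this toy, early rows with fewer than top_k causal candidates still
        # return top_k indices, some of which point at masked positions.
        return scores.topk(min(self.top_k, n), dim=-1).indices

# The selected indices would then gather cached K/V entries so the main attention
# only touches top_k positions per query.
indexer = ToyIndexer(dim=256, top_k=8)
idx = indexer(torch.randn(48, 256))
print(idx.shape)   # torch.Size([48, 8])
```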