
DeepSeek V3.2 Architecture

Interactive visualizations of MLA and DSA attention mechanisms

Powered by Fireworks AI

Understanding DeepSeek V3.2-Exp: From MLA to DSA

DeepSeek V3.2-Exp introduces DeepSeek Sparse Attention (DSA), an evolution of the Multi-Head Latent Attention (MLA) used in V3.1. Explore four interactive visualizations: attention heatmaps, computation flow diagrams, the indexer mechanism with a code walkthrough, and a bonus section on the Hadamard transform for outlier removal.
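As a taste of the bonus topic, here is a minimal sketch of a normalized (fast Walsh-)Hadamard transform applied to an activation vector: rotating a peaky vector spreads an outlier's energy across all coordinates, which is what makes the activations friendlier to low-bit quantization. The function name and the power-of-two dimension are illustrative assumptions, not taken from the DeepSeek code.

```python
import torch

def hadamard_transform(x: torch.Tensor) -> torch.Tensor:
    """Fast Walsh-Hadamard transform along the last dim (length must be a power of two),
    normalized by 1/sqrt(d) so the rotation preserves vector norms."""
    d = x.shape[-1]
    assert (d & (d - 1)) == 0 and d > 0, "last dimension must be a power of two"
    y = x.clone()
    h = 1
    while h < d:
        y = y.reshape(*x.shape[:-1], d // (2 * h), 2, h)
        a, b = y[..., 0, :], y[..., 1, :]
        y = torch.stack((a + b, a - b), dim=-2)   # standard butterfly step
        h *= 2
    return y.reshape(x.shape) / d**0.5

# A vector with a single large outlier: after the transform its energy is spread evenly,
# shrinking the max/mean ratio that makes quantization painful.
x = torch.zeros(128)
x[7] = 100.0
print((x.abs().max() / x.abs().mean()).item())    # 128.0, very peaky
hx = hadamard_transform(x)
print((hx.abs().max() / hx.abs().mean()).item())  # ~1.0, flat
```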

Auto-play cycles through k values 8 → 12 → 16 → 20 → 24 → 28, then loops (2 seconds each).

Manual Controls
Hover over a cell to see the q → kv attention weight. The causal mask prevents attention to future positions.
MLA (dense): Complexity ≈ O(n²) | Ops ≈ 1.18k | Cache cells ≈ 2.30k
DSA (sparse): Complexity ≈ O(n·k) | Ops ≈ 384 | Cache cells ≈ 384
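The counters above are consistent with a toy setup of n = 48 tokens and k = 8 selected keys per query; those exact values are an assumption used here to reproduce the displayed numbers, not read from the demo's source.

```python
# Reproduce the displayed counters, assuming n = 48 tokens and k = 8 selected keys per query
# (values inferred from the numbers shown, not read from the demo's internals).
n, k = 48, 8

dense_ops = n * (n + 1) // 2   # causal mask: query i attends to i + 1 keys -> 1176 ≈ 1.18k
dense_cells = n * n            # full n × n heatmap -> 2304 ≈ 2.30k

sparse_ops = n * k             # k scored keys budgeted per query -> 384
sparse_cells = n * k           # only the selected entries are materialized -> 384

print(dense_ops, dense_cells, sparse_ops, sparse_cells)  # 1176 2304 384 384
```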

This is a pedagogical visualization inspired by the style of interactive LLM explainers. The MLA panel shows dense attention; the DSA panel shows per-row top‑k sparsity under a causal mask. Based on DeepSeek-V3.2-Exp, but not an exact reproduction of internals.
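For readers who prefer code, the following sketch captures what the DSA panel is showing: per-row top-k selection of causal attention scores, with everything outside the selected set masked out before the softmax. It is a toy in the same spirit as the visualization (single head, no batching, arbitrary sizes), not the DeepSeek-V3.2-Exp kernel.

```python
import torch
import torch.nn.functional as F

def topk_causal_attention(q, k, v, top_k=8):
    """Toy per-row top-k sparse attention under a causal mask.
    q, k, v: (seq_len, dim). Returns the outputs and the sparse weight matrix."""
    n, d = q.shape
    scores = (q @ k.t()) / d**0.5                      # (n, n) attention logits
    causal = torch.tril(torch.ones(n, n, dtype=torch.bool))
    scores = scores.masked_fill(~causal, float("-inf"))

    # Keep only the top-k logits per row; everything else stays masked before the softmax.
    vals, idx = scores.topk(min(top_k, n), dim=-1)
    sparse = torch.full_like(scores, float("-inf")).scatter_(-1, idx, vals)

    weights = F.softmax(sparse, dim=-1)                # rows with < k causal candidates still sum to 1
    return weights @ v, weights

# Toy usage: 48 tokens, a 64-dim single head, k = 8 selected keys per query (hypothetical sizes).
q, k, v = (torch.randn(48, 64) for _ in range(3))
out, w = topk_causal_attention(q, k, v, top_k=8)
print(out.shape, int((w > 0).sum(dim=-1).max()))       # torch.Size([48, 64]) 8
```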

Implementation Reference

These visualizations are based on the DeepSeek V3.2-Exp architecture. The actual PyTorch implementation combines MLA (Multi-Head Latent Attention) with an indexer that performs top-k selection for DeepSeek Sparse Attention (DSA). See the official repository for full details.
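As a rough, hypothetical sketch of the indexer idea (module names, projection sizes, and the scoring rule below are illustrative assumptions, not the official code): a lightweight scorer rates every past position for each query, and only the top-k indices are handed to the attention kernel.

```python
import torch
import torch.nn as nn

class ToyIndexer(nn.Module):
    """Illustrative indexer: cheap low-dimensional projections score every (query, key)
    pair, and the k highest-scoring past positions are selected for each query."""
    def __init__(self, dim: int, index_dim: int = 64, top_k: int = 8):
        super().__init__()
        self.q_proj = nn.Linear(dim, index_dim, bias=False)
        self.k_proj = nn.Linear(dim, index_dim, bias=False)
        self.top_k = top_k

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (seq_len, dim) -> (seq_len, top_k) indices of selected key positions
        n = hidden.shape[0]
        scores = self.q_proj(hidden) @ self.k_proj(hidden).t()   # (n, n) indexer logits
        causal = torch.tril(torch.ones(n, n, dtype=torch.bool))
        scores = scores.masked_fill(~causal, float("-inf"))
        # Note: in this toy, early rows with fewer than top_k causal candidates still
        # return top_k indices, some of which point at masked positions.
        return scores.topk(min(self.top_k, n), dim=-1).indices

# The selected indices would then gather cached K/V entries so the main attention
# only touches top_k positions per query.
indexer = ToyIndexer(dim=256, top_k=8)
idx = indexer(torch.randn(48, 256))
print(idx.shape)   # torch.Size([48, 8])
```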