<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Projects on 卓琪的开发笔记</title>
    <link>https://zhuoqidev.com/en/projects/</link>
    <description>Recent content in Projects on 卓琪的开发笔记</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en</language>
    <copyright>© 2026 Liu ZhuoQi</copyright>
    <lastBuildDate>Mon, 04 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://zhuoqidev.com/en/projects/index.xml" rel="self" type="application/rss+xml" />
    
    <item>
      <title>Why LLMs Have No Memory — A Cross-Validated Research Report with 67 Primary Sources</title>
      <link>https://zhuoqidev.com/en/projects/llm-memory-research/</link>
      <pubDate>Mon, 04 May 2026 00:00:00 +0000</pubDate>
      
      <guid>https://zhuoqidev.com/en/projects/llm-memory-research/</guid>
      <description>&lt;h2 class=&#34;relative group&#34;&gt;1. Why LLMs Are Stateless&#xA;    &lt;div id=&#34;1-why-llms-are-stateless&#34; class=&#34;anchor&#34;&gt;&lt;/div&gt;&#xA;    &#xA;    &lt;span&#xA;        class=&#34;absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none&#34;&gt;&#xA;        &lt;a class=&#34;text-primary-300 dark:text-neutral-700 !no-underline&#34; href=&#34;#1-why-llms-are-stateless&#34; aria-label=&#34;Anchor&#34;&gt;#&lt;/a&gt;&#xA;    &lt;/span&gt;&#xA;    &#xA;&lt;/h2&gt;&#xA;&lt;p&gt;Four independent constraints — individually manageable, together they leave &amp;ldquo;stateless&amp;rdquo; as the only viable engineering solution. This conclusion is cross-validated across 67 primary sources.&lt;/p&gt;&#xA;&#xA;&lt;h3 class=&#34;relative group&#34;&gt;Architecture: O(n²) Attention&#xA;    &lt;div id=&#34;architecture-on-attention&#34; class=&#34;anchor&#34;&gt;&lt;/div&gt;&#xA;    &#xA;    &lt;span&#xA;        class=&#34;absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none&#34;&gt;&#xA;        &lt;a class=&#34;text-primary-300 dark:text-neutral-700 !no-underline&#34; href=&#34;#architecture-on-attention&#34; aria-label=&#34;Anchor&#34;&gt;#&lt;/a&gt;&#xA;    &lt;/span&gt;&#xA;    &#xA;&lt;/h3&gt;&#xA;&lt;p&gt;Self-attention scales at &lt;code&gt;O(n²)&lt;/code&gt;. A single 4096-token sequence needs &lt;del&gt;2 GB VRAM for KV cache; 32 concurrent sessions hit 64 GB — more than the model weights themselves. Llama 3.1 at 100M context requires 638 H100 GPUs (&lt;/del&gt;$5,400/hour) for KV cache alone.&lt;/p&gt;</description>
      
    </item>
    
  </channel>
</rss>
