LLM Memory Calculator

Estimate GPU memory requirements for different large language model configurations

Model Configuration

System Type: Different hardware architectures handle memory differently. Apple Silicon uses unified memory shared between the CPU and GPU, while dedicated GPUs have their own VRAM.
Model Size: The parameter count determines model complexity and capability. Larger models generally perform better but require more memory and compute.
Precision/Quantization: Lower precision reduces memory use and speeds up inference, but may degrade output quality. 4-bit and 8-bit quantization are common ways to cut resource requirements.
Operation Mode: Inference uses the least memory; training requires additional memory for gradients and optimizer states, and training from scratch needs the most.
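The interplay of parameter count, precision, and operation mode can be sketched with a rough formula: weight memory is parameters × bytes per parameter, and training modes multiply that for gradients and optimizer state. This is a minimal illustration with assumed multipliers, not the calculator's exact formula:

```python
def estimate_weight_memory_gb(params_billions: float,
                              bits_per_param: int,
                              mode: str = "inference") -> float:
    """Rough weight-memory estimate in GB.

    The mode multipliers below are illustrative assumptions: training with
    an Adam-style optimizer must also hold gradients plus two optimizer
    moments per parameter, so it needs several times the weight memory.
    """
    weight_gb = params_billions * bits_per_param / 8  # 1e9 params * bits/8 bytes
    multiplier = {"inference": 1.0, "fine-tuning": 3.0, "training": 4.0}[mode]
    return weight_gb * multiplier

# A 7B-parameter model in 16-bit precision needs about 14 GB just for
# weights at inference time; the same model in 4-bit needs about 3.5 GB.
print(estimate_weight_memory_gb(7, 16))           # -> 14.0
print(estimate_weight_memory_gb(7, 4))            # -> 3.5
print(estimate_weight_memory_gb(7, 16, "training"))  # -> 56.0
```

Note that this covers weights only; the KV cache, activations, and framework overhead shown in the breakdown below add to the total.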

Memory Requirements

28 GB
Recommended Hardware
Requires 1× A100 (80GB) or 2× A100 (40GB)

Memory Blocks Visualization

[Stacked chart breaking the total into: Model, Framework, KV Cache, Activation, Buffer]
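Of the components in the breakdown, the KV cache is the one that grows with context length and batch size rather than with the model weights. Its size can be sketched as follows, assuming standard multi-head attention (no grouped-query key/value sharing); the Llama-2-7B-like shape in the example is an assumption for illustration:

```python
def kv_cache_gb(layers: int, heads: int, head_dim: int,
                seq_len: int, batch: int, bytes_per_elem: int = 2) -> float:
    """KV cache size in GB: 2 tensors (K and V) per layer, each of shape
    [batch, heads, seq_len, head_dim], stored at `bytes_per_elem` (2 = fp16)."""
    elems = 2 * layers * heads * head_dim * seq_len * batch
    return elems * bytes_per_elem / 1e9

# A 7B-class config (32 layers, 32 heads, head_dim 128) at a 4096-token
# context, batch 1, fp16:
print(kv_cache_gb(32, 32, 128, 4096, 1))  # -> about 2.15 GB
```

Because the cache scales linearly with sequence length and batch size, long-context or high-throughput serving can make it rival or exceed the weight memory itself.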