Advancing LLM Training: Introducing NVFP4 for Efficient Pretraining
Breaking Barriers in Large Language Model Pretraining
Large Language Models (LLMs) have become foundational tools across industries, and their performance depends heavily on model size, training data quality, and computational efficiency. Training a frontier model is expensive, often requiring tens to hundreds of yottaflops of computation. This study presents an approach to pretraining in NVFP4 (NVIDIA 4-bit Floating Point) precision and reports training loss and downstream accuracy comparable to an FP8 baseline.
The Challenge of Narrow-Precision Training
While 8-bit floating point (FP8) training is now widely adopted, moving to 4-bit floating point (FP4) could yield further gains in computational speed and resource utilization. However, quantizing this aggressively poses critical challenges:
- Training stability for large-scale models
- Convergence over long token horizons (training runs spanning trillions of tokens)
- Implementation complexity in maintaining accuracy
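To make the difficulty concrete, here is a minimal sketch of block-scaled FP4 quantization in plain Python. It assumes the E2M1 value grid (the eight magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6) and a single scale shared by each block. The function name quantize_block, the block size of four, and the use of a full-precision Python float for the scale are illustrative choices, not details taken from the study.

```python
# Non-negative magnitudes representable in FP4 E2M1
# (1 sign bit, 2 exponent bits, 1 mantissa bit).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_block(block):
    """Round-to-nearest FP4 quantization of one block with a shared scale.

    The scale maps the block's largest magnitude onto 6.0, the largest E2M1
    value. Illustrative only: a real format would also store the scale in
    reduced precision, whereas this sketch keeps it as a Python float.
    """
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return list(block)
    scale = amax / E2M1_GRID[-1]
    out = []
    for x in block:
        # Pick the representable magnitude closest to |x| / scale.
        mag = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        out.append(mag * scale if x >= 0 else -mag * scale)
    return out


smooth = [0.9, -1.1, 1.0, -0.8]   # values of similar magnitude
spiky = [0.9, -1.1, 1.0, 60.0]    # same block with one outlier

print(quantize_block(smooth))
print(quantize_block(spiky))
```

In the second block the outlier sets the shared scale, so the three small entries all round to zero. With only eight magnitudes available, a single large value can erase the rest of its block, which is one reason stability and convergence are harder at 4 bits than at 8.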
The NVFP4 Approach
The research team developed a novel framework combining several key techniques:
- Random Hadamard Transforms (RHT): Bound block-level outliers to maintain numerical stability
- Two-Dimensional Quantization: Ensures consistent representations in both forward and backward passes
- Stochastic Rounding: Provides unbiased gradient estimation
- Selective High-Precision Layers: Maintains critical parameters in higher precision
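Two of these techniques can be illustrated in a few lines of plain Python. The sketch below is not the study's implementation: the function names, the block size of 16, the example values, and the E2M1 grid are assumptions made for illustration. It shows (1) a random Hadamard transform, meaning a random sign flip followed by an orthonormal Walsh-Hadamard transform, spreading one outlier evenly across a block, and (2) stochastic rounding choosing between the two neighboring representable values with probabilities that make the result unbiased in expectation.

```python
import random

# Non-negative magnitudes representable in FP4 E2M1 (assumed target grid).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def hadamard(vec):
    """Orthonormal fast Walsh-Hadamard transform; len(vec) must be a power of two."""
    out = list(vec)
    n = len(out)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    norm = n ** 0.5
    return [v / norm for v in out]


def random_hadamard(vec, signs):
    """Flip signs with a fixed random +/-1 vector, then apply the transform."""
    return hadamard([s * v for s, v in zip(signs, vec)])


def inverse_random_hadamard(vec, signs):
    """Undo random_hadamard: the orthonormal transform is its own inverse."""
    return [s * v for s, v in zip(signs, hadamard(vec))]


def stochastic_round(x, rng):
    """Round x to a neighboring E2M1 value so that the expected result equals x.

    Magnitudes above 6.0 are clamped to 6.0 first (clamping is not unbiased).
    """
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E2M1_GRID[-1])
    lo = max(g for g in E2M1_GRID if g <= mag)
    hi = min(g for g in E2M1_GRID if g >= mag)
    if hi == lo:
        return sign * lo
    p_up = (mag - lo) / (hi - lo)  # probability of rounding away from zero
    return sign * (hi if rng.random() < p_up else lo)


rng = random.Random(0)
signs = [rng.choice((-1.0, 1.0)) for _ in range(16)]

# A block of 16 values containing a single outlier.
block = [0.0] * 15 + [16.0]
rotated = random_hadamard(block, signs)
print(max(abs(v) for v in block), max(abs(v) for v in rotated))

# Averaging many stochastic roundings of 2.4 approaches 2.4,
# whereas round-to-nearest would always return 2.0.
samples = [stochastic_round(2.4, rng) for _ in range(20000)]
print(sum(samples) / len(samples))
```

Because the transform is orthonormal, it can be undone exactly (inverse_random_hadamard) and does not change the result of a dot product when applied to both operands, which is why it can reshape the value distribution before quantization without altering the underlying math. Stochastic rounding trades a little extra variance for the removal of systematic rounding bias, which matters when many small gradient contributions are accumulated.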
Validation Through Massive-Scale Training
The approach was validated by training a 12-billion-parameter model on 10 trillion tokens, the longest publicly documented training run in 4-bit precision to date. Results showed:
- Training loss comparable to FP8 baseline
- Downstream task accuracies comparable to the FP8 baseline
- Significant energy and computational savings
This work represents a major milestone in narrow-precision LLM training, opening new possibilities for more efficient model development.
Key References
- arXiv:2509.25149 [cs.CL]
- NVIDIA Research Team
- Artificial Intelligence and Machine Learning Community

