FAIR Enhances LLMs with Multi-Token Prediction
New research from Facebook AI Research (FAIR) shows that multi-token prediction substantially improves large language models. By training models to predict several future tokens at once rather than only the next one, the work achieves stronger code generation performance and up to 3x faster inference, within the same training budget and on the same data. Previously applied mainly in fine-tuning, the approach is extended here to pre-training large models, where the improvements grow with model size and new behaviors emerge at scale.
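To make the idea concrete, here is a minimal sketch of what a multi-token prediction training loss could look like: a shared trunk produces hidden states, and several independent output heads each predict the token a different number of steps ahead, with their cross-entropy losses summed. The class name, parameter names, and the choice of 4 future tokens are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenPredictionHeads(nn.Module):
    """Illustrative multi-token prediction: one unembedding head per future offset.

    Head 0 is the usual next-token head; heads 1..n_future-1 predict tokens
    further ahead from the same trunk hidden state.
    """

    def __init__(self, hidden_dim: int, vocab_size: int, n_future: int = 4):
        super().__init__()
        self.n_future = n_future
        self.heads = nn.ModuleList(
            nn.Linear(hidden_dim, vocab_size, bias=False) for _ in range(n_future)
        )

    def loss(self, trunk_hidden: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        """trunk_hidden: (batch, seq, hidden) from the shared transformer trunk.
        tokens: (batch, seq) input token ids.
        Returns the summed cross-entropy over all future offsets that fit in the sequence.
        """
        _, seq, _ = trunk_hidden.shape
        total = trunk_hidden.new_zeros(())
        for i, head in enumerate(self.heads):
            offset = i + 1                              # predict the token at position t + offset
            if seq <= offset:
                continue
            logits = head(trunk_hidden[:, :-offset])    # (batch, seq - offset, vocab)
            targets = tokens[:, offset:]                # (batch, seq - offset)
            total = total + F.cross_entropy(
                logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
            )
        return total
```

At inference time, only the next-token head is strictly needed, so the model can be served like a standard language model; the extra heads are what make the reported speedups via speculative-style decoding possible.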