Qwen2: Multilingual LLM Breakthrough

Qwen2 is a major advance in open Large Language Models (LLMs), positioned as the most impactful open release since Meta Llama 3. The family spans five sizes: 0.5B, 1.5B, 7B, 57B-A14B (a Mixture-of-Experts model with 57B total and 14B active parameters), and 72B, covering a wide range of deployment and scalability needs. The models are trained on data in 29 languages across Europe, the Middle East, and Asia, and support context lengths from 32k to 128k tokens depending on model size. All sizes except the 72B are released under the Apache 2.0 license; the 72B ships under Qwen's own license but remains usable commercially.

Benchmark results are strong across both academic and conversational evaluations. The 72B model scores MMLU 82.3, IFEval 77.6, MT-Bench 9.12, and HumanEval 86.0, while the 7B model reaches MMLU 70.5, MT-Bench 8.41, and HumanEval 79.9.

For dataset creation, Qwen2 leverages techniques such as rejection sampling, execution feedback, back-translation, and scalable oversight to ensure robustness and quality across its multilingual data (rejection sampling with execution feedback is sketched below). Post-training then combines supervised fine-tuning (SFT), direct preference optimization (DPO), and model merging to further improve performance (see the DPO sketch below).

Released on Hugging Face, Qwen2 ships with a large multilingual vocabulary and standard model formats, making it straightforward to adopt across diverse AI tasks and environments; a minimal usage example follows the sketches below.
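The release names rejection sampling and execution feedback among its dataset-construction techniques but does not publish the pipeline itself, so the following is only a minimal sketch of the general idea: sample several candidate responses and keep only those a verifier accepts. The `generate` and `verify` callables are hypothetical placeholders, not part of any Qwen2 API.

```python
from typing import Callable, List

def rejection_sample(
    prompt: str,
    generate: Callable[[str], str],      # hypothetical: draws one model response
    verify: Callable[[str, str], bool],  # hypothetical: accepts or rejects a candidate
    n_samples: int = 8,
) -> List[str]:
    """Keep only candidate responses that pass the verifier.

    With execution feedback, `verify` would execute a generated code
    candidate against tests and check its outputs; for math data, it
    might compare the final answer against a known reference.
    """
    kept = []
    for _ in range(n_samples):
        candidate = generate(prompt)
        if verify(prompt, candidate):
            kept.append(candidate)
    return kept
```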
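Qwen2's exact post-training recipe is not detailed in this summary, but the DPO objective it references is well defined (Rafailov et al., 2023). Below is a minimal PyTorch sketch of that loss, assuming per-response summed log-probabilities have already been computed for both the policy and a frozen reference model:

```python
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log p_policy(chosen | prompt), summed over tokens
    policy_rejected_logps: torch.Tensor,  # log p_policy(rejected | prompt)
    ref_chosen_logps: torch.Tensor,       # same quantities under the frozen reference model
    ref_rejected_logps: torch.Tensor,
    beta: float = 0.1,                    # controls deviation from the reference model
) -> torch.Tensor:
    # How much more (or less) the policy prefers each response than the reference does.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Maximize the margin between chosen and rejected responses.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()
```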
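Because the checkpoints are published on Hugging Face, they load through the standard transformers API. A minimal sketch using the instruction-tuned 7B checkpoint (repo id Qwen/Qwen2-7B-Instruct; requires a recent transformers with chat-template support, plus accelerate for device_map="auto"):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # needs `accelerate` installed
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize Qwen2 in one sentence."},
]
# apply_chat_template wraps the turns in Qwen2's chat control tokens.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```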