Multilingual Evaluation for Open LLMs
This project aims to improve the quality of open Large Language Models (LLMs) across many languages, and the first step is to assess the current state of the art (SOTA) in each of them. The Data Is Better Together community has already rated more than 10,000 prompts for quality, and a focused effort is now underway to translate a subset of those prompts in order to reduce the language disparity in model evaluation.

The initiative has three parts:

1. Select a subset of 500 high-quality prompts from the rated pool.
2. Invite the community to translate these prompts into as many languages as possible.
3. Use AlpacaEval and similar LLM-as-judge methods to evaluate the outputs of open LLMs on the translated prompts (a minimal sketch of this step follows below).

If this approach works, it could streamline the evaluation of open LLMs across languages, with a judge LLM assessing the quality of outputs from different open models on the same prompts. Join us in this effort to make evaluation more multilingual and to advance the capabilities of open LLMs.
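To make the evaluation step more concrete, here is a minimal sketch of what an AlpacaEval-style pairwise comparison with a judge LLM could look like. The judge prompt wording, the `query_judge` helper, and the example field names are illustrative assumptions, not the project's actual pipeline; swap in a real judge model and template when those are chosen.

```python
# Minimal sketch of an AlpacaEval-style pairwise comparison with a judge LLM.
# `query_judge` is a hypothetical stand-in for whatever judge model is used.

JUDGE_TEMPLATE = """You are comparing two answers to the same instruction.

Instruction:
{instruction}

Answer A:
{answer_a}

Answer B:
{answer_b}

Which answer is better? Reply with a single letter: A or B."""


def query_judge(prompt: str) -> str:
    """Hypothetical judge call; replace with a real LLM API request."""
    raise NotImplementedError("Plug in your judge model here.")


def pairwise_win_rate(examples: list[dict]) -> float:
    """Fraction of prompts on which the judge prefers model A over model B.

    Each example is expected to contain the keys:
    'instruction', 'output_a', 'output_b'.
    """
    wins = 0
    for ex in examples:
        prompt = JUDGE_TEMPLATE.format(
            instruction=ex["instruction"],
            answer_a=ex["output_a"],
            answer_b=ex["output_b"],
        )
        verdict = query_judge(prompt).strip().upper()
        if verdict.startswith("A"):
            wins += 1
    return wins / len(examples)
```

The same loop can be run once per language over the translated prompts, giving a per-language win rate for each pair of open models under comparison.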