Mistral's Mixtral: A New Era
Last night Mistral AI released Mixtral 8x22B, a 141B-parameter Mixture of Experts (MoE) model distributed via magnet link, continuing the company's open-release approach. The model has roughly 40B active parameters per token and a 65k-token context length, and it ships as a base model, leaving fine-tuning open for downstream customization. Running it takes approximately 260GB of VRAM in fp16, or around 73GB in int4. Mistral confirmed on its Discord that the release is under the Apache 2.0 license, and the weights are now available on Hugging Face via a community upload. The tokenizer is similar to that of earlier Mistral models, so existing pipelines should carry over with little change.

That said, official evaluations and performance metrics have not yet been published, and details on the training data and supported languages remain undisclosed. Even so, Mixtral 8x22B is another marker of Mistral's push to keep large, capable models in the open.
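For readers who want to try the weights, here is a minimal loading sketch using Hugging Face transformers with bitsandbytes 4-bit quantization, which is roughly where the ~73GB int4 figure above comes from. The repo id `mistral-community/Mixtral-8x22B-v0.1` is an assumption (the community mirror of the magnet-link release), and the prompt is illustrative only; this is not an official recipe from Mistral.

```python
# Hypothetical loading sketch, not an official Mistral example.
# Assumes the community upload lives at "mistral-community/Mixtral-8x22B-v0.1"
# and that enough GPU memory is available (~73GB for 4-bit weights).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistral-community/Mixtral-8x22B-v0.1"  # assumed repo id

# 4-bit quantization keeps the 141B-parameter MoE within roughly 73GB of VRAM.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # shard the experts across available GPUs
)

# This is a base model, not an instruct model, so prompt it as a completion model.
inputs = tokenizer("Mixture-of-experts models work by", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since only about 40B of the 141B parameters are active per token, inference cost scales more like a 40B dense model, but all expert weights still have to fit in memory, which is why quantization or multi-GPU sharding is effectively required.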