French AI startup Mistral AI has recently revealed its latest language model, Mixtral 8x7B, positioning it as a groundbreaking development in open-source performance standards. The model, released with open-weights, surpasses Llama 2's 70 billion-parameter model on various benchmarks, boasting six times faster inference. Additionally, it outperforms OpenAI's GPT-3.5 on most metrics, according to Mistral AI.
Mixtral 8x7B, with a context length of 32k tokens (approximately 24,000 words), is a multilingual model supporting English, Spanish, French, Italian, and German. It exhibits code generation capabilities and excels in providing coherent responses, achieving a notable score of 8.3 on the MT-Bench, comparable to GPT-3.5.
Jignesh Patel, a computer science professor at Carnegie Mellon University and co-founder of DataChat, highlights Mixtral's significance as an open-weight model. In an interview, Patel explains, "One can use an open-weight model in a broader range of applications, including ones in which packaging the model with a bigger system in a single environment is essential for privacy considerations."
Architectural Innovation: 'Sparse Mixture of Experts'
Mixtral 8x7B employs a unique architectural approach called 'Mixture of Experts' (MoE), utilizing a limited number of experts specialized in specific tasks. Patel explains, "In Mixtral, this mixture-of-experts model allows for the selective use of a small subset of these experts for individual decisions."
This sparse MoE network, operating as a decoder-only model, enhances efficiency by choosing two out of eight expert groups to process each token. This approach increases the model's capacity without proportional increases in computational requirements, making it more cost-effective.
Biases and Accuracy: Striving for Truthfulness
Mixtral 8x7B claims to be more truthful (73.9% vs. 50.2% on the TruthQA benchmark) and less biased than Llama 2. However, developers are encouraged to add system prompts to prevent toxic outputs. Without these safeguards, the model strictly follows instructions.
In terms of performance, while Mixtral 8x7B competes well with GPT-3.5, OpenAI's GPT-4 remains the leader in most categories, according to Bob Brauer, CEO of Interzoid. Both GPT models are closed-source.
Blending Models: A Hybrid Approach
Mixtral 8x7B adopts a hybrid approach in terms of business models. It offers open-source accessibility for experimentation and use, similar to Meta's LLaMa models. Additionally, Mistral AI provides pay-as-you-go API access, catering to users seeking quick and easy access without managing the infrastructure.
Jignesh Patel emphasizes the value of open-source software in driving progress in computer science. He notes that open-source contributions, like Linux, have significantly contributed to innovation, fostering competition, and lowering entry barriers in the field.
In summary, Mistral AI's Mixtral 8x7B presents a notable advancement in open-source language models, incorporating innovative architecture to enhance efficiency and address biases, while adopting a flexible business model to accommodate diverse user needs.