
Mixtral of Experts
Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model

About Mixtral of Experts
Mistral AI introduces Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feed-forward blocks (i.e. experts). For every token, at each layer, a router network selects two experts to process the current state and combine their outputs. This technique increases the number of parameters of a model while controlling cost and latency, as the model only uses a fraction of the total set of parameters per token.
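To make the routing concrete, here is a minimal PyTorch sketch of a Mixtral-style sparse MoE layer. This is not Mistral AI's actual implementation: the class name, the layer sizes, and the simplified expert feed-forward blocks are illustrative assumptions; only the top-2-of-8 routing with softmax-normalized gate weights follows the description above.

```python
# Minimal sketch (assumed, not Mistral AI's code) of a sparse MoE layer:
# a router scores 8 expert FFNs per token, keeps the top 2, and combines
# their outputs with softmax-normalized gate weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, d_model=4096, d_ff=14336, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)  # gating network
        # Simplified expert FFNs (the real experts use a gated SwiGLU-style block).
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (tokens, d_model)
        logits = self.router(x)                # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the 2 chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e          # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

# Only 2 of the 8 expert FFNs run per token, so per-token compute stays close to
# a dense model of that width while total FFN parameters grow roughly 8x.
moe = SparseMoELayer(d_model=64, d_ff=128)
tokens = torch.randn(5, 64)
print(moe(tokens).shape)  # torch.Size([5, 64])
```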
Key Features
- Sparse Mixture of Experts (SMoE) architecture.
- Router network selects two experts per token.
- Increases the number of parameters while controlling cost and latency.
- High performance on the MT-Bench, TruthfulQA, and BBQ benchmarks.
Use Cases
- Natural language processing.
- Text generation.
- Machine learning research.
- Model fine-tuning.