Mixtral of Experts
Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model
Pricing
Free
Tool Info
Rating: N/A (0 reviews)
Date Added: April 23, 2024
Categories
Developer Tools, LLMs
Description
Mistral AI introduces Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts). For every token, at each layer, a router network selects two experts to process the current state and combine their outputs. This technique increases the number of parameters of a model while controlling cost and latency, as the model only uses a fraction of the total set of parameters per token. Mixtral achieves high performance on various benchmarks, including MT-Bench, TruthfulQA, and BBQ.
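To make the routing idea concrete, below is a minimal sketch of a top-2 sparse MoE feedforward layer in PyTorch. It is an illustration of the technique described above, not Mistral's implementation: the class name, hidden size, expert width, and activation choice are assumptions for the example.

```python
# Illustrative sketch of a top-2 Sparse Mixture of Experts (SMoE) layer.
# Dimensions and class names are hypothetical, chosen only for the demo.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, hidden_dim=512, ffn_dim=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router: one logit per expert for each token.
        self.router = nn.Linear(hidden_dim, num_experts, bias=False)
        # Each expert is an independent feedforward block.
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_dim, ffn_dim),
                nn.SiLU(),
                nn.Linear(ffn_dim, hidden_dim),
            )
            for _ in range(num_experts)
        ])

    def forward(self, x):
        # x: (num_tokens, hidden_dim)
        logits = self.router(x)                               # (tokens, experts)
        weights, indices = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)                  # normalize over the chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts run for each token; their outputs are
        # combined with the router weights. Unselected experts are skipped,
        # which is what keeps per-token compute a fraction of the total.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: tokens = torch.randn(16, 512); y = SparseMoELayer()(tokens)
```

The loop-over-experts form is written for clarity; production implementations batch tokens per expert for efficiency, but the routing logic (top-2 selection, weighted combination of expert outputs) is the same.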
Key Features
- Sparse Mixture of Experts (SMoE) architecture.
- Router network selects two experts per token at each layer.
- Increases the total parameter count while controlling cost and latency.
- High performance on MT-Bench, TruthfulQA, and BBQ benchmarks.
Use Cases
- Natural language processing.
- Text generation.
- Machine learning research.
- Model fine-tuning.
Reviews
0 reviews