Unofficial implementation of the paper "Mixture-of-Depths"
Introducing Our Unofficial Implementation of Mixture-of-Depths
We're excited to announce our unofficial implementation of the recently published "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" paper. This cutting-edge technique promises to improve the efficiency and performance of transformer-based language models, and we've made it accessible to the broader AI community through our open-source project.
What is Mixture-of-Depths?
Mixture-of-Depths (MoD) is a technique that lets transformer models dynamically allocate compute across the sequence. At each MoD layer, a learned router scores every token, and only a fixed fraction of tokens (the top-k by router score) passes through the block's self-attention and MLP; the remaining tokens skip the block entirely via the residual connection. Unlike traditional transformers, which spend the same amount of computation on every token, MoD models learn which tokens need full processing at each depth (a minimal sketch of this routing idea follows the list below).
The key benefits of MoD include:
- Improved performance for equivalent FLOP budgets
- Faster inference times
- Potential memory savings, especially for larger models
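To make the routing idea concrete, here is a minimal, simplified sketch of a top-k token router wrapped around a generic transformer block. It is illustrative only: the class name, capacity value, and weighting scheme below are our own simplifications, not the exact modules used in the paper or in this repository.

```python
import torch
import torch.nn as nn

class MoDBlockSketch(nn.Module):
    """Illustrative top-k token routing around an arbitrary transformer block."""

    def __init__(self, block: nn.Module, hidden_size: int, capacity: float = 0.125):
        super().__init__()
        self.block = block                        # any module mapping (B, T, H) -> (B, T, H)
        self.router = nn.Linear(hidden_size, 1)   # scalar routing score per token
        self.capacity = capacity                  # fraction of tokens that get full compute

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch, seq_len, _ = x.shape
        k = max(1, int(seq_len * self.capacity))
        scores = self.router(x).squeeze(-1)              # (B, T) routing scores
        top_idx = torch.topk(scores, k, dim=-1).indices  # tokens selected per sequence
        out = x.clone()                                  # unselected tokens pass through unchanged
        for b in range(batch):
            idx = top_idx[b]
            selected = x[b, idx].unsqueeze(0)            # (1, k, H)
            processed = self.block(selected).squeeze(0)  # block output for selected tokens
            weight = scores[b, idx].unsqueeze(-1)        # scale by router score so the router gets gradient
            out[b, idx] = x[b, idx] + weight * (processed - x[b, idx])
        return out
```

In practice the wrapped block is a full transformer layer and such blocks are stacked; the `apply_mod_to_hf` helper shown below is intended to perform this kind of conversion automatically on existing Hugging Face models.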
Our Implementation
Our unofficial implementation brings the power of MoD to a wide range of popular language models. We've designed it to be easily integrated with the Hugging Face Transformers library, making it accessible to researchers and practitioners already familiar with this ecosystem.
Supported Models
We've implemented MoD for a variety of models, including:
- Mistral
- Mixtral
- LLaMA (including LLaMA 2 and 3)
- Gemma
- BLOOMZ and BLOOM
- DeepSeek
- Phi (1.5 & 2)
- Qwen2
- StarCoder2
We're actively working on expanding support to even more models in the future.
Easy Integration
Using our implementation is straightforward. Here's a quick example of how to apply MoD to an existing Hugging Face model:
```python
from transformers import AutoModelForCausalLM
from MoD import apply_mod_to_hf

# Load a pre-trained model
model = AutoModelForCausalLM.from_pretrained("some-repo/some-model")

# Apply MoD
model = apply_mod_to_hf(model)

# Train or use the model as usual
```
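Because the conversion adds routing parameters that start untrained, a converted model will typically need additional training or fine-tuning before the routing behaves well. As a rough illustration (the model name, dataset, and hyperparameters below are placeholders, not recommendations), a standard Hugging Face Trainer loop can be run on the converted model:

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from MoD import apply_mod_to_hf

model_name = "some-repo/some-model"  # placeholder
model = apply_mod_to_hf(AutoModelForCausalLM.from_pretrained(model_name))
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Placeholder dataset; substitute your own corpus.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mod-finetuned",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("mod-finetuned")
```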
Loading and Inference
We've also made it simple to load and use MoD-converted models:
```python
from MoD import AutoMoDModelForCausalLM

model = AutoMoDModelForCausalLM.from_pretrained('path_to_your_model')
```
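Once loaded, the model can be used like any other Hugging Face causal LM. The snippet below is a sketch that assumes the converted checkpoint at `path_to_your_model` also contains a tokenizer:

```python
from transformers import AutoTokenizer
from MoD import AutoMoDModelForCausalLM

# 'path_to_your_model' is a placeholder for your converted checkpoint.
model = AutoMoDModelForCausalLM.from_pretrained("path_to_your_model")
tokenizer = AutoTokenizer.from_pretrained("path_to_your_model")

inputs = tokenizer("Mixture-of-Depths lets transformers", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```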
Try It Today
Ready to experiment with Mixture-of-Depths in your own projects? You can install our package with a simple pip command:
```bash
pip install mixture-of-depth
```
We're excited to see how researchers and developers will use this implementation to push the boundaries of language model efficiency and performance. Give it a try and let us know what you think!
For more details, full documentation, and contribution guidelines, visit our GitHub repository: Mixture-of-depths