Unofficial implementation of the paper "Mixture-of-Depths"
Introducing Our Unofficial Implementation of Mixture-of-Depths
We're excited to announce our unofficial implementation of the recently published "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" paper. This cutting-edge technique promises to improve the efficiency and performance of transformer-based language models, and we've made it accessible to the broader AI community through our open-source project.
What is Mixture-of-Depths?
Mixture-of-Depths (MoD) is a technique that lets transformer models dynamically allocate compute across the sequence. At each MoD layer, a learned router scores every token, and only a fixed fraction of tokens (the top-k by router score) passes through the block's self-attention and MLP; the remaining tokens skip the block entirely via the residual connection. Unlike traditional transformers, which spend the same amount of computation on every token, MoD models learn which tokens need full processing at each depth (a minimal sketch of this routing idea follows the list below).
The key benefits of MoD include:
- Improved performance for equivalent FLOP budgets
- Faster inference times
- Potential memory savings, especially for larger models
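To make the routing idea concrete, here is a minimal, simplified sketch of a top-k token router wrapped around a generic transformer block. It is illustrative only: the class name, capacity value, and weighting scheme below are our own simplifications, not the exact modules used in the paper or in this repository.

```python
import torch
import torch.nn as nn

class MoDBlockSketch(nn.Module):
    """Illustrative top-k token routing around an arbitrary transformer block."""

    def __init__(self, block: nn.Module, hidden_size: int, capacity: float = 0.125):
        super().__init__()
        self.block = block                        # any module mapping (B, T, H) -> (B, T, H)
        self.router = nn.Linear(hidden_size, 1)   # scalar routing score per token
        self.capacity = capacity                  # fraction of tokens that get full compute

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch, seq_len, _ = x.shape
        k = max(1, int(seq_len * self.capacity))
        scores = self.router(x).squeeze(-1)              # (B, T) routing scores
        top_idx = torch.topk(scores, k, dim=-1).indices  # tokens selected per sequence
        out = x.clone()                                  # unselected tokens pass through unchanged
        for b in range(batch):
            idx = top_idx[b]
            selected = x[b, idx].unsqueeze(0)            # (1, k, H)
            processed = self.block(selected).squeeze(0)  # block output for selected tokens
            weight = scores[b, idx].unsqueeze(-1)        # scale by router score so the router gets gradient
            out[b, idx] = x[b, idx] + weight * (processed - x[b, idx])
        return out
```

In practice the wrapped block is a full transformer layer and such blocks are stacked; the `apply_mod_to_hf` helper shown below is intended to perform this kind of conversion automatically on existing Hugging Face models.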
Our Implementation
Our unofficial implementation brings the power of MoD to a wide range of popular language models. We've designed it to be easily integrated with the Hugging Face Transformers library, making it accessible to researchers and practitioners already familiar with this ecosystem.
Supported Models
We've implemented MoD for a variety of models, including:
- Mistral
- Mixtral
- LLaMA (including LLaMA 2 and 3)
- Gemma
- BLOOMZ and BLOOM
- DeepSeek
- Phi (1.5 & 2)
- Qwen2
- StarCoder2
We're actively working on expanding support to even more models in the future.
Easy Integration
Using our implementation is straightforward. Here's a quick example of how to apply MoD to an existing Hugging Face model:
```python
from transformers import AutoModelForCausalLM
from MoD import apply_mod_to_hf

# Load a pre-trained model
model = AutoModelForCausalLM.from_pretrained("some-repo/some-model")

# Apply MoD
model = apply_mod_to_hf(model)

# Train or use the model as usual
```
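Because the conversion adds routing parameters that start untrained, a converted model will typically need additional training or fine-tuning before the routing behaves well. As a rough illustration (the model name, dataset, and hyperparameters below are placeholders, not recommendations), a standard Hugging Face Trainer loop can be run on the converted model:

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from MoD import apply_mod_to_hf

model_name = "some-repo/some-model"  # placeholder
model = apply_mod_to_hf(AutoModelForCausalLM.from_pretrained(model_name))
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Placeholder dataset; substitute your own corpus.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mod-finetuned",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("mod-finetuned")
```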
Loading and Inference
We've also made it simple to load and use MoD-converted models:
```python
from MoD import AutoMoDModelForCausalLM

model = AutoMoDModelForCausalLM.from_pretrained('path_to_your_model')
```
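Once loaded, the model can be used like any other Hugging Face causal LM. The snippet below is a sketch that assumes the converted checkpoint at `path_to_your_model` also contains a tokenizer:

```python
from transformers import AutoTokenizer
from MoD import AutoMoDModelForCausalLM

# 'path_to_your_model' is a placeholder for your converted checkpoint.
model = AutoMoDModelForCausalLM.from_pretrained("path_to_your_model")
tokenizer = AutoTokenizer.from_pretrained("path_to_your_model")

inputs = tokenizer("Mixture-of-Depths lets transformers", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```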
Try It Today
Ready to experiment with Mixture-of-Depths in your own projects? You can install our package with a simple pip command:
```bash
pip install mixture-of-depth
```
We're excited to see how researchers and developers will use this implementation to push the boundaries of language model efficiency and performance. Give it a try and let us know what you think!
For more details, full documentation, and contribution guidelines, visit our GitHub repository: Mixture-of-depths