Exploring the Architectural Innovations of AstraQuasar-4B 32K
At AstraMind, we are excited to introduce AstraQuasar-4B, our pioneering Large Language Model (LLM) crafted specifically for text generation. This cutting-edge model packs roughly 4 billion parameters (excluding embeddings) and marks our first foray into the realm of pre-trained LLMs. It stands on the robust foundation of the Phi-2 architecture, yet distinguishes itself with notable enhancements: a substantial increase in the number of layers and the debut of a new technique we call the "duplicate trick."
The "duplicate trick" is a technique that has already showcased its potential by demonstrating impressive performance gains over the base Phi-2 model the version of AstraQuasar-4B which did not utilize this method. One of the most noteworthy achievements of AstraQuasar-4B is the successful application of backpropagation on the "duplicate trick," which has set a new benchmark for future research and development in this field. Notably, the implementation of this trick has led to an immediate reduction in loss by approximately 21%, without inducing any additional instability.
The model is still being trained and refined, and it already serves as a testament to the "duplicate trick's" potential and its implications for future advances in language modeling. Even at this stage, AstraQuasar-4B has outperformed its predecessors, signaling a promising future for its applications.
AstraQuasar-4B's architecture is fully compatible with leading training frameworks, so it integrates seamlessly into established workflows. This compatibility covers prominent platforms such as Axolotl and LLaMA Factory, as well as the standard Hugging Face Transformers library.
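Because the model follows the standard Hugging Face Transformers interface, loading it should look like loading any other causal language model. The repository id below is an assumption used for illustration; substitute the official AstraQuasar-4B repo id, and note that `trust_remote_code=True` would only be needed if the custom architecture ships as remote code.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repository id; replace with the official AstraQuasar-4B repo.
repo_id = "AstraMindAI/AstraQuasar-4B"

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

# Simple text-generation smoke test.
inputs = tokenizer("AstraQuasar-4B is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```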
As we keep innovating, we eagerly anticipate the arrival of a new architectural marvel, AstraPulsar, set to further revolutionize the field of language modeling.
Stay tuned to our channels for more news on AstraQuasar-4B and the exciting developments we have in store!
(Credits to Undi95 for his invaluable contributions to the self-calling-layers approach.)