Specialized Expert Routing for Zero-Shot Generalization

PHATGOOSE brings a Mixture-of-Experts (MoE)-style twist to adapter fusion: instead of merging LoRA adapters, it learns to route among them through a gating mechanism. 🤔 The method unfolds in stages: collect or train a pool of LoRA experts fine-tuned from a shared base model; train a small sigmoid gate for each PEFT module of each expert to decide when that expert should activate, keeping the base model and adapter weights frozen; and at inference, combine the learned gates into a dynamic top-k router that matches each token to the most relevant experts at every gated layer. 🚀🔢

This methodology not only outperforms previous model routing and recycling techniques but also scales to hundreds of experts while staying efficient, since each gate adds only a small amount of extra computation per module. 💡🤗 Notably, PHATGOOSE routes at the level of individual modules and tokens rather than picking a single expert per input, distinguishing it from conventional expert-level approaches. With its training code and models open source, PHATGOOSE invites collaboration and exploration of adapter-based routing architectures.
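To make the two stages more concrete, here is a minimal PyTorch sketch of the idea, not the released PHATGOOSE code: the class names `GatedLoRAExpert` and `TopKAdapterRouter`, the `rank` and `k` values, and the cosine-similarity scoring with softmax weighting are illustrative assumptions about how a per-module sigmoid gate could be reused as a top-k router.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedLoRAExpert(nn.Module):
    """One LoRA expert plus a per-module sigmoid gate trained post hoc.

    Hypothetical sketch: the base layer's weights and the LoRA weights are
    assumed frozen while only `gate` is trained.
    """

    def __init__(self, d_in: int, d_out: int, rank: int = 8):
        super().__init__()
        self.lora_a = nn.Linear(d_in, rank, bias=False)
        self.lora_b = nn.Linear(rank, d_out, bias=False)
        # Gate vector: its dot product with the token activation decides
        # how strongly this expert fires at this module.
        self.gate = nn.Parameter(torch.zeros(d_in))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sigmoid gate scales the LoRA update for each token.
        g = torch.sigmoid(x @ self.gate)                 # (batch, seq)
        return g.unsqueeze(-1) * self.lora_b(self.lora_a(x))


class TopKAdapterRouter(nn.Module):
    """At inference, the trained gate vectors of all experts act as a router:
    each token independently picks its top-k experts at this module."""

    def __init__(self, experts: list[GatedLoRAExpert], k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(experts)
        # Stack the learned gate vectors into a routing matrix of shape (E, d_in).
        self.routing = nn.Parameter(
            torch.stack([e.gate.detach() for e in experts]), requires_grad=False
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Affinity between each token and each expert's gate vector
        # (cosine-style scoring is an assumption of this sketch).
        scores = F.normalize(x, dim=-1) @ F.normalize(self.routing, dim=-1).T
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)   # (batch, seq, k)
        weights = topk_scores.softmax(dim=-1)

        d_out = self.experts[0].lora_b.out_features
        out = torch.zeros(*x.shape[:-1], d_out, device=x.device)
        # Apply only the selected experts to each token, weighted by the router.
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[..., slot] == e
                if mask.any():
                    update = expert.lora_b(expert.lora_a(x[mask]))
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * update
        return out
```

One such router would sit alongside every gated module of the frozen base model, so different tokens can lean on different specialists at different layers, which is what sets this apart from routing once per input at the expert level.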