Apple Advances On-Device AI with Apple Intelligence
Apple is making significant strides in on-device AI with the introduction of Apple Intelligence, announcing the deployment of a ~3B-parameter LLM across Mac, iPhone, and iPad. Using task-specific fine-tuned LoRA adapters, Apple claims the model outperforms comparable 7B and 3B LLMs.

The on-device 3B model uses grouped-query attention along with activation and embedding quantization, and runs on the Neural Engine. On an iPhone 15 Pro it reaches a time-to-first-token latency of roughly 0.6 ms per prompt token and a generation rate of 30 tokens per second. LoRA adapters are dynamically loaded, cached, and swapped as needed, so a single base model can serve many features. The on-device model uses a 49K vocabulary, versus 100K for the server model, whose other specifics remain undisclosed.

For post-training, Apple used rejection sampling fine-tuning and an RLHF algorithm with mirror descent policy optimization and a leave-one-out advantage estimator, along with synthetic data generation for tasks such as summarization. Evaluation across production scenarios used 750 samples per use case, focusing initially on US English.

Training leveraged Apple's AXLearn framework (built on JAX) with FSDP, running on both TPUs and GPUs. On benchmarks, the 3B + adapter configuration outperforms Phi-3-mini on summarization and scores 78.7% on IFEval, surpassing Mistral 7B, while the server model is reported to perform comparably to GPT-4-Turbo. Apple's dedication to on-device AI marks a significant step for user experience and privacy in AI applications. Minimal sketches of several of the techniques mentioned above follow.
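Dynamically swapping LoRA adapters is cheap because each adapter is just a pair of small low-rank matrices added on top of frozen base weights, so switching tasks means loading a few megabytes rather than a whole new model. Here is a minimal sketch in JAX; the rank, dimensions, and task names are illustrative assumptions, not Apple's actual adapter configuration:

```python
import jax
import jax.numpy as jnp

def init_lora(key, d_in, d_out, rank=16):
    # Low-rank factors: A (d_in x r) random, B (r x d_out) zero,
    # so a freshly initialized adapter leaves the base layer unchanged.
    return {
        "A": jax.random.normal(key, (d_in, rank)) * 0.01,
        "B": jnp.zeros((rank, d_out)),
    }

def lora_linear(x, w_frozen, adapter, scale=1.0):
    # Base projection plus the low-rank update: x @ W + scale * (x @ A) @ B.
    return x @ w_frozen + scale * (x @ adapter["A"]) @ adapter["B"]

keys = jax.random.split(jax.random.PRNGKey(0), 4)
w = jax.random.normal(keys[0], (512, 512))   # frozen base weight
summarize = init_lora(keys[1], 512, 512)     # one small adapter per task (hypothetical)
proofread = init_lora(keys[2], 512, 512)
x = jax.random.normal(keys[3], (1, 512))

# "Swapping" a task swaps only the adapter dict; the base weights stay resident.
y = lora_linear(x, w, summarize)
y = lora_linear(x, w, proofread)
```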
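Grouped-query attention cuts memory traffic by letting several query heads share one key/value head, which shrinks the KV cache. A toy sketch, assuming illustrative head counts and dimensions rather than Apple's actual configuration:

```python
import jax
import jax.numpy as jnp

def gqa(q, k, v, n_q_heads=8, n_kv_heads=2):
    """q: (T, n_q_heads, d); k, v: (T, n_kv_heads, d)."""
    T, _, d = q.shape
    group = n_q_heads // n_kv_heads
    # Repeat each KV head so every group of query heads attends to it;
    # only n_kv_heads worth of K/V ever needs to be cached.
    k = jnp.repeat(k, group, axis=1)   # (T, n_q_heads, d)
    v = jnp.repeat(v, group, axis=1)
    scores = jnp.einsum("qhd,khd->hqk", q, k) / jnp.sqrt(d)
    # Causal mask: position q may only attend to positions <= q.
    mask = jnp.tril(jnp.ones((T, T), dtype=bool))
    scores = jnp.where(mask[None, :, :], scores, -jnp.inf)
    weights = jax.nn.softmax(scores, axis=-1)
    return jnp.einsum("hqk,khd->qhd", weights, v)

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
q = jax.random.normal(k1, (16, 8, 64))
k = jax.random.normal(k2, (16, 2, 64))   # 8 query heads share 2 KV heads: 4x smaller cache
v = jax.random.normal(k3, (16, 2, 64))
out = gqa(q, k, v)                        # (16, 8, 64)
```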
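Activation and embedding quantization trade a little precision for large memory and bandwidth savings. The round-trip below sketches the general idea with a simple symmetric scheme; the bit width and scaling are assumptions, since Apple has not published the exact recipe:

```python
import jax.numpy as jnp

def quantize(x, bits=4):
    # Map float values onto a symmetric integer grid with one scale per tensor.
    qmax = 2 ** (bits - 1) - 1                       # e.g., 7 for 4-bit
    scale = jnp.max(jnp.abs(x)) / qmax
    q = jnp.clip(jnp.round(x / scale), -qmax, qmax).astype(jnp.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(jnp.float32) * scale

x = jnp.array([0.12, -0.98, 0.45, 0.03])
q, scale = quantize(x)
print(dequantize(q, scale))   # close to x, at a fraction of the memory
```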
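The reported iPhone 15 Pro numbers make per-request latency easy to estimate: prompt processing costs ~0.6 ms per token before the first token appears, and decoding then proceeds at ~30 tokens per second. A quick back-of-the-envelope, with arbitrary example prompt and output lengths:

```python
# Assumed request shape, not a measured workload.
prompt_tokens = 750
output_tokens = 100

ttft_s = prompt_tokens * 0.6e-3          # prompt processing: 0.45 s
generation_s = output_tokens / 30.0      # decoding: ~3.33 s
print(f"total ~ {ttft_s + generation_s:.2f} s")  # ~3.78 s
```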
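The leave-one-out advantage estimator baselines each sampled response against the average reward of the *other* samples drawn for the same prompt, giving a low-variance advantage signal without a learned value function. A small sketch with made-up rewards:

```python
import jax.numpy as jnp

def leave_one_out_advantage(rewards):
    """rewards: (k,) rewards for k responses sampled from one prompt.
    A_i = r_i - mean_{j != i}(r_j)."""
    k = rewards.shape[0]
    total = rewards.sum()
    baseline = (total - rewards) / (k - 1)  # mean of the other k-1 rewards
    return rewards - baseline

rewards = jnp.array([0.2, 0.9, 0.4, 0.7])
print(leave_one_out_advantage(rewards))
# Responses that beat their peers get positive advantage, the rest negative.
```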