AWS Inferentia2 Enhances AI Model Deployment

Amazon Web Services (AWS) offers Inferentia2 as a cost-effective alternative to GPUs for deploying large language models (LLMs) and other AI models. AWS Inferentia2 is now integrated with Hugging Face, making over 100,000 public models deployable on Amazon SageMaker with scalable, efficient options. Highlights include:

- Access to over 100,000 models deployable on Inferentia2 via Amazon SageMaker.
- Economical Inferentia2 instances featuring up to 12 Inferentia cores for Inference Endpoints.
- Simplified 1-click deployment of Meta Llama 3 to Inferentia2 on Inference Endpoints.
- Optimized Text Generation Inference (TGI) powered by Inferentia2.
- OpenAI-compatible endpoints ensuring seamless integration.
- Upcoming support for Diffusion and Embedding models.

AWS Inferentia2 aims to improve AI model deployment, offering better performance and lower cost.
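Because the endpoints are OpenAI-compatible, a deployed model can be queried with a standard chat-completions HTTP request. The sketch below, using only the Python standard library, shows what such a request looks like; the endpoint URL, token, and model name are placeholders, not values from this article.

```python
import json
import urllib.request

def build_chat_request(base_url, api_key, model, messages, max_tokens=256):
    """Build an OpenAI-compatible chat-completions request for an
    Inference Endpoint. All arguments are caller-supplied placeholders."""
    url = f"{base_url.rstrip('/')}/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
    }).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

# Example (not sent here -- sending requires a live endpoint):
req = build_chat_request(
    "https://my-endpoint.example.com",        # hypothetical endpoint URL
    "hf_xxx",                                 # hypothetical access token
    "meta-llama/Meta-Llama-3-8B-Instruct",
    [{"role": "user", "content": "Hello!"}],
)
# urllib.request.urlopen(req) would return an OpenAI-style JSON response.
```

The same request shape works with any OpenAI-compatible client library by pointing its base URL at the endpoint.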