Hugging Face Debuts Pix2Struct

March 25, 2024

While much of the attention in AI remains on GPT, 🤗 Hugging Face continues its push to democratize AI by open-sourcing capable models, offering an alternative to closed systems that restrict access and raise data privacy concerns. Its latest addition is Pix2Struct, a document AI model from Google AI that is now freely available to everyone. Pix2Struct takes an image of a document, chart, or diagram and answers questions about it end to end, with no separate OCR engine in the pipeline.

The model invites comparison with the well-known Donut model, which also works without OCR, and Pix2Struct achieves a higher ANLS score on the DocVQA benchmark. Its results are attributed to a simple yet scalable self-supervised pre-training objective: predicting the HTML of a web page from a masked image of that page. The release includes 20 checkpoints on the Hub, giving users a range of options to choose from. With Pix2Struct, Hugging Face continues its mission of keeping powerful AI models within reach of all.
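To make the OCR-free workflow and the ANLS metric more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the official DocVQA evaluation code or Hugging Face's reference example. The helper names (`levenshtein`, `anls`, `answer_question`) are hypothetical, the 0.5 threshold and the lowercase/strip normalization follow the common ANLS convention, and the default checkpoint name `google/pix2struct-docvqa-base` is assumed to be one of the checkpoints published on the Hub.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def anls(predictions, references, threshold=0.5):
    """Average Normalized Levenshtein Similarity.

    predictions: list of predicted answer strings.
    references:  list of lists of accepted answers, one list per question.
    A prediction scores 1 - NL against its closest reference, or 0 when
    the normalized distance NL reaches the threshold.
    """
    scores = []
    for pred, golds in zip(predictions, references):
        pred_norm = pred.strip().lower()
        best = 0.0
        for gold in golds:
            gold_norm = gold.strip().lower()
            longest = max(len(pred_norm), len(gold_norm))
            nl = levenshtein(pred_norm, gold_norm) / longest if longest else 0.0
            similarity = 1.0 - nl if nl < threshold else 0.0
            best = max(best, similarity)
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0


def answer_question(image_path, question,
                    checkpoint="google/pix2struct-docvqa-base"):
    """Ask a question about a document image, with no OCR step.

    Assumes the `transformers`, `torch`, and `Pillow` packages are
    installed; the checkpoint name is an assumption, not from the article.
    """
    # Imported lazily so the scoring helpers above work without these packages.
    from PIL import Image
    from transformers import (
        Pix2StructForConditionalGeneration,
        Pix2StructProcessor,
    )

    processor = Pix2StructProcessor.from_pretrained(checkpoint)
    model = Pix2StructForConditionalGeneration.from_pretrained(checkpoint)
    image = Image.open(image_path).convert("RGB")
    # The processor handles both the image and the question text.
    inputs = processor(images=image, text=question, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=32)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

A call such as `answer_question("invoice.png", "What is the total?")` (with a hypothetical image file) would return the model's answer as a string, and `anls([answer], [["$42.00"]])` would then score it against the accepted answers. Because ANLS tolerates small character-level differences, it suits models that generate answers directly from pixels.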