DITTO: Faster LLM Learning with Demonstrations


Stanford University introduces DITTO, a method showing that Large Language Models (LLMs) can learn new behaviors from very few samples—fewer than 10 in this case. Drawing parallels to how humans learn from examples, DITTO uses user/expert demonstrations (input-output pairs) to align a supervised fine-tuned (SFT) model.

The procedure works as follows. First, collect a small number (<10) of demonstrations and select the target SFT model. Next, DITTO generates new negative samples by sampling from the model itself, and builds pairwise comparison data in which the expert-written outputs are preferred over the model's own generations. It then iteratively refines the model with Direct Preference Optimization (DPO) until the loss reaches a defined stopping point, mixing each new iteration's comparisons with "replay" data from earlier iterations to keep learning stable.

The reported results are notable: DITTO outperforms few-shot prompting; generating 10 negative samples per demonstration yields a 22.34% relative improvement; and performance improves by 31.5% from the first to the fourth iteration. DITTO also outperforms SPIN given roughly 10 seed demonstrations, underscoring its efficiency at learning from small amounts of data. Developed in collaboration with the @huggingface alignment-handbook, DITTO is a step forward in sample-efficient LLM training, with promising implications for future machine learning and AI research.
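To make the loop concrete, here is a minimal, runnable Python sketch of the procedure as described above. The model state, the `sample_from_model` function, and the `dpo_update` step are toy stand-ins of my own (assumptions for illustration), not the paper's implementation; in practice the policy would be an actual LLM and the update step a real DPO trainer (e.g., TRL's `DPOTrainer`).

```python
# Toy sketch of the DITTO loop: sample negatives from the current model,
# prefer expert demonstrations over them, run DPO, and keep "replay" data
# from earlier iterations. All components below are illustrative stubs.
import random
from dataclasses import dataclass

@dataclass
class Comparison:
    prompt: str
    chosen: str    # expert demonstration (preferred)
    rejected: str  # model-generated sample (dispreferred)

def sample_from_model(model_state: dict, prompt: str) -> str:
    """Stand-in for sampling a completion from the current policy."""
    return f"model-output(iter={model_state['iteration']}, prompt={prompt!r})"

def dpo_update(model_state: dict, batch: list[Comparison]) -> float:
    """Stand-in for one DPO optimization pass; returns a mock loss."""
    model_state["iteration"] += 1
    return 1.0 / model_state["iteration"]  # pretend the loss decreases

def ditto(demos: list[tuple[str, str]], negatives_per_demo: int = 10,
          max_iters: int = 4, loss_threshold: float = 0.3) -> dict:
    model_state = {"iteration": 0}
    replay: list[Comparison] = []  # comparisons carried over from earlier iterations
    for _ in range(max_iters):
        # 1) Sample fresh negatives from the current model for each demonstration.
        fresh = [Comparison(prompt, expert, sample_from_model(model_state, prompt))
                 for prompt, expert in demos
                 for _ in range(negatives_per_demo)]
        # 2) Mix in replay data so older comparisons keep constraining the policy.
        batch = fresh + replay
        random.shuffle(batch)
        # 3) One DPO pass; stop once the loss drops below the threshold.
        loss = dpo_update(model_state, batch)
        replay.extend(fresh)
        if loss < loss_threshold:
            break
    return model_state

demos = [("Write a haiku about rain.", "Soft rain on tin roofs / ...")]
print(ditto(demos))
```

The key design idea this sketch captures is that the preference data is free: every comparison pairs an expert demonstration against the model's own output, so no human preference labels are needed beyond the handful of seed demonstrations.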