For domain-specific tasks where results have not met expectations, GenAI is transitioning from retrieval-augmented generation (RAG) to retrieval-augmented fine-tuning (RAFT), which promises both better accuracy and fewer hallucinations (fabricated or erroneous outputs).
RAG optimizes an LLM's output with targeted information without modifying the underlying model: the LLM taps into new data at inference time, without retraining, so GenAI applications can give more context-specific answers. RAFT improves on RAG by letting the model learn from external, domain-specific knowledge during fine-tuning, which goes beyond prompt-time augmentation and improves the model's accuracy.
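To make the distinction concrete, here is a minimal RAG sketch. The keyword-overlap retriever and the injected generate() callable are illustrative stand-ins for a vector store and an LLM call, not a specific library's API.

```python
# Minimal RAG sketch: augment the prompt with retrieved context;
# the LLM itself is never retrained. retrieve() and generate() are
# illustrative stand-ins, not a specific library's API.
from typing import Callable, List

def retrieve(query: str, corpus: List[str], top_k: int = 3) -> List[str]:
    """Naive keyword-overlap retriever standing in for a vector store."""
    scored = sorted(
        corpus,
        key=lambda doc: len(set(query.lower().split()) & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def rag_answer(query: str, corpus: List[str],
               generate: Callable[[str], str]) -> str:
    """Prepend retrieved documents to the prompt before calling the LLM."""
    context = "\n".join(retrieve(query, corpus))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return generate(prompt)  # generate() wraps any LLM completion call
```

Because the augmentation happens entirely in the prompt, the same pattern works with any hosted or local model.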
The RAFT process begins with initial fine-tuning on task-specific data. At each iteration, the retrieval system gathers relevant documents for each input example; these documents are then merged with the input data to form an augmented dataset. The model is further fine-tuned on this enhanced dataset, leveraging the additional context for improved performance.
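The sketch below shows this loop under stated assumptions: the fine_tune() and retrieve() helpers are hypothetical callables supplied by the caller, and the dataset is assumed to be a list of input/target records.

```python
# RAFT loop sketch. fine_tune() and retrieve() are hypothetical helpers
# injected by the caller; dataset is a list of {"input", "target"} dicts.
def raft(model, dataset, corpus, fine_tune, retrieve, iterations: int = 2):
    # Step 1: initial fine-tuning on the task-specific data alone.
    model = fine_tune(model, dataset)
    for _ in range(iterations):
        augmented = []
        for example in dataset:
            # Step 2: retrieve documents relevant to this input example.
            docs = retrieve(example["input"], corpus, top_k=3)
            # Step 3: merge retrieved context with the input to form
            # the augmented record.
            augmented.append({
                "input": "\n".join(docs) + "\n\n" + example["input"],
                "target": example["target"],
            })
        # Step 4: further fine-tune on the context-enriched dataset.
        model = fine_tune(model, augmented)
    return model
```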
RAFT has several advantages. By incorporating useful external information, it enhances the performance of language models on a variety of tasks. The diverse examples and contexts acquired through external data allow the model to generalize with improved accuracy. Additionally, RAFT can efficiently use large external corpora during training without embedding all of that information into the model parameters.
RAFT is particularly beneficial for knowledge-intensive tasks such as question answering, summarization, and knowledge-based reasoning. In open-domain question-answering scenarios, the retrieved context helps the model answer queries accurately.
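For illustration, a single RAFT-style training record for open-domain question answering might look like the following; the field names are assumptions for this sketch, not a fixed schema.

```python
# Illustrative RAFT training record for open-domain QA.
# Field names are assumptions for this sketch, not a fixed schema.
qa_example = {
    "question": "When did the Hubble Space Telescope launch?",
    "retrieved_context": [
        "The Hubble Space Telescope was launched aboard Space Shuttle "
        "Discovery on April 24, 1990.",
        "Hubble orbits roughly 540 km above Earth.",
    ],
    "answer": "April 24, 1990.",
}
```

Fine-tuning on records like this teaches the model to ground its answer in the retrieved passages rather than in parametric memory alone.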
The leading edge of AI research has recently pivoted in another promising direction: multi-modal AI. By connecting different modalities (e.g., text, speech, image, video), multi-modal AI makes AI systems more like humans in their ability to employ multiple senses to process and understand information.
Multi-modal AI hinges on the complementarity of these forms of data: by drawing on content from more than one modality, an AI system can build more accurate representations of information and generate more nuanced content. Using multiple modalities also lets a system capture a broader cross-section of the world, replicating some of the multi-sensory experience of human perception.
Multi-modal AI offers clear advantages over single-modality approaches. First, it combines different modalities, leveraging their complementary strengths; this integration allows for predictions with higher confidence than any isolated modality provides. For example, object detection can be enhanced by analyzing both the visual and audio tracks of a video. Multi-modal AI agents can also help bridge gaps in human communication by emulating how we naturally exchange both verbal and visual information, a step toward artificial general intelligence (AGI).
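As a sketch of the object-detection example, the snippet below late-fuses visual and audio embeddings into a single classifier. The embedding dimensions and the upstream encoders that produce them are assumptions for illustration, not a reference implementation.

```python
# Late-fusion sketch: combine visual and audio embeddings for detection.
# Dimensions are illustrative; the encoders producing the embeddings
# are assumed to exist upstream.
import torch
import torch.nn as nn

class LateFusionDetector(nn.Module):
    def __init__(self, vis_dim: int = 512, aud_dim: int = 128,
                 num_classes: int = 10):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(vis_dim + aud_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, vis_emb: torch.Tensor, aud_emb: torch.Tensor):
        # Concatenate per-clip visual and audio embeddings, then classify.
        return self.fuse(torch.cat([vis_emb, aud_emb], dim=-1))

detector = LateFusionDetector()
logits = detector(torch.randn(4, 512), torch.randn(4, 128))  # batch of 4 clips
```

Concatenating at the embedding level keeps each encoder independent, so either modality can be upgraded without retraining the other.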
There are numerous potential applications for multi-modal AI. It can be employed in computer vision for tasks such as image captioning, visual question answering, and video understanding; in natural language processing for sentiment analysis, dialog systems, and machine translation; and in other areas such as robotics, medicine, and entertainment.
Leaders should consider providing the computational resources needed to harness AI's potential fully. Innovations like RAFT and multi-modal AI enable more sophisticated and accurate applications, while GenAI-driven tools can assist with code generation, documentation, and testing, improving developer productivity.