Description
As we descend from the peak of the hype cycle around Large Language Models (LLMs), chat-based document interrogation systems have emerged as a high-value, practical use case. The ability to ask natural language questions and get relevant, accurate answers from a large corpus of documents can fundamentally transform organizations by making institutional knowledge accessible. Foundation models like OpenAI's GPT-4 provide powerful capabilities, but using them directly to answer questions about a collection of documents runs into accuracy limitations. Retrieval-augmented generation (RAG) is the leading approach to enhancing the capabilities and usability of Large Language Models, especially for personal or company-level chat-based document interrogation systems.
RAG is a technique for supplying LLMs with relevant context and external information (retrieved from vector storage), making their answers more powerful and accurate. In this tutorial, we'll dive into RAG by creating a personal chat application that accurately answers questions about your selected documents. We'll use a new OSS project called Ragna, which provides a friendly Python and REST API designed for exactly this use case. For our example, we'll test the effectiveness of different LLMs and vector databases. We'll then develop a web application that leverages the REST API, built with Panel, a powerful OSS Python application development framework.
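To make the retrieve-then-generate flow concrete, here is a minimal, self-contained sketch of the RAG pattern described above. It uses a toy bag-of-words "embedding" and cosine similarity purely for illustration; a real system (such as Ragna) would use a learned embedding model and a vector database, and the document strings below are made up for the example.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: lowercase word counts stand in for a dense vector
    # produced by a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    # Retrieval step: rank documents by similarity to the question.
    q = embed(question)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, docs: list[str]) -> str:
    # Augmentation step: prepend the retrieved context to the question
    # before sending the prompt to an LLM.
    context = "\n".join(retrieve(question, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = [
    "Ragna provides a Python and REST API for RAG applications.",
    "Panel is an open source Python framework for building web apps.",
    "The hype cycle describes the maturity of emerging technologies.",
]
prompt = build_prompt("What API does Ragna provide?", docs)
```

The resulting `prompt` contains the most relevant document as context, ready to be passed to an LLM; the generation step is simply a call to the model of your choice with this augmented prompt.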
By the end of this tutorial, you will understand the fundamental components that form a RAG model and have practical knowledge of open source tools that can help you or your organization explore and build your own applications. This tutorial is designed to enable enthusiasts in our community to explore an interesting topic using some beginner-friendly Python libraries.