RAG (Retrieval-Augmented Generation)
Retrieval-Augmented Generation (RAG) is an architecture pattern that grounds a language model's output in retrieved source documents rather than relying on the model's parametric memory alone.
What is RAG (Retrieval-Augmented Generation)?
A RAG pipeline takes a user query, retrieves the most relevant chunks from a vector database or hybrid search index, and passes them to the language model as additional context. The model is instructed to answer using only the retrieved sources and to cite them. RAG reduces hallucination, makes outputs auditable, and lets the system stay current without retraining. Production RAG systems combine embedding models, chunking strategies, hybrid retrieval, re-ranking, prompt templates, and evaluation harnesses.
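Below is a minimal sketch of that flow in Python. The embed() function is a hypothetical placeholder (a real pipeline would call an embedding model), the brute-force list stands in for a vector database, and generation is left as a stub rather than a real LLM call.

```python
# Minimal RAG pipeline sketch: embed, retrieve top-k chunks, build a
# grounded prompt with citations. embed() is a toy placeholder.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder embedder: deterministic hashed bag-of-words.
    # A production system would call a real embedding model here.
    vec = np.zeros(256)
    for token in text.lower().split():
        vec[hash(token) % 256] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

# Index: chunk the corpus and store (chunk, source, embedding) triples.
# In production this lives in a vector database or hybrid search index.
corpus = [
    ("Refunds are processed within 14 days.", "policy.md#refunds"),
    ("Support is available Monday to Friday.", "policy.md#support"),
]
index = [(chunk, source, embed(chunk)) for chunk, source in corpus]

def retrieve(query: str, k: int = 2):
    # Rank chunks by cosine similarity to the query embedding
    # (vectors are already L2-normalised, so the dot product suffices).
    q = embed(query)
    return sorted(index, key=lambda item: -float(q @ item[2]))[:k]

def build_prompt(query: str) -> str:
    # Grounded prompt: retrieved chunks plus citation instructions.
    chunks = retrieve(query)
    context = "\n".join(f"[{src}] {text}" for text, src, _ in chunks)
    return (
        "Answer using only the sources below and cite them.\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )

# A real pipeline would pass this prompt to the language model.
print(build_prompt("How long do refunds take?"))
```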
How does RAG (Retrieval-Augmented Generation) apply to enterprise AI?
Enterprise RAG is the default architecture for internal knowledge assistants, customer support drafting, policy lookup, and contract Q&A. It also directly supports EU AI Act traceability requirements: every output can point back to a source document and revision.
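To make that traceability point concrete, here is one possible response schema in which every answer carries its supporting documents and revisions. The field names and example values are illustrative assumptions, not a standard.

```python
# Hypothetical schema for traceable RAG output: each answer records the
# document, revision, and chunk that grounded it, so an audit log of
# these records lets reviewers trace every output back to its sources.
from dataclasses import dataclass

@dataclass
class Citation:
    document_id: str  # e.g. "contracts/msa-acme.pdf" (made-up example)
    revision: str     # document version the chunk was retrieved from
    chunk_id: str     # which chunk supported the claim

@dataclass
class GroundedAnswer:
    text: str
    citations: list[Citation]

answer = GroundedAnswer(
    text="Termination requires 90 days' written notice.",
    citations=[Citation("contracts/msa-acme.pdf", "rev-2024-03", "chunk-17")],
)
print(answer)
```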
Related terms
- Embedding - An embedding is a dense numerical vector that represents a piece of content (text, image, audio) such that semantically similar items end up close together in the vector space (see the sketch after this list).
- Vector Database - A vector database is a storage system optimised for indexing and querying high-dimensional embedding vectors using approximate nearest neighbour search.
- Large Language Model - A Large Language Model (LLM) is a foundation model trained on text to predict the next token, capable of generating, summarising, and reasoning over natural language.
- Hallucination - A hallucination is a confident-sounding output from a generative AI model that is not grounded in any source and is factually wrong.
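To illustrate the embedding and vector database entries above, the toy sketch below finds nearest neighbours by cosine similarity. The three-dimensional vectors are invented stand-ins; real embeddings have hundreds or thousands of dimensions, and a vector database would use approximate rather than brute-force search.

```python
# Toy nearest-neighbour lookup over embedding vectors. The vectors are
# made-up examples; a real system would get them from an embedding model.
import numpy as np

vectors = {
    "refund policy": np.array([0.9, 0.1, 0.0]),
    "return window": np.array([0.8, 0.2, 0.1]),
    "office hours":  np.array([0.1, 0.9, 0.3]),
}

def nearest(query_vec: np.ndarray, k: int = 2) -> list[str]:
    # Cosine similarity: dot product divided by the vector norms.
    def cos(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return sorted(vectors, key=lambda name: -cos(vectors[name], query_vec))[:k]

# A refund-related query vector lands closest to the refund entries.
print(nearest(np.array([0.85, 0.15, 0.05])))
```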
Need help applying RAG (Retrieval-Augmented Generation) to your enterprise? Submit a short brief and we reply within one business day.