The RAG Reformation

  • Writer: Nandita Krishnan
  • Sep 1
  • 3 min read

Updated: Oct 10

What's RAG?

Retrieval-Augmented Generation (RAG) tackles a simple limitation: large language models are brilliant but frozen in time. RAG addresses this by fetching relevant documents at query time and then conditioning the LLM on that evidence. The definition of RAG itself is also changing; we will explore that in another blog post.


This post is your quick guide to how Retrieval-Augmented Generation (RAG) has evolved, from its humble beginnings as a simple "retrieve-then-generate" pattern to the more intelligent, self-reflective, and adaptive systems we see today. Think of it as a friendly field manual, not an academic paper: short reads, clean diagrams, and practical takeaways that make the RAG landscape easy to follow.


1) Basic RAG (2020) — The Foundation

The original insight was deceptively simple: combine parametric knowledge (what's baked into the model) with non-parametric memory (external documents). This drastically reduced hallucinations. That's it. (arXiv)


Figure: Basic RAG flow (user query → embedder → retriever → context → LLM generator → answer)
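
To make the original pattern concrete, here's a minimal sketch in Python. The `embed`, `vector_search`, and `llm` callables are hypothetical stand-ins for your embedding model, vector index, and generator; any concrete stack would slot in the same way.

```python
# Minimal retrieve-then-generate loop. `embed`, `vector_search`, and
# `llm` are hypothetical stand-ins for an embedding model, a vector
# index, and a language model.

def basic_rag(query: str, embed, vector_search, llm, k: int = 5) -> str:
    query_vec = embed(query)                  # 1. embed the query
    docs = vector_search(query_vec, top_k=k)  # 2. fetch the k nearest documents
    context = "\n\n".join(docs)               # 3. stitch the evidence together
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
    return llm(prompt)                        # 4. condition the LLM on evidence
```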

2) Smarter Retrieval (2022-2023) — Better Context Matters

Two key innovations made retrieval actually worthwhile:

HyDE (2022): Instead of searching with the raw query, first generate a hypothetical answer, then search for documents similar to it. Counterintuitive but effective: it feels backwards to generate an answer before you search, but hypothetical answers use the same kind of language as the actual documents you're looking for, which makes them better search queries than raw questions. (arXiv)


Figure: HyDE flow (query → LLM writes hypothetical doc → embed → retriever → context → LLM → answer)
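
Here's the same loop with the HyDE twist, again using hypothetical `embed`, `vector_search`, and `llm` helpers. The only change from basic RAG is which text gets embedded.

```python
# HyDE: search with a hypothetical answer instead of the raw query.

def hyde_rag(query: str, embed, vector_search, llm, k: int = 5) -> str:
    # 1. Draft a passage that *could* answer the query (it may be wrong;
    #    we only need its vocabulary, not its facts).
    hypothetical = llm(f"Write a short passage that answers: {query}")
    # 2. Embed the draft, not the question, and retrieve real documents.
    docs = vector_search(embed(hypothetical), top_k=k)
    # 3. Generate grounded in the retrieved (real) evidence.
    context = "\n\n".join(docs)
    return llm(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
```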


RAG-Fusion (2023): Generate multiple query variants, retrieve for each, then merge the ranked lists intelligently (typically with Reciprocal Rank Fusion). More perspectives = better coverage. (Blog)


Figure: RAG-Fusion flow (user query → query variants → retrieve per variant → fuse results → combined context → answer)
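
A sketch of the fusion step, assuming an `llm` that writes query rewrites and a `search` callable that returns a ranked list of documents. Reciprocal Rank Fusion rewards documents that rank well across several variants; `k=60` is the constant commonly used with RRF.

```python
from collections import defaultdict

# RAG-Fusion: retrieve per query variant, then fuse the ranked lists
# with Reciprocal Rank Fusion (RRF).

def rag_fusion_retrieve(query: str, llm, search,
                        n_variants: int = 4, k: int = 60) -> list[str]:
    # 1. Keep the original query and add LLM-written rewrites (one per line).
    variants = [query] + llm(
        f"Write {n_variants - 1} alternative phrasings of: {query}"
    ).splitlines()

    # 2. RRF: score(doc) = sum over ranked lists of 1 / (k + rank).
    scores: dict[str, float] = defaultdict(float)
    for variant in variants:
        for rank, doc in enumerate(search(variant)):
            scores[doc] += 1.0 / (k + rank)

    # 3. Best fused score first.
    return sorted(scores, key=scores.get, reverse=True)
```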

3) Agentic RAG (2022+) — Models That Think Before Acting

ReAct changed the game by teaching models to reason about whether they need external information. This shift from automatic retrieval to deliberate tool use laid the groundwork for truly adaptive systems. (arXiv)


Figure: Agentic RAG flow (query → plan/reason → need external info? → search/retrieve with observations loop, or generate → answer)
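
A toy version of the ReAct loop, with hypothetical `llm` and `search` callables. The real prompt format is richer (explicit Thought/Action/Observation turns), but the control flow is the point: the model decides when to reach for a tool.

```python
# ReAct-style loop: reason, optionally act (search), observe, repeat.

def react_answer(query: str, llm, search, max_steps: int = 3) -> str:
    transcript = f"Question: {query}\n"
    for _ in range(max_steps):
        # The model either requests a search or commits to an answer.
        step = llm(
            transcript
            + "Think step by step, then reply with exactly one of:\n"
              "SEARCH: <query>\nANSWER: <final answer>\n"
        )
        if step.startswith("SEARCH:"):
            observation = search(step.removeprefix("SEARCH:").strip())
            transcript += f"{step}\nObservation: {observation}\n"
        else:
            return step.removeprefix("ANSWER:").strip()
    return llm(transcript + "Give your best final answer now.")
```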

4) Self-RAG (2023) — Retrieve and Reflect

Self-RAG trains models to make two critical decisions: when to retrieve and whether the retrieved content is actually helpful. The model adds reflection tokens that critique both passages and its own drafts. (arXiv)


Figure: Self-RAG flow
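
Self-RAG trains dedicated reflection tokens into the model itself; the sketch below only approximates that behavior with plain prompts (hypothetical `llm` and `search` callables), but it shows the two decisions plus the self-critique.

```python
# Prompt-level approximation of Self-RAG's reflect -> retrieve -> critique loop.

def self_rag(query: str, llm, search) -> str:
    # Decision 1: is retrieval needed at all?
    needs = llm(f"Does answering '{query}' require external documents? yes/no")
    if "yes" in needs.lower():
        # Decision 2: is each retrieved passage actually helpful?
        relevant = [
            p for p in search(query)
            if "yes" in llm(f"Is this passage relevant to '{query}'? yes/no\n\n{p}").lower()
        ]
        context = "\n\n".join(relevant)
        draft = llm(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
    else:
        draft = llm(f"Question: {query}\nAnswer:")
    # Self-critique: revise once if the draft looks unsupported.
    verdict = llm(f"Is this answer well supported? yes/no\n\n{draft}")
    if verdict.strip().lower().startswith("no"):
        draft = llm(f"Revise this answer so every claim is supported:\n\n{draft}")
    return draft
```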

5) CRAG (2024) — Quality Control for Retrieval

Corrective RAG acknowledges an uncomfortable truth: much of what gets retrieved is garbage. The solution? Grade retrieval quality explicitly. Bad results trigger web search or query expansion. Simple but effective. (arXiv)


Figure: Corrective RAG flow (query → initial retrieve → grade → web/expanded search if needed → filter & merge → LLM generate → answer)
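
The grading-and-fallback logic in miniature, assuming hypothetical `llm`, `retrieve`, and `web_search` callables. The CRAG paper grades documents as correct, ambiguous, or incorrect; this sketch keeps only confident hits and falls back to the web when nothing survives.

```python
# Corrective RAG: grade retrieval, fall back to web search on failure.

def corrective_rag(query: str, llm, retrieve, web_search) -> str:
    keep = []
    for doc in retrieve(query):
        verdict = llm(
            f"Grade this document's relevance to '{query}' as "
            f"correct, ambiguous, or incorrect:\n\n{doc}"
        )
        if "correct" in verdict.lower() and "incorrect" not in verdict.lower():
            keep.append(doc)

    if not keep:  # nothing usable: correct course with a web search
        keep = web_search(query)

    context = "\n\n".join(keep)
    return llm(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
```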

6) GraphRAG (2024) — Structure Over Soup

Microsoft's GraphRAG handles corpus-wide questions by building entity graphs and community summaries during indexing. When you need global insights across thousands of documents, vector search alone won't cut it. (arXiv)


Figure: GraphRAG flow (docs indexed into a knowledge graph → local/global retrieval paths → context → LLM → answer)
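
GraphRAG's heavy lifting happens at index time (entity extraction, graph building, community summarization). The "global" query path then reads roughly like a map-reduce over those summaries, sketched here with a hypothetical precomputed `community_summaries` list and an `llm` callable.

```python
# GraphRAG "global" query path, roughly: map the question over community
# summaries built at index time, then reduce the partial answers.

def graphrag_global(query: str, community_summaries: list[str], llm) -> str:
    # Map: each community summary contributes what it knows.
    partials = [
        llm(f"Summary:\n{summary}\n\nWhat does this say about: {query}?")
        for summary in community_summaries
    ]
    # Reduce: synthesize the partials into one corpus-wide answer.
    joined = "\n\n".join(partials)
    return llm(f"Combine these partial answers:\n{joined}\n\nQuestion: {query}")
```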

7) Adaptive Gating (2024) — Skip When You Can

RAGate learns when retrieval is pointless. Many questions don't need external knowledge—why waste the compute? A learned gate predicts necessity and routes accordingly. (arXiv)


Figure: Adaptive RAG flow (query → need retrieval? → retrieve or answer directly → answer)
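
The gate itself is the whole trick, so the runtime logic is tiny. Here's a sketch assuming a hypothetical trained classifier `gate` that maps a query to a retrieval-necessity probability, plus the usual `llm` and `retrieve` stand-ins.

```python
# Adaptive gating: pay for retrieval only when a learned gate says so.

def gated_rag(query: str, gate, llm, retrieve,
              threshold: float = 0.5) -> str:
    if gate(query) < threshold:  # gate predicts the model already knows this
        return llm(f"Question: {query}\nAnswer:")
    context = "\n\n".join(retrieve(query))
    return llm(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
```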

8) SR-RAG (2025) — Internal vs External

SR-RAG trains models to choose between retrieving external docs or verbalizing their internal knowledge. Sometimes the model already knows the answer—it just needs to articulate it properly. (arXiv)


Figure: SR-RAG flow (user query → need retrieval? → external docs or verbalized internal knowledge → final answer + citations)
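
SR-RAG trains this source-selection step into the model; the sketch below fakes it with a prompt (hypothetical `llm` and `retrieve` callables) to show the branch: external documents on one path, verbalized internal knowledge on the other.

```python
# SR-RAG's core branch: retrieve externally, or verbalize internal knowledge.

def sr_rag(query: str, llm, retrieve) -> str:
    choice = llm(f"To answer '{query}', is external retrieval needed? yes/no")
    if "yes" in choice.lower():
        context = "\n\n".join(retrieve(query))
        source = "retrieved documents"
    else:
        # Turn latent knowledge into explicit, citable text first.
        context = llm(f"Write down everything you know that is relevant to: {query}")
        source = "model's internal knowledge"
    answer = llm(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
    return f"{answer}\n\n(Source: {source})"
```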


9) SKILL-RAG (2025) — Sentence-Level Filtering

SKILL-RAG uses reinforcement learning to score retrieved text sentence-by-sentence. Most retrieved content contains valuable nuggets buried in noise. This approach surgically extracts what matters. (arXiv)


Figure: SKILL-RAG flow (query → retrieve → sentence-level self-knowledge scoring → filter → curated context → generate → answer)
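
In the paper the sentence scorer is trained with reinforcement learning; this sketch swaps in a generic `score(query, sentence) -> float` callable (hypothetical, like `llm` and `retrieve`) and uses naive period splitting where a real system would use a proper sentence segmenter.

```python
# SKILL-RAG: filter retrieved text sentence by sentence before generating.

def skill_rag(query: str, llm, retrieve, score,
              threshold: float = 0.5) -> str:
    sentences = [
        s.strip()
        for doc in retrieve(query)
        for s in doc.split(".")  # naive segmentation; use a real splitter
        if s.strip()
    ]
    # Keep only the sentences the scorer judges useful for this query.
    curated = [s for s in sentences if score(query, s) >= threshold]
    context = ". ".join(curated)
    return llm(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
```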

The Current State

Modern RAG systems combine these approaches into sophisticated decision graphs. They rewrite queries, adjust retrieval counts, switch between sources, cache results, and self-correct—all orchestrated through frameworks like LangGraph.


Figure: Modern agentic RAG flow (query → retrieve → grade documents → web search fallback → generate → evaluate answer quality, with yes/no loops back to retrieval or rewriting)
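
Frameworks like LangGraph express this as an explicit state graph with conditional edges; stripped of the framework, the control flow looks something like the sketch below, where every callable (`retrieve`, `grade`, `web_search`, `rewrite`, `generate`, `evaluate`) is a hypothetical stand-in.

```python
# Bare control flow of a modern agentic RAG pipeline: retrieve, grade,
# switch sources, generate, evaluate, rewrite, and loop.

def agentic_rag(query: str, retrieve, grade, web_search,
                rewrite, generate, evaluate, max_loops: int = 2) -> str:
    answer = ""
    for _ in range(max_loops + 1):
        docs = [d for d in retrieve(query) if grade(query, d)]
        if not docs:                    # weak retrieval: switch to the web
            docs = web_search(query)
        answer = generate(query, docs)
        if evaluate(query, answer):     # good enough: stop here
            return answer
        query = rewrite(query)          # self-correct and try again
    return answer
```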

Key Takeaway

RAG used to be very simple: grab some documents, feed them to your model, voila! Now it's gotten way more interesting. Modern systems actually think about whether they need to retrieve information at all, figure out what's worth keeping, and adjust on the fly.


What's got me excited is where this is all heading. We're building systems that learn from how people actually use them. They notice what works and what doesn't, and they get better at finding the correct information over time.


Basically, we're shifting from fixed, predictable pipelines to systems that, like humans, genuinely learn and adapt.

