The RAG Reformation
- Nandita Krishnan
- Sep 1
- 3 min read
Updated: Oct 10
What's RAG?
Retrieval-Augmented Generation (RAG) tackles a simple limitation: large language models are brilliant but frozen in time. RAG addresses this by fetching relevant documents at query time and then conditioning the LLM on that evidence. The old definition of RAG is also changing; we'll explore that in another blog post.
This post is your quick guide to how Retrieval-Augmented Generation (RAG) has evolved, from its humble beginnings as a simple "retrieve-then-generate" pattern to the more intelligent, self-reflective, and adaptive systems we see today. Think of it as a friendly field manual, not an academic paper: short reads, clean diagrams, and practical takeaways that make the RAG landscape easy to follow.
1) Basic RAG (2020) — The Foundation
The original insight was deceptively simple: combine parametric knowledge (what's baked into the model) with non-parametric memory (external documents). This drastically reduced hallucinations. That's it. (arXiv)
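To make that concrete, here's a minimal retrieve-then-generate sketch. `embed()` and `generate()` are placeholders for whatever embedding model and LLM you use; everything else is plain Python.

```python
from math import sqrt

def embed(text: str) -> list[float]:
    ...  # placeholder: return an embedding vector from your embedding model

def generate(prompt: str) -> str:
    ...  # placeholder: call your LLM and return its completion

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def basic_rag(query: str, docs: list[str], k: int = 3) -> str:
    q_vec = embed(query)
    # Non-parametric memory: rank the external documents by similarity to the query.
    ranked = sorted(docs, key=lambda d: cosine(q_vec, embed(d)), reverse=True)
    context = "\n\n".join(ranked[:k])
    # Parametric knowledge: the LLM answers, conditioned on the retrieved evidence.
    return generate(f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}")
```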

2) Smarter Retrieval (2022-2023) — Better Context Matters
Two key innovations made retrieval actually worthwhile:
HyDE (2022): Instead of searching with the raw query, first generate a hypothetical answer, then search for documents similar to that. It feels backwards to generate an answer before you search, but hypothetical answers use the same kind of language as the actual documents you're looking for, which makes them better search queries than raw questions. (arXiv)
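Here's a rough sketch of the idea, reusing the `embed()`, `generate()`, and `cosine()` placeholders from the basic sketch above. The only change from basic RAG is what gets embedded for the search.

```python
def hyde_retrieve(query: str, docs: list[str], k: int = 3) -> list[str]:
    # Step 1: draft a hypothetical answer. Its details may be wrong, but it
    # "sounds like" the documents we actually want to find.
    hypothetical = generate(f"Write a short passage that answers: {query}")
    h_vec = embed(hypothetical)
    # Step 2: retrieve documents closest to the hypothetical answer, not the raw query.
    return sorted(docs, key=lambda d: cosine(h_vec, embed(d)), reverse=True)[:k]
```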

RAG-Fusion (2023): Generate multiple query variants, retrieve for each, then intelligently merge results. More perspectives = better coverage. (Blog)
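The usual way to do the merging is reciprocal rank fusion (RRF), which rewards documents that rank well across several variants. A minimal sketch, again on top of the placeholders above:

```python
def rag_fusion_retrieve(query: str, docs: list[str], n_variants: int = 4, k: int = 3) -> list[str]:
    # One original query plus a few LLM-generated rewrites of it.
    variants = [query] + [
        generate(f"Rewrite this search query from a different angle: {query}")
        for _ in range(n_variants - 1)
    ]
    scores: dict[str, float] = {}
    for variant in variants:
        v_vec = embed(variant)
        ranked = sorted(docs, key=lambda d: cosine(v_vec, embed(d)), reverse=True)
        for rank, doc in enumerate(ranked):
            # Reciprocal rank fusion: 1 / (60 + rank), with 60 as the usual smoothing constant.
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (60 + rank)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```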

3) Agentic RAG (2022+) — Models That Think Before Acting
ReAct changed the game by teaching models to reason about whether they need external information. This shift from automatic retrieval to deliberate tool use laid the groundwork for truly adaptive systems. (arXiv)
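A toy agent loop in the ReAct spirit looks like this. The prompt format and `search_tool()` are illustrative rather than the paper's exact setup; `generate()` is the same LLM placeholder as before.

```python
def search_tool(query: str) -> str:
    ...  # placeholder: query a search index or API and return a text snippet

def react_answer(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = generate(
            transcript
            + "Think step by step. Reply with 'SEARCH: <query>' if you need "
              "outside information, or 'ANSWER: <final answer>' if you don't."
        )
        if step.startswith("ANSWER:"):
            return step.removeprefix("ANSWER:").strip()
        if step.startswith("SEARCH:"):
            # The model chose to act: run the tool and feed the observation back in.
            observation = search_tool(step.removeprefix("SEARCH:").strip())
            transcript += f"{step}\nObservation: {observation}\n"
    return generate(transcript + "Give your best final answer now.")
```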

4) Self-RAG (2023) — Retrieve and Reflect
Self-RAG trains models to make two critical decisions: when to retrieve and whether the retrieved content is actually helpful. The model adds reflection tokens that critique both passages and its own drafts. (arXiv)
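The real trick is that these critiques are trained into the model as special tokens. You can approximate the control flow at inference time with prompted judgments, which is all the sketch below does (it reuses `generate()` and `hyde_retrieve()` from earlier).

```python
def self_rag_answer(query: str, docs: list[str]) -> str:
    # Decision 1: is retrieval needed at all?
    need = generate(f"Does answering '{query}' require looking up documents? yes/no")
    if "yes" not in need.lower():
        return generate(f"Answer from your own knowledge: {query}")

    candidates = []
    for doc in hyde_retrieve(query, docs, k=3):  # any retriever works here
        # Decision 2: is this passage actually helpful?
        relevant = generate(f"Is this passage relevant to '{query}'? yes/no\n\n{doc}")
        if "yes" not in relevant.lower():
            continue
        draft = generate(f"Answer '{query}' using only this passage:\n\n{doc}")
        # Critique the draft: is it grounded in the passage it cites?
        supported = generate(
            f"Is the answer fully supported by the passage? yes/no\n\nPassage:\n{doc}\n\nAnswer:\n{draft}"
        )
        candidates.append(("yes" in supported.lower(), draft))
    candidates.sort(key=lambda c: c[0], reverse=True)  # prefer grounded drafts
    return candidates[0][1] if candidates else generate(f"Answer: {query}")
```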

5) CRAG (2024) — Quality Control for Retrieval
Corrective RAG acknowledges an uncomfortable truth: much of what gets retrieved is garbage. So what is the solution? Grade retrieval quality explicitly. Bad results trigger web search or query expansion. Simple but effective. (arXiv)
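A sketch of that corrective loop, with `web_search()` as a placeholder fallback. The paper uses a small trained evaluator for the grading; an LLM judgment stands in for it here.

```python
def web_search(query: str) -> list[str]:
    ...  # placeholder: fetch fresh passages from a web search API

def crag_answer(query: str, docs: list[str]) -> str:
    retrieved = rag_fusion_retrieve(query, docs, k=3)
    verdicts = [
        generate(f"Grade this passage for answering '{query}': correct / ambiguous / incorrect\n\n{d}")
        for d in retrieved
    ]
    if all("incorrect" in v.lower() for v in verdicts):
        # Retrieval failed outright: rewrite the query and fall back to the web.
        retrieved = web_search(generate(f"Rewrite as a web search query: {query}"))
    context = "\n\n".join(retrieved)
    return generate(f"Context:\n{context}\n\nQuestion: {query}")
```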

6) GraphRAG (2024) — Structure Over Soup
Microsoft's GraphRAG handles corpus-wide questions by building entity graphs and community summaries during indexing. When you need global insights across thousands of documents, vector search alone won't cut it. (arXiv)
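In skeleton form the pipeline has an indexing half and a query half. `extract_entities_and_relations()` and `detect_communities()` are placeholders for an LLM extraction pass and a graph clustering step; the map-reduce answer at the end is what makes corpus-wide questions tractable.

```python
def extract_entities_and_relations(doc: str) -> list[tuple[str, str, str]]:
    ...  # placeholder: LLM pass returning (entity, relation, entity) triples

def detect_communities(edges: list[tuple[str, str, str]]) -> list[set[str]]:
    ...  # placeholder: cluster the entity graph into communities (e.g. Leiden)

def build_community_summaries(docs: list[str]) -> list[str]:
    # Indexing time: build the graph once, summarise each community once.
    edges = [t for doc in docs for t in extract_entities_and_relations(doc)]
    summaries = []
    for community in detect_communities(edges):
        facts = [f"{s} {r} {o}" for s, r, o in edges if s in community or o in community]
        summaries.append(generate("Summarise these related facts:\n" + "\n".join(facts)))
    return summaries

def global_answer(question: str, summaries: list[str]) -> str:
    # Query time, map-reduce style: answer against each community summary, then merge.
    partials = [generate(f"Using this summary, answer: {question}\n\n{s}") for s in summaries]
    return generate("Combine these partial answers into one:\n" + "\n\n".join(partials))
```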

7) Adaptive Gating (2024) — Skip When You Can
RAGate learns when retrieval is pointless. Many questions don't need external knowledge—why waste the compute? A learned gate predicts necessity and routes accordingly. (arXiv)
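The shape of the idea is tiny. `gate_score()` is a placeholder for whatever classifier you train (RAGate learns it from conversational data), and the threshold is something you'd tune on held-out examples.

```python
RETRIEVAL_THRESHOLD = 0.5  # illustrative cut-off, tune on held-out data

def gate_score(query: str) -> float:
    ...  # placeholder: learned probability that external knowledge is needed

def gated_answer(query: str, docs: list[str]) -> str:
    if gate_score(query) < RETRIEVAL_THRESHOLD:
        # Chit-chat, arithmetic, things the model already knows: skip retrieval entirely.
        return generate(query)
    return crag_answer(query, docs)
```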

8) SR-RAG (2025) — Internal vs External
SR-RAG trains models to choose between retrieving external docs or verbalizing their internal knowledge. Sometimes the model already knows the answer—it just needs to articulate it properly. (arXiv)
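SR-RAG trains that source selection into the model itself; the sketch below fakes it with a prompted choice, purely to show the two paths.

```python
def sr_rag_answer(query: str, docs: list[str]) -> str:
    choice = generate(
        f"For the question '{query}', reply INTERNAL if you already know the answer "
        "reliably, or EXTERNAL if it should be looked up."
    )
    if "INTERNAL" in choice.upper():
        # Verbalise internal knowledge first, then answer from those notes.
        notes = generate(f"Write down everything you know that is relevant to: {query}")
        return generate(f"Notes:\n{notes}\n\nQuestion: {query}")
    return crag_answer(query, docs)
```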

9) SKILL-RAG (2025) — Sentence-Level Filtering
SKILL-RAG uses reinforcement learning to score retrieved text sentence-by-sentence. Most retrieved content contains valuable nuggets buried in noise. This approach surgically extracts what matters. (arXiv)
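Roughly, the filtering step looks like this. The paper learns the sentence scorer with reinforcement learning; `sentence_score()` is a placeholder for that learned model.

```python
import re

def sentence_score(query: str, sentence: str) -> float:
    ...  # placeholder: learned usefulness score for this sentence given the query

def filter_context(query: str, passages: list[str], threshold: float = 0.5) -> str:
    kept = []
    for passage in passages:
        for sentence in re.split(r"(?<=[.!?])\s+", passage):
            # Keep only the sentences the scorer judges useful; drop the noise around them.
            if sentence and sentence_score(query, sentence) >= threshold:
                kept.append(sentence)
    return " ".join(kept)
```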

The Current State
Modern RAG systems combine these approaches into sophisticated decision graphs. They rewrite queries, adjust retrieval counts, switch between sources, cache results, and self-correct—all orchestrated through frameworks like LangGraph.
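Stripped of any framework, such a decision graph is just nodes (functions over shared state) and conditional edges. Here's a plain-Python sketch wiring together pieces from the earlier snippets; it illustrates the control flow, not LangGraph's API.

```python
def answer_with_graph(query: str, docs: list[str], max_loops: int = 2) -> str:
    state = {"query": query, "context": [], "answer": ""}
    for _ in range(max_loops):
        if gate_score(state["query"]) < RETRIEVAL_THRESHOLD:
            return generate(state["query"])                            # node: answer directly
        state["context"] = rag_fusion_retrieve(state["query"], docs)   # node: retrieve
        state["answer"] = generate(                                    # node: generate
            "Context:\n" + "\n\n".join(state["context"]) + "\n\nQuestion: " + state["query"]
        )
        grounded = generate(                                           # node: self-check
            f"Is this answer supported by the context? yes/no\n\n{state['answer']}"
        )
        if "yes" in grounded.lower():
            return state["answer"]
        # Conditional edge back to the start: rewrite the query and try again.
        state["query"] = generate(f"Rewrite this query to retrieve better evidence: {query}")
    return state["answer"]
```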

Key Takeaway
RAG used to be very simple - grab some documents, feed them to your model, voila! But now it's gotten way more interesting. Modern systems actually think about whether they need to retrieve information at all, figure out what's worth keeping, and adjust on the fly.
What's got me excited is where this is all heading. We're building systems that learn from how people actually use them. They notice what works and what doesn't, and they get better at finding the correct information over time.
Basically, we're shifting from fixed, predictable pipelines to systems that, a bit like us, genuinely learn and adapt.
References
RAG: Lewis et al., 2020. arXiv:2005.11401
HyDE: Gao et al., 2022. arXiv:2212.10496
RAG-Fusion: Safjan, 2023. Blog
ReAct: Yao et al., 2022. arXiv:2210.03629
Self-RAG: Asai et al., 2023. arXiv:2310.11511
CRAG: Yan et al., 2024. arXiv:2401.15884
GraphRAG: Edge et al., 2024. arXiv:2404.16130
RAGate: 2024. arXiv:2407.21712
SR-RAG: Wu et al., 2025. arXiv:2504.01018
SKILL-RAG: Isoda, 2025. arXiv:2509.20377






