Top 20 Tricky MCQs on Retrieval-Augmented Generation (RAG) with Answers

Retrieval-Augmented Generation (RAG) is a powerful technique that enhances large language models by combining information retrieval with text generation. It helps reduce hallucinations by grounding responses in external knowledge sources.

In this post, you will find 20 carefully designed MCQs on RAG with answers and explanations. These questions are useful for GATE, placements, NLP interviews, and data science exams.

1.
In a RAG system, what is the primary purpose of the retriever?






Correct Answer: C

Explanation:

The retriever fetches relevant external documents to provide context for generation.

2.
Which component converts text into vectors in RAG?






Correct Answer: C

Explanation:

Embedding models transform text into vector representations for similarity search.

An embedding model is a machine learning model that converts text (words, sentences, or documents) into numerical vectors in a high-dimensional space to enable semantic similarity search.

In the context of Retrieval-Augmented Generation (RAG):

  • The query is converted into a vector
  • Documents are also stored as vectors
  • Then similarity (e.g., cosine similarity) is computed to find relevant documents
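
The query/document matching above can be sketched with toy vectors. This is a minimal sketch: a real system would call an embedding model, while here the 3-dimensional vectors and document names are invented stand-ins.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: a·b / (|a| |b|)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings" standing in for a real embedding model's output.
doc_vectors = {
    "doc_refunds":  np.array([0.9, 0.1, 0.0]),
    "doc_shipping": np.array([0.1, 0.9, 0.2]),
    "doc_privacy":  np.array([0.0, 0.2, 0.9]),
}

query_vector = np.array([0.8, 0.2, 0.1])  # embedding of the user query

# Rank documents by similarity to the query, most similar first.
ranked = sorted(doc_vectors.items(),
                key=lambda kv: cosine_similarity(query_vector, kv[1]),
                reverse=True)
print(ranked[0][0])  # -> doc_refunds
```

Because both queries and documents live in the same vector space, "relevance" reduces to a geometric comparison.
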
3.
Which similarity metric is most commonly used in vector search?






Correct Answer: C

Explanation:

Cosine similarity measures angular similarity and is widely used for embeddings.

4.
What happens if chunk size is too large in RAG?






Correct Answer: B

Explanation:

Large chunks reduce retrieval granularity, making precise matching harder.

More info on chunk size and granularity:
In Retrieval-Augmented Generation, documents are split into chunks before being embedded and stored. Large chunk sizes reduce retrieval granularity, making it harder to extract precise and relevant information.

What does “granularity” mean?
  • High granularity → fine, precise pieces of information
  • Low granularity → large, coarse blocks of text

What if each chunk size is too large?
If each chunk is very big, then:

  • Each chunk contains too much mixed information
  • Embeddings become less specific
  • Retrieval returns broad but less relevant context
  • As a result, the system cannot pinpoint the exact relevant passage.

That is exactly: Reduced granularity of retrieved information.
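
The effect of chunk size can be seen with a minimal character-based chunker (a sketch; production systems typically split on tokens or sentences, and the sample `document` text is invented):

```python
def split_into_chunks(text, chunk_size):
    """Split text into fixed-size character chunks (no overlap)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

document = ("RAG splits documents into chunks. Each chunk is embedded "
            "separately. Small chunks give fine-grained retrieval.")

coarse = split_into_chunks(document, 120)  # one big chunk: low granularity
fine = split_into_chunks(document, 40)     # several small chunks: high granularity

print(len(coarse), len(fine))
```

With `chunk_size=120` the whole document becomes a single embedding that mixes three topics; with `chunk_size=40` each embedding captures roughly one statement, so similarity search can pinpoint the relevant passage.
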

5.
Which problem does RAG primarily aim to mitigate?






Correct Answer: B

Explanation:

RAG reduces hallucinations by grounding outputs in retrieved external knowledge.

Why do LLMs hallucinate?

Standard language models generate answers based on internal (parametric) knowledge. If they don’t know the answer, or have outdated/incomplete knowledge, they may confidently generate incorrect information (hallucination).

How does Retrieval-Augmented Generation reduce hallucinations?

RAG changes the process from "generate from memory" to "retrieve, then generate based on evidence."

Steps in RAG to avoid hallucination


1. Query comes in.
2. Retrieval: RAG searches documents/databases, fetches relevant, real information. This is called grounding.
3. Context injection: The retrieved content is added to the prompt.
4. Controlled generation: The model now relies on actual retrieved facts, not just its internal memory.

Note: RAG does NOT eliminate hallucination completely. It only reduces hallucination and improves factual accuracy.
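
The retrieve → inject → generate flow can be sketched as prompt construction. This is only the context-injection step (the final LLM call is omitted), and the query and retrieved snippets are invented for illustration:

```python
def build_rag_prompt(query, retrieved_chunks):
    """Inject retrieved evidence into the prompt so generation is grounded."""
    context = "\n".join(f"- {chunk}" for chunk in retrieved_chunks)
    return (
        "Answer the question using ONLY the context below. "
        "If the context is insufficient, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )

prompt = build_rag_prompt(
    "When was the warranty policy last updated?",
    ["The warranty policy was last updated in March 2024.",
     "Warranty claims must be filed within 30 days."],
)
print(prompt)
```

The instruction to answer only from the context, plus the permission to say "I don't know," is what steers the model away from filling gaps with invented facts.
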
6.
In RAG, what is “top-k retrieval”?






Correct Answer: B

Explanation:

Top-k retrieval returns the most relevant documents based on similarity.
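
Selecting the top-k documents from an array of similarity scores is a small NumPy idiom (a sketch; the similarity values below are made up):

```python
import numpy as np

def top_k_indices(scores, k):
    """Return indices of the k highest similarity scores, best first."""
    k = min(k, len(scores))
    idx = np.argpartition(scores, -k)[-k:]     # the top k, in arbitrary order
    return idx[np.argsort(scores[idx])[::-1]]  # sort those k descending

similarities = np.array([0.12, 0.87, 0.45, 0.91, 0.33])
print(top_k_indices(similarities, 3))  # -> document indices 3, 1, 2
```

`argpartition` avoids fully sorting every score, which matters when the corpus holds millions of vectors and only a handful are needed.
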

7.
Which database is typically used in RAG systems?






Correct Answer: C

Explanation:

Vector databases efficiently store and search embeddings.
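
A vector database's core interface (add vectors, search by similarity) can be mimicked with a minimal in-memory class. This is a toy stand-in, not how real systems like FAISS or Chroma are implemented internally, and the document ids and vectors are invented:

```python
import numpy as np

class TinyVectorStore:
    """Minimal in-memory stand-in for a vector database."""

    def __init__(self):
        self.ids, self.vectors = [], []

    def add(self, doc_id, vector):
        """Store a document id together with its embedding."""
        self.ids.append(doc_id)
        self.vectors.append(np.asarray(vector, dtype=float))

    def search(self, query_vector, k=1):
        """Return ids of the k most similar vectors by cosine similarity."""
        m = np.stack(self.vectors)
        q = np.asarray(query_vector, dtype=float)
        sims = (m @ q) / (np.linalg.norm(m, axis=1) * np.linalg.norm(q))
        order = np.argsort(sims)[::-1][:k]
        return [self.ids[i] for i in order]

store = TinyVectorStore()
store.add("faq_returns", [0.9, 0.1])
store.add("faq_billing", [0.1, 0.9])
print(store.search([0.8, 0.3], k=1))  # -> ['faq_returns']
```

Real vector databases add what this toy lacks: approximate-nearest-neighbor indexes for speed at scale, persistence, and metadata filtering.
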

8.
What is the role of the generator in RAG?






Correct Answer: C

Explanation:

The generator (LLM) produces the final response using retrieved context.

9.
Which factor most affects retrieval accuracy?






Correct Answer: B

Explanation:

High-quality embeddings improve semantic matching and retrieval performance.

10.
What is “chunk overlap”?






Correct Answer: B

Explanation:

Chunk overlap ensures that important context is not lost between adjacent chunks.
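
Chunk overlap is a sliding window: each new chunk starts before the previous one ends, so a sentence cut at a boundary still appears whole in one of the chunks. A minimal sketch over a toy string:

```python
def chunk_with_overlap(text, chunk_size, overlap):
    """Sliding-window chunking: consecutive chunks share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

print(chunk_with_overlap("abcdefghij", 4, 2))
# -> ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

Note how characters `c`–`d`, `e`–`f`, and `g`–`h` each appear in two adjacent chunks: that repetition is the safety margin that keeps boundary-spanning context retrievable.
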

11.
Which architecture is commonly used in RAG generators?






Correct Answer: C

Explanation:

Modern language models used in RAG are based on the Transformer architecture.

12.
Why is re-ranking used in RAG?






Correct Answer: B

Explanation:

Re-ranking refines the initially retrieved documents to improve relevance.
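
Re-ranking is a second, more careful scoring pass over the retriever's candidate list. As a sketch, the code below uses simple keyword overlap as a stand-in for the cross-encoder models typically used in practice; the candidate sentences are invented:

```python
def keyword_overlap(query, doc):
    """Stand-in for a cross-encoder: fraction of query words found in the doc."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words)

def rerank(query, candidates, top_n=2):
    """Second-stage ranking over the first-stage retriever's candidates."""
    return sorted(candidates,
                  key=lambda d: keyword_overlap(query, d),
                  reverse=True)[:top_n]

candidates = [  # pretend these came back from first-stage vector search
    "Shipping takes five business days.",
    "Refunds are issued within ten business days.",
    "Our refund policy covers all items.",
]
print(rerank("refund policy details", candidates, top_n=1))
```

The pattern is cheap-but-broad first (vector search over millions of chunks), expensive-but-precise second (re-scoring only the few dozen survivors).
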

13.
What happens if irrelevant documents are retrieved in RAG?






Correct Answer: B

Explanation:

Irrelevant context can mislead the generator and increase hallucinated outputs.

14.
Which technique helps reduce latency in RAG?






Correct Answer: B

Explanation:

Caching avoids recomputation and speeds up repeated queries.
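
Embedding calls are a natural caching target, since identical queries recur often. A minimal sketch using Python's `functools.lru_cache`; the `embed` function is a made-up placeholder for a real model or API call:

```python
from functools import lru_cache

CALLS = {"count": 0}  # track how often the "expensive" call actually runs

@lru_cache(maxsize=1024)
def embed(text):
    """Pretend embedding call; in production this would hit a model or API."""
    CALLS["count"] += 1
    return tuple(float(ord(c)) for c in text[:4])  # toy deterministic vector

embed("what is RAG?")
embed("what is RAG?")  # identical query: served from cache, no recomputation
print(CALLS["count"])  # -> 1
```

Production systems apply the same idea at more layers: caching retrieved document lists per query, and sometimes whole generated answers for frequent questions.
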

15.
What is “dense retrieval”?






Correct Answer: B

Explanation:

Dense retrieval uses continuous vector embeddings for semantic search.

16.
Which is NOT a component of a standard RAG system?






Correct Answer: D

Explanation:

Discriminators are used in GANs, not in RAG architectures.

17.
Why is normalization applied to embeddings?






Correct Answer: B

Explanation:

Normalization ensures consistent and meaningful similarity calculations.
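
Concretely, L2-normalizing embeddings to unit length makes a plain dot product equal cosine similarity, so vector magnitude no longer distorts the comparison. A small sketch with made-up vectors:

```python
import numpy as np

def l2_normalize(v):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    return v / np.linalg.norm(v)

a = l2_normalize(np.array([3.0, 4.0]))  # norm 5 -> [0.6, 0.8]
b = l2_normalize(np.array([6.0, 8.0]))  # same direction, twice the magnitude

print(np.dot(a, b))  # ≈ 1.0: identical direction scores as identical
```

Without normalization, `[6, 8]` would produce a dot product four times larger than `[3, 4]` against the same query despite pointing the same way. Many vector databases assume normalized vectors precisely so they can use the cheaper dot product internally.
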

18.
Which trade-off is critical in RAG systems?






Correct Answer: B

Explanation:

Increasing retrieval depth improves accuracy but also increases latency.

19.
What is hybrid retrieval?






Correct Answer: B

Explanation:

Hybrid retrieval combines semantic (dense) and keyword (sparse) search.
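
One common way to combine the two signals is a weighted blend of scores (another is reciprocal rank fusion). This sketch invents the documents, the query, and the dense similarity values; the keyword scorer is a crude stand-in for BM25:

```python
import numpy as np

def keyword_score(query, doc):
    """Sparse signal: fraction of query terms that appear in the document."""
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q)

def hybrid_score(dense_sim, kw, alpha=0.5):
    """Weighted blend of dense (semantic) and sparse (keyword) scores."""
    return alpha * dense_sim + (1 - alpha) * kw

docs = ["error code E42 on startup", "the app fails to launch"]
dense_sims = [0.55, 0.80]  # pretend cosine similarities from a dense retriever
query = "error code E42"

scores = [hybrid_score(s, keyword_score(query, d))
          for s, d in zip(dense_sims, docs)]
best = docs[int(np.argmax(scores))]
print(best)  # -> error code E42 on startup
```

Note the dense retriever alone would have preferred the second document, but the exact-match term "E42" only exists as a keyword signal; this is exactly the case (rare identifiers, product codes, names) where hybrid retrieval beats pure semantic search.
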

20.
Which step must occur before similarity search in a RAG pipeline?