Showing posts with label NLP Quiz Questions. Show all posts

Sunday, April 19, 2026

Top 20 Tricky MCQs on RAG (Retrieval-Augmented Generation) with Answers | NLP & LLM Practice


Top 20 Tricky MCQs on Retrieval-Augmented Generation (RAG) with Answers

Retrieval-Augmented Generation (RAG) is a powerful technique that enhances large language models by combining information retrieval with text generation. It helps reduce hallucinations by grounding responses in external knowledge sources.

In this post, you will find 20 carefully designed MCQs on RAG with answers and explanations. These questions are useful for GATE, placements, NLP interviews, and data science exams.

1.
In a RAG system, what is the primary purpose of the retriever?






Correct Answer: C

Explanation:

The retriever fetches relevant external documents to provide context for generation.

2.
Which component converts text into vectors in RAG?






Correct Answer: C

Explanation:

Embedding models transform text into vector representations for similarity search.

An embedding model is a machine learning model that converts text (words, sentences, or documents) into numerical vectors in a high-dimensional space to enable semantic similarity search.

In the context of Retrieval-Augmented Generation (RAG):

  • The query is converted into a vector
  • Documents are also stored as vectors
  • Then similarity (e.g., cosine similarity) is computed to find relevant documents
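The three bullet points above can be sketched in a few lines of NumPy. The 4-dimensional vectors here are made up for illustration; real embedding models output hundreds of dimensions:

```python
import numpy as np

def cosine_similarity(a, b):
    # angular similarity between two embedding vectors, in [-1, 1]
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# toy 4-dimensional embeddings (illustrative values only)
query = np.array([0.1, 0.9, 0.2, 0.4])
docs = {
    "doc_a": np.array([0.2, 0.8, 0.1, 0.5]),   # close to the query direction
    "doc_b": np.array([0.9, 0.1, 0.8, 0.0]),   # far from the query direction
}
scores = {name: cosine_similarity(query, vec) for name, vec in docs.items()}
best = max(scores, key=scores.get)             # "doc_a" is retrieved
```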
3.
Which similarity metric is most commonly used in vector search?






Correct Answer: C

Explanation:

Cosine similarity measures angular similarity and is widely used for embeddings.

4.
What happens if chunk size is too large in RAG?






Correct Answer: B

Explanation:

Large chunks reduce retrieval granularity, making precise matching harder.

More info on chunk size and granularity:
In Retrieval-Augmented Generation, documents are split into chunks before being embedded and stored. Large chunk sizes reduce retrieval granularity, making it harder to extract precise and relevant information.

What does “granularity” mean?
  • High granularity → fine, precise pieces of information
  • Low granularity → large, coarse blocks of text

What if the chunk size is too large?
If each chunk is very big, then:

  • Each chunk contains too much mixed information
  • Embeddings become less specific
  • Retrieval returns broad but less relevant context
  • As a result, the system cannot pinpoint the exact relevant passage

That is exactly: Reduced granularity of retrieved information.

5.
Which problem does RAG primarily aim to mitigate?






Correct Answer: B

Explanation:

RAG reduces hallucinations by grounding outputs in retrieved external knowledge.

Why do LLMs hallucinate?

Standard language models generate answers based on internal (parametric) knowledge. If they don’t know the answer, or have outdated/incomplete knowledge, they may confidently generate incorrect information (hallucination).

How does Retrieval-Augmented Generation reduce hallucinations?

RAG changes the process from: “Generate from memory” to “Retrieve → then generate based on evidence”

Steps in RAG to avoid hallucination


1. Query comes in.
2. Retrieval: RAG searches documents/databases, fetches relevant, real information. This is called grounding.
3. Context injection: The retrieved content is added to the prompt.
4. Controlled generation: The model now relies on actual retrieved facts, not just its internal memory.

Note: RAG does NOT eliminate hallucination completely. It only reduces hallucination and improves factual accuracy.
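A minimal sketch of these four steps. The keyword-overlap `search` and the prompt wording are assumptions for illustration, not a fixed RAG recipe — a real system would retrieve via embeddings and a vector database:

```python
def search(query, corpus, k=2):
    # step 2 (toy version): rank documents by word overlap with the query
    q_words = set(query.lower().split())
    ranked = sorted(corpus, key=lambda d: -len(q_words & set(d.lower().split())))
    return ranked[:k]

def build_rag_prompt(query, corpus):
    # step 3: inject the retrieved context into the prompt;
    # step 4 would pass this prompt to the LLM for controlled generation
    context = "\n".join(search(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context."

corpus = [
    "RAG combines retrieval with generation.",
    "Paris is the capital of France.",
]
prompt = build_rag_prompt("What does RAG combine?", corpus)  # step 1: query comes in
```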
6.
In RAG, what is “top-k retrieval”?






Correct Answer: B

Explanation:

Top-k retrieval returns the most relevant documents based on similarity.

7.
Which database is typically used in RAG systems?






Correct Answer: C

Explanation:

Vector databases efficiently store and search embeddings.

Why is a Vector DB used in RAG?

In Retrieval-Augmented Generation, the goal is to find the most relevant information based on meaning, not just exact words. That’s where a vector database becomes essential. A vector database is used in RAG to store embeddings and perform fast semantic similarity search for retrieving relevant documents.

RAG works with embeddings (vectors), not raw text. So instead of “Find documents containing the exact keyword”, it does “Find documents that are semantically similar to the query”.

What does a Vector DB actually do?

A vector database:
  • Stores embeddings (numerical vectors of text)
  • Performs similarity search (e.g., cosine similarity)
  • Quickly retrieves top-k relevant documents
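Those three functions fit in a tiny in-memory sketch. Production vector databases (e.g. FAISS, Milvus) add approximate-nearest-neighbor indexing so search stays fast at millions of vectors; this toy class just scans everything:

```python
import numpy as np

class ToyVectorStore:
    # minimal in-memory sketch of a vector DB: store embeddings and
    # run top-k cosine-similarity search over all of them
    def __init__(self):
        self.ids, self.vecs = [], []

    def add(self, doc_id, embedding):
        v = np.asarray(embedding, dtype=float)
        self.vecs.append(v / np.linalg.norm(v))   # normalize once at insert
        self.ids.append(doc_id)

    def search(self, query, k=2):
        q = np.asarray(query, dtype=float)
        q = q / np.linalg.norm(q)
        scores = np.stack(self.vecs) @ q          # cosine similarity per doc
        order = np.argsort(scores)[::-1][:k]      # indices of top-k scores
        return [(self.ids[i], float(scores[i])) for i in order]

store = ToyVectorStore()
store.add("rag_intro", [0.1, 0.9, 0.2])
store.add("unrelated", [0.9, 0.1, 0.0])
results = store.search([0.2, 0.8, 0.1], k=2)      # "rag_intro" ranks first
```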
8.
What is the role of the generator in RAG?






Correct Answer: C

Explanation:

The generator (LLM) produces the final response using retrieved context.

9.
Which factor most affects retrieval accuracy?






Correct Answer: B

Explanation:

High-quality embeddings improve semantic matching and retrieval performance.

10.
What is “chunk overlap”?






Correct Answer: B

Explanation:

Chunk overlap ensures that important context is not lost between adjacent chunks.
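A character-level chunker with overlap might look like this (chunking by characters and the specific sizes are simplifications; production pipelines usually chunk by tokens or sentences):

```python
def chunk_text(text, chunk_size=20, overlap=5):
    # each chunk repeats the last `overlap` characters of the previous one,
    # so content that straddles a boundary survives intact in one chunk
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("abcdefghijklmnopqrstuvwxyz", chunk_size=20, overlap=5)
# adjacent chunks share 5 characters: chunks[0] ends with the same
# substring that chunks[1] starts with
```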

11.
Which architecture is commonly used in RAG generators?






Correct Answer: C

Explanation:

Modern language models used in RAG are based on the Transformer architecture.

12.
Why is re-ranking used in RAG?






Correct Answer: B

Explanation:

Re-ranking refines the initially retrieved documents to improve relevance.

13.
What happens if irrelevant documents are retrieved in RAG?






Correct Answer: B

Explanation:

Irrelevant context can mislead the generator and increase hallucinated outputs.

14.
Which technique helps reduce latency in RAG?






Correct Answer: B

Explanation:

Caching avoids recomputation and speeds up repeated queries.

15.
What is “dense retrieval”?






Correct Answer: B

Explanation:

Dense retrieval uses continuous vector embeddings for semantic search.

16.
Which is NOT a component of a standard RAG system?






Correct Answer: D

Explanation:

Discriminators are used in GANs, not in RAG architectures.

17.
Why is normalization applied to embeddings?






Correct Answer: B

Explanation:

Normalization ensures consistent and meaningful similarity calculations.
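For example, after L2 normalization the dot product of two vectors equals their cosine similarity, so vectors pointing in the same direction score exactly 1.0 regardless of magnitude:

```python
import numpy as np

def normalize(v):
    # scale to unit (L2) length, so dot product == cosine similarity
    return v / np.linalg.norm(v)

a = np.array([3.0, 4.0])
b = np.array([6.0, 8.0])   # same direction as a, twice the magnitude
sim = float(normalize(a) @ normalize(b))   # 1.0 despite different norms
```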

18.
Which trade-off is critical in RAG systems?






Correct Answer: B

Explanation:

Increasing retrieval depth improves accuracy but also increases latency.

19.
What is hybrid retrieval?






Correct Answer: B

Explanation:

Hybrid retrieval combines semantic (dense) and keyword (sparse) search.
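One common way to combine the two is a weighted blend of per-document scores. The `alpha` weighting below is one simple convention (reciprocal-rank fusion is another); the scores are made up for illustration:

```python
def hybrid_scores(dense_scores, sparse_scores, alpha=0.5):
    # dense_scores / sparse_scores: {doc_id: score} from each retriever;
    # alpha weights semantic (dense) vs keyword (sparse) evidence
    docs = set(dense_scores) | set(sparse_scores)
    return {
        d: alpha * dense_scores.get(d, 0.0) + (1 - alpha) * sparse_scores.get(d, 0.0)
        for d in docs
    }

dense = {"doc1": 0.9, "doc2": 0.2}     # from embedding similarity
sparse = {"doc2": 0.8, "doc3": 0.5}    # from keyword matching (e.g. BM25)
combined = hybrid_scores(dense, sparse, alpha=0.5)
```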

20.
Which step must occur before similarity search in a RAG pipeline?






Saturday, March 21, 2026

Top 20 Named Entity Recognition (NER) MCQs with Answers & Explanations | NLP Practice Questions


Top 20 Named Entity Recognition (NER) MCQs with Answers

What is Named Entity Recognition (NER) in NLP?

Named Entity Recognition (NER) is a core task in Natural Language Processing (NLP) that identifies and classifies important entities in text into predefined categories such as Person, Location, Organization, Date, Time, and Money. It helps machines understand real-world objects mentioned in unstructured text data.

For example, in the sentence "Elon Musk founded SpaceX in 2002", NER identifies:

  • Elon Musk → Person
  • SpaceX → Organization
  • 2002 → Date

Why is Named Entity Recognition Important?

  • Improves information extraction from text
  • Used in chatbots and virtual assistants
  • Enhances search engines and recommendation systems
  • Supports resume parsing and document analysis
  • Plays a key role in AI and data science applications

NER MCQs for Practice

Below are 20 carefully selected Named Entity Recognition MCQs with answers and explanations to help you prepare for exams, interviews, and competitive tests in NLP, Machine Learning, and Data Science.

1.
What is the primary goal of Named Entity Recognition (NER)?






Correct Answer: B

Explanation:

NER extracts entities such as names, locations, organizations, and dates from text and classifies them into predefined categories.

2.
Which of the following is NOT a typical NER entity category?






Correct Answer: C

Explanation:

NER identifies real-world entities, not grammatical categories like verbs.

3.
In the sentence “Apple released the new iPhone in California”, what is “Apple”?






Correct Answer: B

Explanation:

“Apple” refers to a company (organization) rather than a fruit in this context.

4.
Which tagging format is commonly used in NER?






Correct Answer: B

Explanation:

BIO tagging marks the beginning, inside, and outside of named entities.

5.
What does the “B” in BIO tagging represent?






Correct Answer: B

Explanation:

“B” indicates the beginning of a named entity.

6.
Which of the following is an example of BIO tagging?






Correct Answer: A

Explanation:

BIO tagging labels tokens based on their position within entities.
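Using the "Elon Musk founded SpaceX in 2002" sentence from the introduction, hand-labeled BIO tags and a small decoder that groups tags back into entities might look like this:

```python
# hand-labeled BIO tags for the example sentence (illustrative only)
tokens = ["Elon", "Musk", "founded", "SpaceX", "in", "2002"]
tags   = ["B-PER", "I-PER", "O", "B-ORG", "O", "B-DATE"]

def extract_entities(tokens, tags):
    # group B-/I- runs into (entity_text, type) pairs; "O" tokens end a run
    entities, current, etype = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [tok], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(tok)
        else:
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [], None
    if current:
        entities.append((" ".join(current), etype))
    return entities

entities = extract_entities(tokens, tags)
# → [("Elon Musk", "PER"), ("SpaceX", "ORG"), ("2002", "DATE")]
```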

7.
Which algorithm is commonly used in traditional NER systems?






Correct Answer: B

Explanation:

CRFs are widely used for sequence labeling tasks like NER.

8.
Why are CRFs preferred over HMMs in NER?






Correct Answer: C

Explanation:

CRFs allow flexible feature engineering without strong independence assumptions.

9.
Which modern model achieves state-of-the-art performance in NER?






Correct Answer: C

Explanation:

Transformers like BERT capture contextual meaning effectively for NER.

10.
What is the role of tokenization in NER?






Correct Answer: B

Explanation:

Tokenization breaks text into smaller units like words or subwords, which are essential for NER processing.

11.
Which metric is commonly used to evaluate NER models?






Correct Answer: C

Explanation:

F1-score balances precision and recall, making it the most suitable metric for NER evaluation.
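Entity-level F1 can be computed directly from true positives, false positives, and false negatives (the counts below are made up for illustration):

```python
def f1_score(tp, fp, fn):
    # entity-level counts: tp = correctly predicted entities,
    # fp = spurious predictions, fn = missed gold entities
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

f1 = f1_score(8, 2, 4)   # precision 0.8, recall ≈ 0.667 → F1 ≈ 0.727
```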

12.
What challenge does NER face with ambiguous words like “Amazon”?






Correct Answer: B

Explanation:

Polysemy refers to words having multiple meanings depending on context, which is a major challenge in NER.

13.
Which dataset is widely used for NER benchmarking?






Correct Answer: C

Explanation:

CoNLL-2003 is a standard dataset used for evaluating NER systems.

14.
In BIO tagging, what does “O” represent?






Correct Answer: B

Explanation:

“O” indicates that the token does not belong to any named entity.

15.
Which is a limitation of rule-based NER systems?






Correct Answer: B

Explanation:

Rule-based systems require manual updates and do not scale well to new domains.

16.
What is entity linking in NER?






Correct Answer: B

Explanation:

Entity linking maps recognized entities to structured databases like Wikipedia or knowledge graphs.

17.
Which feature is important in traditional NER?






Correct Answer: B

Explanation:

Features like capitalization and word patterns are strong indicators of named entities.

18.
What issue arises when multiple words form a single entity like “New York City”?






Correct Answer: B

Explanation:

NER systems must correctly identify and group multiple tokens into a single entity.

19.
Which technique improves NER performance in deep learning models?






Correct Answer: B

Explanation:

Pretrained models like BERT capture deep contextual representations, significantly improving NER performance.

20.
Which is a real-world application of NER?






Correct Answer: C

Explanation:

NER is widely used in resume parsing to extract names, skills, organizations, and other structured information.

Sunday, March 1, 2026

Advanced Word2Vec MCQs – Skip-gram, Negative Sampling & Softmax


Advanced Word2Vec MCQs with Answers (Skip-gram, SGNS & Softmax)

This page provides 20 advanced multiple-choice questions (MCQs) on Word2Vec covering Skip-gram, CBOW, Negative Sampling (SGNS), Full Softmax, subsampling, PMI matrix factorization, cosine similarity, and embedding theory. These questions are designed for postgraduate students, research scholars, competitive exams, and machine learning interviews.

Topics Covered in These Word2Vec MCQs

  • Skip-gram vs CBOW differences
  • Full Softmax computational complexity O(|V|)
  • Negative Sampling and the 3/4 distribution smoothing
  • Subsampling of frequent words
  • Shifted PMI matrix factorization interpretation of SGNS
  • Cosine similarity and embedding geometry
  • Static embedding limitations (polysemy problem)
  • Effect of window size and dimensionality

Who Should Practice These Questions?

These advanced Word2Vec MCQs are suitable for learners preparing for NLP exams, machine learning viva, university theory exams, research interviews, and technical placements. The explanations emphasize conceptual understanding rather than memorization.



1.
In Skip-gram with full softmax, what is the primary computational bottleneck when vocabulary size is extremely large (e.g., 1 million words)?






Correct Answer: C

Explanation:

The denominator of the softmax requires summing over all vocabulary words. If vocabulary size is 1 million, 1 million dot products must be computed for every update, making training computationally expensive.

Full softmax requires computing the denominator over the entire vocabulary: Σ_{w∈V} exp(v_w^T v_c). Time complexity is O(|V|) per training example.
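The O(|V|) cost is visible in code: the softmax denominator needs one dot product per vocabulary word. The vocabulary size, dimensionality, and random matrices below are toy stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 1000, 50                       # |V| and embedding dimension (toy sizes)
W_out = rng.standard_normal((V, d))   # output ("context") matrix W'
v_c = rng.standard_normal(d)          # center-word vector

def full_softmax(center_vec, output_matrix):
    # one dot product per vocabulary row -> O(|V|) work per example;
    # this denominator is exactly what negative sampling avoids
    logits = output_matrix @ center_vec
    logits = logits - logits.max()    # subtract max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()

probs = full_softmax(v_c, W_out)      # one probability per vocabulary word
```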
2.
In negative sampling, if a negative word vector is orthogonal to the center word vector, what happens to its gradient update?






Correct Answer: C

Explanation:

If vectors are orthogonal, their dot product is zero. Since sigmoid(0) = 0.5, the gradient is small but not zero. The model still updates the vectors to push negative samples away.

3.
Given the subsampling probability formula P(w) = 1 - √(t / f(w)), what happens when word frequency f(w) is much larger than t?






Correct Answer: B

Explanation:

When frequency is very high, t/f(w) becomes very small, making the discard probability approach 1. Thus very frequent words like "the" are removed most of the time.
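The formula is a one-liner in code; the relative frequency assumed for "the" below is a rough illustration, not a corpus measurement:

```python
import math

def discard_prob(freq, t=1e-5):
    # P(discard w) = 1 - sqrt(t / f(w)); clamped at 0 so rare words
    # (f(w) <= t) are never discarded
    return max(0.0, 1 - math.sqrt(t / freq))

p_the = discard_prob(0.05)    # assumed frequency for "the": discarded ~99% of the time
p_rare = discard_prob(1e-6)   # rare word: never discarded
```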

4.
Why does Word2Vec learn two embedding matrices (W and W') but typically use only W after training?






Correct Answer: C

Explanation:

The input matrix W represents center-word embeddings and captures semantic structure. The output matrix W' represents context embeddings and is usually discarded after training.

5.
If two words have nearly identical context distributions in a corpus, their Word2Vec embeddings will most likely:






Correct Answer: C

Explanation:

According to the distributional hypothesis, words appearing in similar contexts obtain similar embeddings, resulting in high cosine similarity.

6.
Which scenario particularly favors Skip-gram over CBOW?






Correct Answer: C

Explanation:

Skip-gram generates more training signals per word and performs better for rare words, while CBOW is generally faster and smoother for frequent words.

7.
Why is the negative sampling distribution raised to the power of 3/4?






Correct Answer: C

Explanation:

Raising frequencies to the power 3/4 reduces dominance of very frequent words and increases medium-frequency sampling, improving embedding quality.
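The effect of the 3/4 exponent is easy to verify on made-up counts: the most frequent word's sampling share drops while the rare word's share rises:

```python
import numpy as np

def neg_sampling_dist(counts, power=0.75):
    # P(w) ∝ count(w)^(3/4): flattens the raw unigram distribution
    p = np.array(counts, dtype=float) ** power
    return p / p.sum()

counts = [1000, 100, 10]               # frequent, medium, rare word
unigram = np.array(counts) / sum(counts)
smoothed = neg_sampling_dist(counts)
# smoothed[0] < unigram[0]  (frequent word down-weighted)
# smoothed[2] > unigram[2]  (rare word up-weighted)
```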

8.
Why does the analogy king - man + woman ≈ queen work in Word2Vec?






Correct Answer: C

Explanation:

Word2Vec embeddings capture linear semantic relationships, allowing vector arithmetic to represent analogies like gender direction in embedding space.
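A toy demonstration with hand-built 3-dimensional vectors, constructed so the gender offset is exactly consistent (real Word2Vec vectors only approximate this linear structure):

```python
import numpy as np

# hand-built embeddings: the third coordinate encodes "femaleness"
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.5, 0.2, 0.1]),
    "woman": np.array([0.5, 0.2, 0.8]),
    "queen": np.array([0.9, 0.8, 0.8]),
    "apple": np.array([0.1, 0.9, 0.1]),   # distractor word
}

target = emb["king"] - emb["man"] + emb["woman"]

def nearest(vec, emb, exclude):
    # highest-cosine word, excluding the analogy's input words
    # (the standard analogy-evaluation protocol)
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max((w for w in emb if w not in exclude), key=lambda w: cos(vec, emb[w]))

answer = nearest(target, emb, exclude={"king", "man", "woman"})  # → "queen"
```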

9.
If neither negative sampling nor hierarchical softmax is used, training Word2Vec with full softmax becomes:






Correct Answer: C

Explanation:

Full softmax requires computation across entire vocabulary for each update, making complexity proportional to vocabulary size and thus very slow.

10.
Which limitation is fundamentally unavoidable in static Word2Vec embeddings trained without contextualization?






Correct Answer: B

Explanation:

Classic Word2Vec learns a single static vector representation for each word type, regardless of context. Therefore, polysemous words like “bank” (river bank vs financial bank) receive only one embedding and cannot represent different meanings based on context.

11.
Skip-gram with Negative Sampling (SGNS) has been theoretically shown to approximate factorization of which matrix?






Correct Answer: C

Explanation:

Research shows that Skip-gram with Negative Sampling implicitly factorizes a shifted PMI matrix. This explains why semantic similarity emerges geometrically in Word2Vec embeddings.

12.
Increasing the context window size in Word2Vec primarily encourages the model to capture more:






Correct Answer: C

Explanation:

Small window sizes focus on syntactic relationships, while larger window sizes capture broader topical and semantic relationships across sentences.

13.
In trained Word2Vec embeddings, very frequent words often tend to have:






Correct Answer: B

Explanation:

Frequent words receive many gradient updates during training, which often leads to larger embedding magnitudes compared to rare words.

14.
If the number of negative samples (k) is significantly increased in Skip-gram with Negative Sampling, what is the most likely effect?






Correct Answer: B

Explanation:

Increasing k improves approximation to full softmax and may enhance embedding quality, but training time increases linearly with k.

15.
Is cosine similarity between two Word2Vec embeddings symmetric?






Correct Answer: A

Explanation:

Cosine similarity is mathematically symmetric: cos(a, b) equals cos(b, a). This property is independent of the training architecture.

16.
Why does Word2Vec use the dot product between word vectors during training?






Correct Answer: B

Explanation:

The dot product measures alignment between vectors. Higher dot product increases predicted probability that two words co-occur.

17.
Very rare words in Word2Vec training tend to have:






Correct Answer: B

Explanation:

Rare words receive very few updates, so their embeddings are often poorly trained and unstable compared to frequent words.

18.
If subsampling of frequent words is completely removed, what is the most likely outcome?






Correct Answer: C

Explanation:

Without subsampling, high-frequency words appear in nearly every context and dominate gradient updates, harming semantic representation learning.

19.
If embedding dimensionality increases significantly (e.g., from 100 to 1000), what is the most likely effect?






Correct Answer: C

Explanation:

Higher dimensional embeddings increase representational capacity but also computational cost and risk of overfitting, especially with limited data.

20.
In Word2Vec, a word’s embedding primarily reflects:






Correct Answer: B

Explanation:

Word2Vec learns embeddings based on global co-occurrence patterns throughout the corpus, not on individual sentence position.

Please visit, subscribe and share 10 Minutes Lectures in Computer Science
