
Showing posts with label NLP Quiz Questions. Show all posts

Saturday, March 21, 2026

Top 20 Named Entity Recognition (NER) MCQs with Answers & Explanations | NLP Practice Questions

✔ Scroll down and test yourself — answers are hidden under the “View Answer” button.

🚨 Quiz Instructions:
Attempt all questions first.
✔️ Click SUBMIT at the end to unlock VIEW ANSWER buttons.

Top 20 Named Entity Recognition (NER) MCQs with Answers

What is Named Entity Recognition (NER) in NLP?

Named Entity Recognition (NER) is a core task in Natural Language Processing (NLP) that identifies and classifies important entities in text into predefined categories such as Person, Location, Organization, Date, Time, and Money. It helps machines understand real-world objects mentioned in unstructured text data.

For example, in the sentence "Elon Musk founded SpaceX in 2002", NER identifies:

  • Elon Musk → Person
  • SpaceX → Organization
  • 2002 → Date
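A toy dictionary-based tagger makes the idea concrete. This is only an illustrative sketch: real NER systems (e.g., spaCy or BERT-based taggers) use trained statistical models, and the `GAZETTEER` lookup table below is invented for this example.

```python
# Minimal dictionary-based NER sketch (illustrative only).
# Real systems learn to recognize entities from context rather than
# matching a fixed lookup table.
GAZETTEER = {
    "Elon Musk": "PERSON",
    "SpaceX": "ORGANIZATION",
    "2002": "DATE",
}

def toy_ner(text):
    """Return (entity, label) pairs found by simple substring lookup."""
    entities = []
    for entity, label in GAZETTEER.items():
        if entity in text:
            entities.append((entity, label))
    return entities

print(toy_ner("Elon Musk founded SpaceX in 2002"))
# [('Elon Musk', 'PERSON'), ('SpaceX', 'ORGANIZATION'), ('2002', 'DATE')]
```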

Why is Named Entity Recognition Important?

  • Improves information extraction from text
  • Used in chatbots and virtual assistants
  • Enhances search engines and recommendation systems
  • Supports resume parsing and document analysis
  • Plays a key role in AI and data science applications

NER MCQs for Practice

Below are 20 carefully selected Named Entity Recognition MCQs with answers and explanations to help you prepare for exams, interviews, and competitive tests in NLP, Machine Learning, and Data Science.

1.
What is the primary goal of Named Entity Recognition (NER)?






Correct Answer: B

Explanation:

NER extracts entities such as names, locations, organizations, and dates from text and classifies them into predefined categories.

2.
Which of the following is NOT a typical NER entity category?






Correct Answer: C

Explanation:

NER identifies real-world entities, not grammatical categories like verbs.

3.
In the sentence “Apple released the new iPhone in California”, what is “Apple”?






Correct Answer: B

Explanation:

“Apple” refers to a company (organization) rather than a fruit in this context.

4.
Which tagging format is commonly used in NER?






Correct Answer: B

Explanation:

BIO tagging marks the beginning, inside, and outside of named entities.

5.
What does the “B” in BIO tagging represent?






Correct Answer: B

Explanation:

“B” indicates the beginning of a named entity.

6.
Which of the following is an example of BIO tagging?






Correct Answer: A

Explanation:

BIO tagging labels tokens based on their position within entities.
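A short sketch shows how entity spans map to BIO labels; the helper `bio_tags` is hypothetical, written for this example.

```python
# BIO tagging sketch: label each token as B- (begin), I- (inside),
# or O (outside) relative to known entity spans.
def bio_tags(tokens, entities):
    """entities: list of (start_index, end_index_exclusive, label) tuples."""
    tags = ["O"] * len(tokens)
    for start, end, label in entities:
        tags[start] = "B-" + label           # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + label           # remaining tokens of the entity
    return tags

tokens = ["New", "York", "City", "is", "crowded"]
print(bio_tags(tokens, [(0, 3, "LOC")]))
# ['B-LOC', 'I-LOC', 'I-LOC', 'O', 'O']
```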

7.
Which algorithm is commonly used in traditional NER systems?






Correct Answer: B

Explanation:

CRFs are widely used for sequence labeling tasks like NER.

8.
Why are CRFs preferred over HMMs in NER?






Correct Answer: C

Explanation:

CRFs allow flexible feature engineering without strong independence assumptions.

9.
Which modern model achieves state-of-the-art performance in NER?






Correct Answer: C

Explanation:

Transformers like BERT capture contextual meaning effectively for NER.

10.
What is the role of tokenization in NER?






Correct Answer: B

Explanation:

Tokenization breaks text into smaller units like words or subwords, which are essential for NER processing.
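A minimal regex tokenizer illustrates the word-level case; production NER pipelines typically use trained or subword tokenizers instead.

```python
import re

# Simple regex tokenizer sketch: splits text into word tokens and
# standalone punctuation tokens.
def tokenize(text):
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Elon Musk founded SpaceX in 2002."))
# ['Elon', 'Musk', 'founded', 'SpaceX', 'in', '2002', '.']
```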

11.
Which metric is commonly used to evaluate NER models?






Correct Answer: C

Explanation:

F1-score balances precision and recall, making it the most suitable metric for NER evaluation.
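Entity-level F1 can be computed from predicted and gold entity sets; the counts below are invented for illustration.

```python
# Entity-level F1 sketch: precision = TP / predicted, recall = TP / gold,
# F1 = harmonic mean of the two.
def f1_score(true_entities, predicted_entities):
    tp = len(true_entities & predicted_entities)
    precision = tp / len(predicted_entities) if predicted_entities else 0.0
    recall = tp / len(true_entities) if true_entities else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = {("Elon Musk", "PERSON"), ("SpaceX", "ORG"), ("2002", "DATE")}
pred = {("Elon Musk", "PERSON"), ("SpaceX", "ORG"), ("California", "LOC")}
print(round(f1_score(gold, pred), 3))  # precision 2/3, recall 2/3 -> 0.667
```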

12.
What challenge does NER face with ambiguous words like “Amazon”?






Correct Answer: B

Explanation:

Polysemy refers to words having multiple meanings depending on context, which is a major challenge in NER.

13.
Which dataset is widely used for NER benchmarking?






Correct Answer: C

Explanation:

CoNLL-2003 is a standard dataset used for evaluating NER systems.

14.
In BIO tagging, what does “O” represent?






Correct Answer: B

Explanation:

“O” indicates that the token does not belong to any named entity.

15.
Which is a limitation of rule-based NER systems?






Correct Answer: B

Explanation:

Rule-based systems require manual updates and do not scale well to new domains.

16.
What is entity linking in NER?






Correct Answer: B

Explanation:

Entity linking maps recognized entities to structured databases like Wikipedia or knowledge graphs.

17.
Which feature is important in traditional NER?






Correct Answer: B

Explanation:

Features like capitalization and word patterns are strong indicators of named entities.

18.
What issue arises when multiple words form a single entity like “New York City”?






Correct Answer: B

Explanation:

NER systems must correctly identify and group multiple tokens into a single entity.

19.
Which technique improves NER performance in deep learning models?






Correct Answer: B

Explanation:

Pretrained models like BERT capture deep contextual representations, significantly improving NER performance.

20.
Which is a real-world application of NER?






Correct Answer: C

Explanation:

NER is widely used in resume parsing to extract names, skills, organizations, and other structured information.

Sunday, March 1, 2026

Advanced Word2Vec MCQs – Skip-gram, Negative Sampling & Softmax


Advanced Word2Vec MCQs with Answers (Skip-gram, SGNS & Softmax)

This page provides 20 advanced multiple-choice questions (MCQs) on Word2Vec covering Skip-gram, CBOW, Negative Sampling (SGNS), Full Softmax, subsampling, PMI matrix factorization, cosine similarity, and embedding theory. These questions are designed for postgraduate students, research scholars, competitive exams, and machine learning interviews.

Topics Covered in These Word2Vec MCQs

  • Skip-gram vs CBOW differences
  • Full Softmax computational complexity O(|V|)
  • Negative Sampling and the 3/4 distribution smoothing
  • Subsampling of frequent words
  • Shifted PMI matrix factorization interpretation of SGNS
  • Cosine similarity and embedding geometry
  • Static embedding limitations (polysemy problem)
  • Effect of window size and dimensionality

Who Should Practice These Questions?

These advanced Word2Vec MCQs are suitable for learners preparing for NLP exams, machine learning viva, university theory exams, research interviews, and technical placements. The explanations emphasize conceptual understanding rather than memorization.



1.
In Skip-gram with full softmax, what is the primary computational bottleneck when vocabulary size is extremely large (e.g., 1 million words)?






Correct Answer: C

Explanation:

The denominator of the softmax requires summing over all vocabulary words. If vocabulary size is 1 million, 1 million dot products must be computed for every update, making training computationally expensive.

Full softmax requires computing the denominator over the entire vocabulary: Σ_{w ∈ V} exp(v_w · v_c). Time complexity is O(|V|) per training example.
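A small sketch makes the bottleneck visible: the softmax denominator needs one dot product per vocabulary word. The toy sizes and random vectors below are invented for illustration; real vocabularies reach millions of words.

```python
import math
import random

# Full-softmax sketch: the denominator sums exp(score) over the ENTIRE
# vocabulary, so each training example costs O(|V|) dot products.
random.seed(0)
dim, vocab_size = 8, 200  # toy sizes for illustration
center = [random.gauss(0, 0.1) for _ in range(dim)]
output_vectors = [[random.gauss(0, 0.1) for _ in range(dim)]
                  for _ in range(vocab_size)]

def softmax_distribution():
    # One dot product per vocabulary word: this loop is the O(|V|) term.
    scores = [sum(c * o for c, o in zip(center, out)) for out in output_vectors]
    denom = sum(math.exp(s) for s in scores)
    return [math.exp(s) / denom for s in scores]

probs = softmax_distribution()
print(len(probs), round(sum(probs), 6))  # 200 1.0
```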
2.
In negative sampling, if a negative word vector is orthogonal to the center word vector, what happens to its gradient update?






Correct Answer: C

Explanation:

If vectors are orthogonal, their dot product is zero. Since sigmoid(0) = 0.5, the gradient is small but not zero. The model still updates the vectors to push negative samples away.
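A two-dimensional sketch of this case, with invented vectors: the gradient scale on a negative pair in SGNS is sigmoid of the dot product, which is 0.5 (not 0) when the vectors are orthogonal.

```python
import math

# SGNS negative-sample sketch: the update magnitude for a negative pair
# scales with sigmoid(u . v). Orthogonal vectors give a dot product of 0,
# and sigmoid(0) = 0.5, so the update is nonzero.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

center = [0.3, 0.0]
negative = [0.0, 0.4]  # orthogonal to the center vector

dot = sum(c * n for c, n in zip(center, negative))
print(dot, sigmoid(dot))  # 0.0 0.5
```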

3.
Given the subsampling probability formula P(w) = 1 - √(t / f(w)), what happens when word frequency f(w) is much larger than t?






Correct Answer: B

Explanation:

When frequency is very high, t/f(w) becomes very small, making the discard probability approach 1. Thus very frequent words like "the" are removed most of the time.
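Plugging sample frequencies into the formula shows the effect; the frequencies below are invented, and t = 1e-5 is a commonly cited threshold.

```python
import math

# Subsampling sketch: P_discard(w) = 1 - sqrt(t / f(w)), where f(w) is
# the word's relative frequency and t is a small threshold.
def discard_probability(frequency, t=1e-5):
    # Clamp at 0: words with f(w) <= t are never discarded.
    return max(0.0, 1.0 - math.sqrt(t / frequency))

print(round(discard_probability(0.05), 4))   # "the"-like word -> 0.9859
print(round(discard_probability(1e-5), 4))   # f(w) == t      -> 0.0
print(round(discard_probability(1e-7), 4))   # rare word      -> 0.0
```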

4.
Why does Word2Vec learn two embedding matrices (W and W') but typically use only W after training?






Correct Answer: C

Explanation:

The input matrix W represents center-word embeddings and captures semantic structure. The output matrix W' represents context embeddings and is usually discarded after training.

5.
If two words have nearly identical context distributions in a corpus, their Word2Vec embeddings will most likely:






Correct Answer: C

Explanation:

According to the distributional hypothesis, words appearing in similar contexts obtain similar embeddings, resulting in high cosine similarity.

6.
Which scenario particularly favors Skip-gram over CBOW?






Correct Answer: C

Explanation:

Skip-gram generates more training signals per word and performs better for rare words, while CBOW is generally faster and smoother for frequent words.

7.
Why is the negative sampling distribution raised to the power of 3/4?






Correct Answer: C

Explanation:

Raising frequencies to the power 3/4 reduces dominance of very frequent words and increases medium-frequency sampling, improving embedding quality.
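A quick comparison with invented counts shows the smoothing effect of the 3/4 exponent on the sampling distribution.

```python
# Negative-sampling distribution sketch: P(w) is proportional to
# count(w)^0.75. The 3/4 power dampens very frequent words and boosts
# rarer ones relative to the raw unigram distribution.
counts = {"the": 1000, "learning": 100, "embeddings": 10}

def sampling_distribution(counts, power=0.75):
    weights = {w: c ** power for w, c in counts.items()}
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}

unigram = sampling_distribution(counts, power=1.0)
smoothed = sampling_distribution(counts)
print(round(unigram["the"], 3), round(smoothed["the"], 3))
# "the" dominates less after smoothing: 0.901 vs 0.827
```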

8.
Why does the analogy king - man + woman ≈ queen work in Word2Vec?






Correct Answer: C

Explanation:

Word2Vec embeddings capture linear semantic relationships, allowing vector arithmetic to represent analogies like gender direction in embedding space.
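A hand-crafted two-dimensional example illustrates the linear structure. The vectors below are invented (dimension 0 as "royalty", dimension 1 as "gender"), chosen so the analogy holds exactly; learned embeddings only approximate this.

```python
import math

# Toy 2-D embeddings illustrating why vector arithmetic captures analogies.
vectors = {
    "king":  [1.0,  1.0],
    "queen": [1.0, -1.0],
    "man":   [0.0,  1.0],
    "woman": [0.0, -1.0],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# king - man + woman, then find the nearest remaining word by cosine.
target = [k - m + w for k, m, w in zip(vectors["king"], vectors["man"], vectors["woman"])]
best = max((word for word in vectors if word not in {"king", "man", "woman"}),
           key=lambda word: cosine(target, vectors[word]))
print(best)  # queen
```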

9.
If neither negative sampling nor hierarchical softmax is used, training Word2Vec with full softmax becomes:






Correct Answer: C

Explanation:

Full softmax requires computation across entire vocabulary for each update, making complexity proportional to vocabulary size and thus very slow.

10.
Which limitation is fundamentally unavoidable in static Word2Vec embeddings trained without contextualization?






Correct Answer: B

Explanation:

Classic Word2Vec learns a single static vector representation for each word type, regardless of context. Therefore, polysemous words like “bank” (river bank vs financial bank) receive only one embedding and cannot represent different meanings based on context.

11.
Skip-gram with Negative Sampling (SGNS) has been theoretically shown to approximate factorization of which matrix?






Correct Answer: C

Explanation:

Research shows that Skip-gram with Negative Sampling implicitly factorizes a shifted PMI matrix. This explains why semantic similarity emerges geometrically in Word2Vec embeddings.

12.
Increasing the context window size in Word2Vec primarily encourages the model to capture more:






Correct Answer: C

Explanation:

Small window sizes focus on syntactic relationships, while larger window sizes capture broader topical and semantic relationships across sentences.

13.
In trained Word2Vec embeddings, very frequent words often tend to have:






Correct Answer: B

Explanation:

Frequent words receive many gradient updates during training, which often leads to larger embedding magnitudes compared to rare words.

14.
If the number of negative samples (k) is significantly increased in Skip-gram with Negative Sampling, what is the most likely effect?






Correct Answer: B

Explanation:

Increasing k improves approximation to full softmax and may enhance embedding quality, but training time increases linearly with k.

15.
Is cosine similarity between two Word2Vec embeddings symmetric?






Correct Answer: A

Explanation:

Cosine similarity is mathematically symmetric: cos(a, b) equals cos(b, a). This property is independent of the training architecture.

16.
Why does Word2Vec use the dot product between word vectors during training?






Correct Answer: B

Explanation:

The dot product measures alignment between vectors. Higher dot product increases predicted probability that two words co-occur.

17.
Very rare words in Word2Vec training tend to have:






Correct Answer: B

Explanation:

Rare words receive very few updates, so their embeddings are often poorly trained and unstable compared to frequent words.

18.
If subsampling of frequent words is completely removed, what is the most likely outcome?






Correct Answer: C

Explanation:

Without subsampling, high-frequency words appear in nearly every context and dominate gradient updates, harming semantic representation learning.

19.
If embedding dimensionality increases significantly (e.g., from 100 to 1000), what is the most likely effect?






Correct Answer: C

Explanation:

Higher dimensional embeddings increase representational capacity but also computational cost and risk of overfitting, especially with limited data.

20.
In Word2Vec, a word’s embedding primarily reflects:






Correct Answer: B

Explanation:

Word2Vec learns embeddings based on global co-occurrence patterns throughout the corpus, not on individual sentence position.

Thursday, February 19, 2026

RAG vs Fine-Tuning MCQs (Top 10 with Detailed Explanations) – Generative AI Guide


RAG vs Fine-Tuning: Top 10 MCQs with Detailed Explanations

As large language models (LLMs) become central to modern AI applications, two key techniques are widely used to adapt them for real-world tasks: Retrieval-Augmented Generation (RAG) and Fine-tuning. While both approaches enhance model performance, they serve different purposes and are suitable for different scenarios.

RAG improves factual accuracy by retrieving relevant external information at query time, making it ideal for dynamic and frequently updated knowledge. Fine-tuning, on the other hand, modifies the model’s internal parameters to align its behavior, tone, and task-specific capabilities.

Understanding the differences between these two techniques is essential for students, researchers, and AI practitioners working with generative AI systems. This MCQ set presents carefully designed questions that test conceptual understanding, practical use cases, scalability considerations, and real-world trade-offs between RAG and Fine-tuning.

Each question includes a detailed explanation to help you build strong conceptual clarity and prepare for exams, interviews, and advanced study in Generative AI.


RAG vs Fine-Tuning: A Simple Comparison

Retrieval-Augmented Generation (RAG) and Fine-tuning are two widely used techniques for improving the performance of large language models (LLMs). Although both methods enhance model usefulness, they work in fundamentally different ways and are suitable for different types of problems.

  • How it works: RAG retrieves relevant external documents at query time and uses them as context; fine-tuning updates the model’s internal weights using additional training data.
  • Knowledge updates: RAG makes updates easy (just update the document database); fine-tuning makes them difficult (requires retraining the model).
  • Best for: RAG suits frequently changing or large knowledge bases; fine-tuning suits consistent tone, style, or task-specific behavior.
  • Infrastructure: RAG requires embeddings and a vector database; fine-tuning requires training data and computational resources.
  • Knowledge storage: RAG keeps knowledge external (documents, databases); fine-tuning stores it internally (model parameters).
  • Use cases: RAG powers chatbots with company knowledge, website assistants, and enterprise search; fine-tuning supports structured outputs, domain-specific writing style, and instruction alignment.

In practice, many real-world systems combine both approaches. RAG provides up-to-date and factual information, while fine-tuning ensures consistent response quality, tone, and task alignment.


Practice Questions on RAG vs Fine-Tuning

1.
What is the primary purpose of Retrieval-Augmented Generation (RAG)?






Correct Answer: B

Explanation:

Retrieval-Augmented Generation (RAG) is designed to improve the factual accuracy and relevance of large language model outputs by providing external knowledge at inference time. Instead of modifying the model’s internal weights, RAG retrieves semantically relevant documents (using embeddings and vector search) based on the user’s query and includes this information in the prompt.

This approach is particularly useful when:

  • The knowledge base is large or frequently updated
  • The information is domain-specific or private
  • Retraining the model is expensive or impractical

Unlike fine-tuning, RAG keeps the model unchanged and separates knowledge storage from model learning, making it scalable and flexible for real-world applications such as enterprise search, website assistants, and documentation chatbots.
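The retrieval step can be sketched in a few lines. Here bag-of-words counts stand in for neural embeddings and cosine similarity ranks the documents; the documents and query are invented, and a real system would use an embedding model plus a vector database.

```python
import math
import re
from collections import Counter

# Toy RAG retrieval step: rank documents by cosine similarity to the
# query and return the top match as context for the LLM prompt.
documents = [
    "Our refund policy allows returns within 30 days.",
    "The product catalog is updated every Monday.",
    "Support is available by email and phone.",
]

def embed(text):
    # Bag-of-words counts as a stand-in for a neural embedding.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, k=1):
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

# The retrieved passage would be prepended to the LLM prompt as context.
print(retrieve("When is the product catalog updated?"))
```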

2.
Which component of a large language model is modified during fine-tuning?






Correct Answer: C

Explanation:

Fine-tuning involves updating the internal weights and parameters of a pretrained large language model using additional domain-specific or task-specific training data. This process adjusts how the model represents language patterns internally, allowing it to better perform a targeted task or adopt a specific behavioral style.

During fine-tuning:
  • The model undergoes additional gradient updates using supervised training data
  • Parameters are modified to reflect domain knowledge or output preferences
  • The learned changes become permanently embedded in the model

It is important to note that fine-tuning does not modify the context window size, the external document store, or the prompt template.

Unlike RAG, which retrieves knowledge dynamically at inference time, fine-tuning encodes knowledge directly into model weights. This makes it suitable for:

  • Style control (formal, academic, conversational)
  • Structured output formatting
  • Task-specific behavior alignment

However, incorporating new factual knowledge through fine-tuning requires retraining, which can be computationally expensive and time-consuming.

3.
Which approach is most suitable for handling knowledge that changes frequently, such as company policies or product catalogs?






Correct Answer: B

Explanation:

RAG is specifically designed to retrieve external knowledge dynamically at inference time. This makes it highly suitable for domains where information changes frequently, such as policy updates, pricing data, inventory details, or regulatory documents.

In contrast, fine-tuning embeds knowledge into the model’s weights. If the knowledge changes, the model must be retrained, which is computationally expensive and operationally inefficient.

RAG allows organizations to:

  • Update documents in the knowledge base without retraining
  • Maintain separation between knowledge storage and model reasoning
  • Scale easily as data grows

Therefore, for dynamic and evolving knowledge environments, RAG is the preferred and scalable solution.

4.
A company wants its chatbot to consistently respond in a formal legal tone with structured output formatting. Which method is most appropriate?






Correct Answer: B

Explanation:

Fine-tuning modifies the model’s internal parameters to align its behavior with specific stylistic, structural, or task requirements. If a chatbot must consistently generate responses in a formal legal tone with defined output formatting, fine-tuning provides long-term behavioral alignment.

RAG, on the other hand, focuses on retrieving factual information. While it can improve knowledge accuracy, it does not guarantee stylistic consistency across responses.

Fine-tuning is ideal when:

  • Output style must remain consistent
  • Responses follow a predefined template
  • Task-specific reasoning behavior is required

Thus, for tone control and structural alignment, fine-tuning is the most appropriate method.

5.
What is a major limitation of fine-tuning when compared to RAG?






Correct Answer: B

Explanation:

Fine-tuning embeds knowledge directly into the model’s parameters. While this can improve task performance and stylistic alignment, updating knowledge requires retraining the model with new data.

Retraining is:

  • Computationally expensive
  • Time-consuming
  • Operationally complex

In contrast, RAG allows immediate knowledge updates by simply modifying the external document store. No retraining is required. This makes RAG significantly more flexible in rapidly evolving domains.

6.
In a RAG pipeline, what is the primary function of embeddings?






Correct Answer: C

Explanation:

Embeddings transform textual data into high-dimensional numerical vectors that capture semantic meaning. In a RAG system, both user queries and documents are converted into embeddings.

The system then performs similarity search to identify documents whose embeddings are closest to the query embedding. This enables contextually relevant retrieval beyond simple keyword matching.

Thus, embeddings are the core mechanism enabling semantic retrieval in RAG architectures.

7.
Why do many production systems combine RAG and fine-tuning?






Correct Answer: B

Explanation:

RAG provides up-to-date factual knowledge by retrieving external documents, while fine-tuning aligns the model’s internal reasoning style and output structure.

Combining both allows systems to:

  • Deliver accurate, grounded responses
  • Maintain stylistic consistency
  • Align with domain-specific requirements

This hybrid approach is increasingly common in enterprise AI deployments.

8.
Why is RAG considered more scalable for enterprise knowledge management?






Correct Answer: C

Explanation:

RAG decouples knowledge from the model itself. Documents are stored externally in databases or vector stores, allowing independent updates without retraining the model.

This separation:

  • Improves scalability
  • Reduces maintenance cost
  • Supports large and evolving knowledge bases

For enterprises managing thousands of documents, this architecture is significantly more efficient than embedding knowledge into model weights.

9.
Which statement best describes the cost trade-off between RAG and fine-tuning?






Correct Answer: C

Explanation:

Fine-tuning requires computational resources for training, dataset preparation, and validation. These costs occur upfront.

RAG avoids retraining but introduces ongoing infrastructure requirements such as:

  • Embedding generation
  • Vector database maintenance
  • Retrieval computation during inference

Therefore, each method has different cost dynamics depending on system scale and usage patterns.

10.
How does RAG help reduce hallucinations compared to fine-tuning?






Correct Answer: B

Explanation:

Hallucination occurs when a model generates plausible but incorrect information. RAG mitigates this by supplying retrieved documents as grounding evidence.

Because the model generates responses conditioned on real retrieved content, factual reliability improves. However, RAG does not eliminate hallucinations completely — it reduces them by anchoring responses in external knowledge.

Fine-tuning improves behavior and task alignment but does not inherently guarantee grounding in external evidence.
