NLP MCQ Quiz – 25 Multiple Choice Questions with Answers (Advanced Level)
Section 1: Basics & Preprocessing (5 Questions)
Q1. What is tokenization in NLP?
A. Combining multiple sentences into one
B. Splitting text into smaller units like words or sentences
C. Removing stopwords from text
D. Converting text into numbers
Answer: B
Explanation: Tokenization divides text into tokens for further processing.
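For illustration, a minimal sketch of word-level tokenization using a regular expression; real tokenizers (e.g. NLTK's word_tokenize or spaCy's tokenizer) handle contractions, abbreviations, and Unicode far more carefully.

```python
import re

def simple_tokenize(text):
    # Toy tokenizer: keep runs of word characters and single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)

print(simple_tokenize("Tokenization splits text into tokens."))
# ['Tokenization', 'splits', 'text', 'into', 'tokens', '.']
```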
Q2. Which of the following is a stopword?
A. Python
B. Running
C. The
D. BERT
Answer: C
Q3. Lemmatization differs from stemming because:
A. It ignores grammar
B. It converts words to root form using linguistic rules
C. It only removes suffixes
D. It removes punctuation
Answer: B
Explanation: Lemmatization reduces words to dictionary form using context, unlike stemming.
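A short comparison sketch using NLTK (assumes `pip install nltk` and `nltk.download("wordnet")` have been run so the lemmatizer has its dictionary):

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

print(stemmer.stem("studies"))                  # 'studi'  -> crude suffix stripping
print(lemmatizer.lemmatize("studies"))          # 'study'  -> valid dictionary form
print(stemmer.stem("better"))                   # 'better' -> unchanged
print(lemmatizer.lemmatize("better", pos="a"))  # 'good'   -> uses POS/linguistic knowledge
```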
Q4. What is an n-gram in NLP?
A. A type of neural network
B. A sequence of n contiguous items (words/characters)
C. An embedding method
D. A tokenization error
Answer: B
Explanation: N-grams are used to capture word sequences for statistical language models.
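A minimal sketch of extracting word n-grams from a token list:

```python
def ngrams(tokens, n):
    # Slide a window of size n across the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the cat sat on the mat".split()
print(ngrams(tokens, 2))
# [('the', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat')]
```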
Q5. Which of the following is not a text preprocessing step?
A. Tokenization
B. Stopword removal
C. Part-of-speech tagging
D. Feature scaling
Answer: D
Explanation: Feature scaling is for numerical ML features, not NLP text preprocessing.
Section 2: Embeddings & Vectorization (5 Questions)
Q6. Word2Vec embeddings are:
A. Static
B. Contextual
C. Sparse vectors
D. One-hot encoded
Answer: A
Explanation: Word2Vec generates fixed embeddings for each word, independent of context.
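A small sketch with gensim (assumes gensim 4.x, where the parameter is vector_size), showing that a word gets a single vector regardless of context:

```python
from gensim.models import Word2Vec

sentences = [["the", "bank", "approved", "the", "loan"],
             ["she", "sat", "on", "the", "river", "bank"]]
model = Word2Vec(sentences, vector_size=50, min_count=1, seed=1)

# "bank" has exactly one embedding, whether it meant a lender or a riverside.
print(model.wv["bank"].shape)  # (50,)
```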
Q7. Which of the following produces contextual word embeddings?
A. Word2Vec
B. GloVe
C. BERT
D. TF-IDF
Answer: C
Explanation: BERT's embeddings change with the surrounding sentence, unlike static methods such as Word2Vec and GloVe.
Q8. TF-IDF stands for:
A. Term Frequency – Inverse Document Frequency
B. Total Frequency – Indexed Document Factor
C. Text Feature – Indexed Density Factor
D. Token Frequency – Inverse Data Frequency
Answer: A
Explanation: TF-IDF measures word importance relative to the document and corpus.
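A quick sketch with scikit-learn's TfidfVectorizer:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat on the mat", "the dog chased the cat"]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)          # one sparse TF-IDF row per document

print(vectorizer.get_feature_names_out())   # vocabulary terms
print(X.toarray().round(2))                 # words frequent in every document get low weights
```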
Q9. Which of the following is a dense embedding?
A. One-hot vector
B. Word2Vec vector
C. Bag-of-words
D. Count vector
Answer: B
Explanation: Dense embeddings represent words in a continuous, low-dimensional vector space.
Q10. Which technique is commonly used to reduce the dimensionality of word embeddings?
A. PCA
B. Stopword removal
C. Lemmatization
D. Stemming
Answer: A
Explanation: PCA reduces embedding dimensions while preserving variance.
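For illustration, random vectors stand in for real embeddings in this scikit-learn sketch:

```python
import numpy as np
from sklearn.decomposition import PCA

embeddings = np.random.rand(500, 100)                    # 500 "word vectors", 100 dimensions each
reduced = PCA(n_components=2).fit_transform(embeddings)  # keep the 2 directions of highest variance
print(reduced.shape)                                     # (500, 2) -- e.g. for plotting
```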
Section 3: NLP Tasks (5 Questions)
Q11. Named Entity Recognition (NER) identifies:
A. Parts of speech
B. Relationships between words
C. Entities like person, location, organization
D. Word embeddings
Answer: C
Q12. Sentiment analysis predicts:
A. Grammar errors
B. Positive, negative, or neutral sentiment
C. Named entities
D. Topic distribution
Answer: B
Explanation: Sentiment analysis classifies text based on emotional polarity.
Q13. Dependency parsing determines:
A. Word order only
B. Grammatical structure by linking words with relations
C. Named entities
D. Stopwords in text
Answer: B
Explanation: Dependency parsing identifies syntactic relationships between words.
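A brief sketch with spaCy (assumes spaCy is installed and the en_core_web_sm model has been downloaded):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The cat chased the mouse")
for token in doc:
    # Each token is linked to its syntactic head with a dependency label.
    print(token.text, token.dep_, "->", token.head.text)
# e.g. 'cat nsubj -> chased', 'mouse dobj -> chased'
```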
Q14. Which task predicts the next word in a sequence?
A. Text classification
B. Language modeling
C. Sentiment analysis
D. POS tagging
Answer: B
Explanation: Language models predict the probability of the next word given previous words.
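A toy count-based bigram language model sketch, estimating P(next word | previous word) from raw counts:

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat slept".split()
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1                       # count the bigram (prev, nxt)

def next_word_probs(prev):
    total = sum(counts[prev].values())
    return {w: c / total for w, c in counts[prev].items()}

print(next_word_probs("the"))                    # roughly {'cat': 0.67, 'mat': 0.33}
```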
Q15. Which of the following is a sequence-to-sequence task?
A. Text summarization
B. Word tokenization
C. Bag-of-words representation
D. Feature scaling
Answer: A
Explanation: Sequence-to-sequence models generate output sequences from input sequences, e.g., summarization or translation.
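As a sketch, Hugging Face Transformers exposes pretrained sequence-to-sequence models through a summarization pipeline (the first call downloads a model):

```python
from transformers import pipeline

summarizer = pipeline("summarization")           # loads a pretrained encoder-decoder model
article = ("Sequence-to-sequence models read an input sequence with an encoder and "
           "generate an output sequence with a decoder. They are widely used for "
           "machine translation and abstractive summarization.")
print(summarizer(article, max_length=30, min_length=5)[0]["summary_text"])
```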
Section 4: Transformers & LLMs (5 Questions)
Q16. What does self-attention in Transformers do?
A. Computes a representation of each word by considering all other words in the sequence
B. Removes irrelevant words
C. Converts words into vectors
D. Tokenizes sentences
Answer: A
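A bare-bones NumPy sketch of scaled dot-product self-attention (the learned query/key/value projections are omitted for brevity):

```python
import numpy as np

def self_attention(X):
    # Scores compare every token with every other token; softmax turns them into
    # weights, and each output row is a weighted mix of all token vectors.
    d_k = X.shape[-1]
    scores = X @ X.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ X

X = np.random.rand(4, 8)        # 4 tokens, 8-dimensional embeddings
print(self_attention(X).shape)  # (4, 8)
```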
Q17. In BERT, the pre-training tasks include:
A. Next sentence prediction & masked language modeling
B. Part-of-speech tagging only
C. Dependency parsing
D. Clustering
Answer: A
Explanation: BERT learns contextual embeddings using these two pre-training tasks.
Q18. Which LLM is autoregressive?
A. GPT
B. BERT
C. GloVe
D. Word2Vec
Answer: A
Explanation: GPT generates text token by token, predicting the next token based on previous ones.
Q19. Positional encoding is used in Transformers to:
A. Tokenize text
B. Encode word order information
C. Reduce vocabulary size
D. Identify named entities
Answer: B
Explanation: Transformers lack recurrence; positional encoding provides sequence order information.
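A sketch of the sinusoidal positional encoding from the original Transformer paper (assumes an even model dimension):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

print(positional_encoding(seq_len=10, d_model=16).shape)  # (10, 16)
```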
Q20. Attention heads in a Transformer:
A. Perform parallel computations to capture different relations
B. Remove stopwords
C. Normalize embeddings
D. Predict sentiment
Answer: A
Section 5: Applications & Advanced Concepts (5 Questions)
Q21. What does extractive summarization do?
A. Selecting key sentences from the original text
B. Generating new text
C. Translating text
D. Tokenizing words
Answer: A
Explanation: Extractive summarization picks existing sentences; abstractive generates new text.
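A toy extractive sketch that scores sentences by their average TF-IDF weight and keeps the highest-scoring one; production systems typically use stronger methods such as TextRank or learned rankers:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = [
    "Transformers dominate modern NLP.",
    "The weather was pleasant yesterday.",
    "Self-attention lets transformers model long-range context in NLP.",
]
tfidf = TfidfVectorizer().fit_transform(sentences)
scores = tfidf.mean(axis=1).A1          # average TF-IDF weight per sentence
print(sentences[scores.argmax()])       # the "key" sentence is copied verbatim
```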
Q22. BLEU score is used for:
A. Tokenization
B. Evaluation of machine translation
C. Feature extraction
D. Stopword removal
Answer: B
Explanation: BLEU measures the n-gram overlap between machine-translated text and one or more reference translations.
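A quick sketch with NLTK's BLEU implementation (smoothing avoids zero scores on short sentences):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "cat", "is", "on", "the", "mat"]]   # one (or more) reference translations
candidate = ["the", "cat", "sat", "on", "the", "mat"]    # machine translation output

score = sentence_bleu(reference, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(round(score, 3))   # closer to 1.0 means closer to the reference
```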
Q23. What is a pre-trained model in NLP?
A. A model trained from scratch for a task
B. A model already trained on large corpora, used for downstream tasks
C. Any LSTM model
D. Bag-of-words vectorizer
Answer: B
Explanation: Pre-trained models like BERT or GPT can be fine-tuned for specific tasks.
Q24. Which of the following is not an NLP evaluation metric?
A. ROUGE
B. BLEU
C. F1-score
D. PCA
Answer: D
Explanation: PCA is for dimensionality reduction, not NLP evaluation.
Q25. Prompt engineering in LLMs refers to:
A. Designing effective inputs to get desired model outputs
B. Tokenization of text
C. Reducing model parameters
D. Pre-training embeddings
Answer: A
Explanation: Well-crafted prompts help LLMs generate accurate or contextually relevant outputs.