NLP MCQ Quiz – 25 Multiple Choice Questions with Answers (Advanced Level)
Section 1: Basics & Preprocessing (5 Questions)
Q1. What is tokenization in NLP?
A. Combining multiple sentences into one
B. Splitting text into smaller units like words or sentences
C. Removing stopwords from text
D. Converting text into numbers
Answer: B
Explanation: Tokenization divides text into tokens for further processing.
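A minimal sketch of word and sentence tokenization using only regular expressions. The function names and splitting rules are illustrative assumptions; production tokenizers handle abbreviations, hyphenation, and subwords far more carefully.

```python
import re

def tokenize_words(text):
    """Split text into lowercase word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def tokenize_sentences(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

text = "Tokenization splits text. It yields words or sentences!"
print(tokenize_words(text))
print(tokenize_sentences(text))
```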
Q2. Which of the following is a stopword?
A. Python
B. Running
C. The
D. BERT
Answer: C
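Stopwords such as "the" carry little content on their own and are often filtered out. The sketch below assumes a tiny hand-written stopword set (STOPWORDS is hypothetical; real lists are much longer and language-specific).

```python
# Tiny illustrative stopword list; an assumption, not a standard resource
STOPWORDS = {"the", "a", "an", "is", "of", "and", "in", "to"}

def remove_stopwords(tokens):
    """Drop tokens whose lowercase form appears in the stopword set."""
    return [t for t in tokens if t.lower() not in STOPWORDS]

print(remove_stopwords(["The", "model", "is", "trained", "in", "Python"]))
# ['model', 'trained', 'Python']
```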
Q3. Lemmatization differs from stemming because:
A. It ignores grammar
B. It converts words to root form using linguistic rules
C. It only removes suffixes
D. It removes punctuation
Answer: B
Explanation: Lemmatization reduces words to their dictionary form (lemma) using vocabulary and context such as part of speech, whereas stemming only strips affixes heuristically.
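The contrast can be sketched with two toy functions. Both are assumptions for illustration: naive_stem is a crude suffix stripper (not the Porter algorithm), and LEMMA_TABLE is a hypothetical lookup standing in for a real lexicon.

```python
def naive_stem(word):
    """Crude suffix stripping: no dictionary, no grammar."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical lookup table keyed by (word, part of speech)
LEMMA_TABLE = {
    ("running", "VERB"): "run",
    ("better", "ADJ"): "good",
    ("studies", "NOUN"): "study",
    ("studies", "VERB"): "study",
}

def toy_lemmatize(word, pos):
    """Return the dictionary form if known, else the word unchanged."""
    return LEMMA_TABLE.get((word, pos), word)

for word, pos in [("running", "VERB"), ("studies", "NOUN"), ("better", "ADJ")]:
    print(word, "->", naive_stem(word), "(stem) vs", toy_lemmatize(word, pos), "(lemma)")
```

The stemmer can produce non-words such as "runn" or "studi", while the lemmatizer returns valid dictionary entries.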
Q4. What is an n-gram in NLP?
A. A type of neural network
B. A sequence of n contiguous items (words/characters)
C. An embedding method
D. A tokenization error
Answer: B
Explanation: N-grams are used to capture word sequences for statistical language models.
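A short sketch of n-gram extraction. The helper name ngrams is an assumption; it works on any sequence, so it covers both word-level and character-level n-grams.

```python
def ngrams(items, n):
    """Return all contiguous length-n windows of a sequence as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

words = "the cat sat on the mat".split()
print(ngrams(words, 2))   # word bigrams
print(ngrams("cat", 2))   # character bigrams
```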
Q5. Which of the following is not a text preprocessing step?
A. Tokenization
B. Stopword removal
C. Part-of-speech tagging
D. Feature scaling
Answer: D
Explanation: Feature scaling applies to numerical ML features; it is not a text preprocessing step.
Section 2: Embeddings & Vectorization (5 Questions)
Q6. Word2Vec embeddings are:
A. Static
B. Contextual
C. Sparse vectors
D. One-hot encoded
Answer: A
Explanation: Word2Vec generates fixed embeddings for each word, independent of context.
B. GloVe
C. BERT
D. TF-IDF
Q8. TF-IDF stands for:
A. Term Frequency – Inverse Document Frequency
B. Total Frequency – Indexed Document Factor
C. Text Feature – Indexed Density Factor
D. Token Frequency – Inverse Data Frequency
Answer: A
Explanation: TF-IDF measures word importance relative to the document and corpus.
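A minimal TF-IDF sketch in pure Python. It assumes the plain variant tf = count / document length and idf = log(N / df); libraries often use smoothed or normalized variants, so their numbers will differ.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: score} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

docs = [["the", "cat"], ["the", "dog"]]
for row in tf_idf(docs):
    print(row)
```

A term that appears in every document ("the" here) gets a score of zero, while terms unique to one document score higher.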
Q9. Which of the following is a dense embedding?
A. One-hot vector
B. Word2Vec vector
C. Bag-of-words
D. Count vector
Answer: B
Explanation: Dense embeddings represent words in a continuous, low-dimensional vector space.
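The difference between sparse one-hot vectors and dense vectors can be sketched as follows. The dense values are made up for illustration only and do not come from any trained model.

```python
import math

def one_hot(word, vocab):
    """Vocabulary-sized vector with a single 1.0 at the word's position."""
    return [1.0 if w == word else 0.0 for w in vocab]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

vocab = ["king", "queen", "apple"]
# Made-up 2-dimensional dense vectors (illustrative assumption)
dense = {"king": [0.9, 0.8], "queen": [0.85, 0.9], "apple": [-0.7, 0.1]}

print(cosine(one_hot("king", vocab), one_hot("queen", vocab)))  # 0.0
print(cosine(dense["king"], dense["queen"]))
print(cosine(dense["king"], dense["apple"]))
```

Distinct one-hot vectors are always orthogonal, so they cannot express similarity; dense vectors can.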
Q10. Which technique is commonly used to reduce the dimensionality of embeddings?
A. PCA
B. Stopword removal
C. Lemmatization
D. Stemming
Answer: A
Explanation: PCA projects embeddings onto fewer dimensions while retaining as much variance as possible.
Section 3: NLP Tasks (5 Questions)
Q11. Named Entity Recognition (NER) identifies:
A. Parts of speech
B. Relationships between words
C. Entities like person, location, organization
D. Word embeddings
Answer: C
Q12. Sentiment analysis predicts:
A. Grammar errors
B. Positive, negative, or neutral sentiment
C. Named entities
D. Topic distribution
Answer: B
Explanation: Sentiment analysis classifies text based on emotional polarity.
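One very simple way to classify polarity is a lexicon count. The sketch below assumes two tiny hand-picked word sets (POSITIVE and NEGATIVE are hypothetical); trained classifiers are the usual choice in practice.

```python
# Tiny illustrative lexicons; assumptions, not a published resource
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "poor"}

def lexicon_sentiment(text):
    """Label text by counting positive and negative lexicon hits."""
    tokens = [t.strip(".,!?;:") for t in text.lower().split()]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_sentiment("I love this great phone!"))
print(lexicon_sentiment("Terrible battery, bad screen."))
print(lexicon_sentiment("It arrived on Monday."))
```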
Q13. Dependency parsing determines:
A. Word order only
B. Grammatical structure by linking words with relations
C. Named entities
D. Stopwords in text
Answer: B
Explanation: Dependency parsing identifies syntactic relationships between words.
Q14. Which task predicts the next word in a sequence?
A. Text classification
B. Language modeling
C. Sentiment analysis
D. POS tagging
Answer: B
Explanation: Language models predict the probability of the next word given previous words.
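A bigram model is the simplest instance of this idea: it estimates the next-word probability from counts of adjacent word pairs. The function names below are assumptions, and no smoothing is applied.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, prev):
    """Maximum-likelihood estimate of P(next | prev); empty if prev is unseen."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()} if total else {}

model = train_bigram("the cat sat on the mat".split())
print(next_word_probs(model, "the"))  # {'cat': 0.5, 'mat': 0.5}
```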
Q15. Which of the following is a sequence-to-sequence task?
A. Text summarization
B. Word tokenization
C. Bag-of-words representation
D. Feature scaling
Answer: A
Explanation: Sequence-to-sequence models generate output sequences from input sequences, e.g., summarization or translation.
Section 4: Transformers & LLMs (5 Questions)
Q16. What does self-attention in Transformers do?
A. Computes representation of a word considering all other words in the sequence
B. Removes irrelevant words
C. Converts words into vectors
D. Tokenizes sentences
Answer: A
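A minimal sketch of scaled dot-product self-attention in pure Python. As a simplifying assumption it uses the input vectors directly as queries, keys, and values; a real Transformer layer applies learned projection matrices first.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(X):
    """Each output row is a weighted average of all input rows (Q = K = V = X)."""
    d = len(X[0])
    out = []
    for q in X:
        # Similarity of this token to every token, scaled by sqrt(d)
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)])
    return out

X = [[1.0, 0.0], [0.0, 1.0]]
for row in self_attention(X):
    print(row)
```

Every output vector mixes information from all positions, which is what option A describes.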
Q17. In BERT, the pre-training tasks include:
A. Next sentence prediction & masked language modeling
B. Part-of-speech tagging only
C. Dependency parsing
D. Clustering
Answer: A
Explanation: BERT learns contextual embeddings using these two pre-training tasks.
Q18. Which LLM is autoregressive?
A. GPT
B. BERT
C. GloVe
D. Word2Vec
Answer: A
Explanation: GPT generates text token by token, predicting the next token based on previous ones.
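The token-by-token loop can be sketched as below. TABLE and toy_next_token are hypothetical stand-ins for a trained model; a real model would condition on the full history, whereas this toy lookup uses only the last token.

```python
def generate(prompt_tokens, next_token, max_new_tokens=5, stop="<eos>"):
    """Greedy autoregressive decoding: append one predicted token at a time."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)  # sees only the tokens produced so far
        if tok == stop:
            break
        tokens.append(tok)
    return tokens

# Hypothetical lookup standing in for a trained model
TABLE = {"the": "cat", "cat": "sat", "sat": "<eos>"}

def toy_next_token(tokens):
    return TABLE.get(tokens[-1], "<eos>")

print(generate(["the"], toy_next_token))  # ['the', 'cat', 'sat']
```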
Q19. Positional encoding is used in Transformers to:
A. Tokenize text
B. Encode word order information
C. Reduce vocabulary size
D. Identify named entities
Answer: B
Explanation: Transformers lack recurrence; positional encoding provides sequence order information.
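One common scheme is the fixed sinusoidal encoding, sketched here in pure Python. Learned position embeddings are a frequently used alternative; the function name and arguments are assumptions.

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal encoding: sin on even dimensions, cos on odd dimensions."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

pe = positional_encoding(4, 6)
print(pe[0])  # position 0: [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
print(pe[1])
```

Each position gets a distinct vector that is added to the token embedding, so the model can tell positions apart.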
Q20. Attention heads in a Transformer:
A. Perform parallel computations to capture different relations
B. Remove stopwords
C. Normalize embeddings
D. Predict sentiment
Answer: A
Section 5: Applications & Advanced Concepts (5 Questions)
Q21. Which of the following describes extractive summarization?
A. Selecting key sentences from the original text
B. Generating new text
C. Translating text
D. Tokenizing words
Answer: A
Explanation: Extractive summarization picks existing sentences; abstractive generates new text.
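A minimal extractive sketch, assuming a simple heuristic: score each sentence by the average corpus frequency of its words and keep the top k. This heuristic is an illustrative assumption; practical systems use stronger scoring.

```python
import re
from collections import Counter

def extractive_summary(text, k=1):
    """Return the k highest-scoring sentences, in their original order."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(sentences, key=score, reverse=True)[:k]
    return [s for s in sentences if s in top]

text = "Cats sleep a lot. Cats eat fish. Dogs bark."
print(extractive_summary(text, k=1))  # ['Cats eat fish.']
```

The output consists only of sentences copied from the input, which is the defining property of the extractive approach.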
Q22. BLEU score is used for:
A. Tokenization
B. Evaluation of machine translation
C. Feature extraction
D. Stopword removal
Answer: B
Explanation: BLEU measures n-gram overlap between machine-translated text and one or more reference translations.
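The core ingredient of BLEU is clipped (modified) n-gram precision, sketched below for a single reference. This is only one component: full BLEU combines precisions for several n-gram orders with a brevity penalty, which this sketch omits.

```python
from collections import Counter

def modified_precision(candidate, reference, n=1):
    """Clipped n-gram precision of a candidate against one reference."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram counts at most as often as it occurs in the reference
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total

reference = "the cat is on the mat".split()
print(modified_precision(["the"] * 7, reference))  # 2/7: "the" is clipped to 2
print(modified_precision(reference, reference))    # 1.0
```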
Q23. What is a pre-trained model in NLP?
A. A model trained from scratch for a task
B. A model already trained on large corpora, used for downstream tasks
C. Any LSTM model
D. Bag-of-words vectorizer
Answer: B
Explanation: Pre-trained models like BERT or GPT can be fine-tuned for specific tasks.
Q24. Which of the following is not an NLP evaluation metric?
A. ROUGE
B. BLEU
C. F1-score
D. PCA
Answer: D
Explanation: PCA is for dimensionality reduction, not NLP evaluation.
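Of the metrics listed, F1-score is the easiest to compute by hand: it is the harmonic mean of precision and recall. A small sketch, assuming the counts of true positives, false positives, and false negatives are already known:

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score(tp=8, fp=2, fn=2))
```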
Q25. Prompt engineering in LLMs refers to:
A. Designing effective inputs to get desired model outputs
B. Tokenization of text
C. Reducing model parameters
D. Pre-training embeddings
Answer: A
Explanation: Well-crafted prompts help LLMs generate accurate or contextually relevant outputs.
