1. What is the primary purpose of Part-of-Speech (POS) tagging in NLP?

A. To remove stop words
B. To assign a grammatical category (part of speech) to each word
C. To translate text into another language
D. To detect sentence boundaries

Answer: B
Explanation:

POS tagging assigns syntactic categories such as noun, verb, adjective, and adverb to each word so that the grammatical structure of the sentence can be understood by the NLP model.

What is POS tagging?

POS (Part‑of‑Speech) Tagging is the process of assigning each word in a sentence a grammatical category (noun, verb, adjective, etc.) based on its form and context. It’s the “glue” that turns raw text into a structured, linguistically‑annotated form that downstream NLP systems can consume.

Example:

The word "book" can function as either a noun ("I read a book") or a verb ("I will book a flight"), and POS tagging disambiguates these different uses by analyzing the surrounding linguistic context.

Why is POS tagging important?

  • Helps a parser decide which part of a sentence is a subject vs. object. (Example tasks: Named‑entity recognition, sentiment analysis, machine translation, speech‑to‑text, text‑to‑speech.)

  • Helps the system resolve lexical ambiguity (e.g., “record” as a noun vs. a verb). (Example tasks: Coreference resolution, information extraction, question answering.)

  • Enables feature engineering (e.g., “the current word is a determiner”). (Example tasks: POS‑based features for many classification tasks.)
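
The feature-engineering point can be illustrated with a minimal sketch. It assumes the tags were already produced by some upstream POS tagger; the function name `pos_features` and the feature names are hypothetical, chosen only for this example.

```python
def pos_features(tagged, i):
    """Build a feature dict for the token at position i.

    `tagged` is a list of (word, tag) pairs; the tags are assumed to come
    from an upstream POS tagger.
    """
    word, tag = tagged[i]
    prev_tag = tagged[i - 1][1] if i > 0 else "<START>"
    next_tag = tagged[i + 1][1] if i < len(tagged) - 1 else "<END>"
    return {
        "word": word.lower(),
        "tag": tag,
        "is_determiner": tag == "DT",  # "the current word is a determiner"
        "prev_tag": prev_tag,
        "next_tag": next_tag,
    }


# Hand-tagged example using Penn Treebank tags (not real tagger output).
sentence = [("I", "PRP"), ("read", "VBD"), ("a", "DT"), ("book", "NN")]
features = pos_features(sentence, 2)
print(features)
```

A downstream classifier could consume dictionaries like this one alongside other lexical features.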

2. Which NLP task category does POS tagging belong to?

A. Topic modeling
B. Sequence labeling
C. Clustering
D. Machine translation

Answer: B
Explanation:

POS tagging assigns a label to each token in a sentence in order, making it a sequence labeling task similar to Named Entity Recognition (NER) and chunking.
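
As a small illustration of the "one label per token" idea, the sketch below pairs the same tokens with POS tags and with NER labels. The labels are hand-assigned for the example, and the helper `align` is hypothetical.

```python
def align(tokens, labels):
    """Pair each token with its label.

    A sequence labeler emits exactly one label per token, so the two
    lists must have the same length.
    """
    if len(tokens) != len(labels):
        raise ValueError("sequence labeling needs exactly one label per token")
    return list(zip(tokens, labels))


tokens = ["Sarah", "visited", "India"]
pos_labels = ["NNP", "VBD", "NNP"]    # hand-assigned Penn Treebank tags
ner_labels = ["B-PER", "O", "B-LOC"]  # hand-assigned BIO entity labels

pos_pairs = align(tokens, pos_labels)
ner_pairs = align(tokens, ner_labels)
print(pos_pairs)
print(ner_pairs)
```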

3. Which statistical model was widely used in early POS tagging systems?

A. Hidden Markov Model (HMM)
B. Recurrent Neural Networks
C. Support Vector Machines
D. GAN Networks

Answer: A
Explanation:

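Hidden Markov Models (HMMs) were among the earliest successful approaches for POS tagging because they model two things directly: the probability of one tag following another (transitions) and the probability of a word given its tag (emissions). The most likely tag sequence can then be decoded efficiently with the Viterbi algorithm.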
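
To make the idea concrete, here is a minimal sketch of Viterbi decoding over a toy HMM. All probabilities below are invented for illustration (a real tagger would estimate them from an annotated corpus), and the `floor` fallback is a crude stand-in for proper smoothing.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-8):
    """Return the most probable tag sequence for `words` under a toy HMM.

    Probabilities missing from the tables fall back to `floor`.
    """
    if not words:
        return []

    def lp(table, key):
        return math.log(table.get(key, floor))

    # best[i][t] = (log-prob of the best path ending in tag t at word i, previous tag)
    best = [{t: (lp(start_p, t) + lp(emit_p, (t, words[0])), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            score, prev = max(
                (best[i - 1][p][0] + lp(trans_p, (p, t)), p) for p in tags
            )
            column[t] = (score + lp(emit_p, (t, words[i])), prev)
        best.append(column)

    # Backtrack from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


# Toy parameters, invented for this example only.
TAGS = ["PRP", "MD", "VB", "NN"]
START_P = {"PRP": 0.6, "NN": 0.3, "MD": 0.05, "VB": 0.05}
TRANS_P = {
    ("PRP", "MD"): 0.5, ("PRP", "VB"): 0.4, ("PRP", "NN"): 0.1,
    ("MD", "VB"): 0.8, ("MD", "NN"): 0.1,
    ("VB", "NN"): 0.6, ("NN", "VB"): 0.3, ("NN", "MD"): 0.3,
}
EMIT_P = {
    ("PRP", "they"): 0.3,
    ("MD", "can"): 0.4, ("NN", "can"): 0.01, ("VB", "can"): 0.005,
    ("VB", "fish"): 0.01, ("NN", "fish"): 0.02,
}

decoded = viterbi(["they", "can", "fish"], TAGS, START_P, TRANS_P, EMIT_P)
print(decoded)  # ['PRP', 'MD', 'VB'] with these toy numbers
```

With these numbers the strong MD to VB transition outweighs the slightly higher emission probability of "fish" as a noun, which is the kind of contextual trade-off an HMM tagger makes.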

4. Which tag in the Penn Treebank POS set represents a proper noun (singular)?

A. NNS
B. VBD
C. NNP
D. PRP

Answer: C
Explanation:

The tag NNP refers to a proper noun in singular form (e.g., India, Google, Sarah), whereas NNPS represents plural proper nouns.

5. Which sentence element causes difficulty for POS taggers because it can serve multiple grammatical roles?

A. Punctuation
B. Ambiguous words
C. Stopwords
D. Numbers

Answer: B
Explanation:

Words like "light", "book", or "play" may function as different parts of speech depending on context, creating ambiguity for tagging systems.

6. A rule-based POS tagging system mainly depends on:

A. Handcrafted linguistic rules
B. Word embeddings
C. Subword tokenization
D. Training on large annotated datasets

Answer: A
Explanation:

Rule-based POS taggers rely on human-defined grammar rules and lexicons, rather than training data or machine learning algorithms.
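
A minimal sketch of this approach, assuming a tiny hand-written lexicon, a few suffix rules, and one context rule. The lexicon, the rules, and the function name `rule_based_tag` are all illustrative; real rule-based taggers use far larger rule sets.

```python
# Hand-written lexicon: word -> default tag.
LEXICON = {
    "the": "DT", "a": "DT", "i": "PRP", "will": "MD",
    "book": "NN",  # default reading; the context rule below may override it
    "flight": "NN",
}

# Suffix rules for words missing from the lexicon, checked in order.
SUFFIX_RULES = [("ing", "VBG"), ("ly", "RB"), ("ed", "VBD"), ("s", "NNS")]


def rule_based_tag(tokens):
    """Tag tokens using the lexicon, then suffix rules, then one context rule."""
    tags = []
    for token in tokens:
        word = token.lower()
        if word in LEXICON:
            tag = LEXICON[word]
        else:
            # First matching suffix wins; unknown words default to noun.
            tag = next((t for suffix, t in SUFFIX_RULES if word.endswith(suffix)), "NN")
        # Context rule: a noun-tagged word directly after a modal
        # is retagged as a base-form verb.
        if tags and tags[-1] == "MD" and tag == "NN":
            tag = "VB"
        tags.append(tag)
    return list(zip(tokens, tags))


verb_reading = rule_based_tag("I will book a flight".split())
noun_reading = rule_based_tag("The book arrived quickly".split())
print(verb_reading)
print(noun_reading)
```

The same word "book" receives VB after the modal "will" and NN after the determiner "the", with no training data involved.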

7. Which evaluation metric is commonly used to measure POS tagging performance?

A. BLEU Score
B. RMSE
C. Accuracy
D. Perplexity

Answer: C
Explanation:

The performance of POS taggers is typically evaluated using token-level accuracy: the proportion of tokens whose predicted tag matches the gold-standard tag.
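
The computation itself is short. The sketch below assumes gold and predicted tags are given as two aligned lists; the function name `tagging_accuracy` and the example tags are hypothetical.

```python
def tagging_accuracy(gold, predicted):
    """Fraction of positions where the predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


gold = ["PRP", "MD", "VB"]
predicted = ["PRP", "NN", "VB"]  # one error: "can" tagged as a noun
accuracy = tagging_accuracy(gold, predicted)
print(f"{accuracy:.3f}")  # 2 of 3 tags match
```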

8. Modern POS taggers based on BERT or other deep learning models handle unknown words better because they use:

A. One-hot encoding
B. Subword embeddings
C. Rule matching
D. Stopword filtering

Answer: B
Explanation:

Models like BERT use subword tokenization (e.g., WordPiece), which breaks rare or previously unseen words into smaller pieces that are in the vocabulary, so the model can still build a useful representation for them.
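
The splitting step can be sketched as a greedy longest-match-first search, which is the general strategy WordPiece-style tokenizers apply at inference time. This is a simplified illustration: the vocabulary below is a made-up toy, whereas a real model ships a vocabulary of tens of thousands of pieces.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Split `word` into the longest vocabulary pieces, left to right.

    Pieces that do not start the word carry a "##" prefix. If some
    position cannot be matched, the whole word maps to `unk`.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]
        pieces.append(piece)
        start = end
    return pieces


# Toy vocabulary, invented for this example.
TOY_VOCAB = {"un", "token", "##token", "##iz", "##able", "##s"}

pieces = wordpiece_tokenize("untokenizable", TOY_VOCAB)
print(pieces)  # ['un', '##token', '##iz', '##able']
```

Even though "untokenizable" is not in the vocabulary as a whole word, every piece is, so a tagger built on such a tokenizer still receives meaningful input.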

9. In the sentence "They can fish", the word "can" is tagged as:

A. NN
B. MD (modal verb)
C. VBD
D. IN

Answer: B
Explanation:

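In the intended reading of this sentence, "can" is a modal verb (MD) expressing ability, followed by the base-form verb "fish". It is not the noun "can" meaning "container". The sentence is a classic ambiguity example: in the less likely reading "they put fish into cans", "can" would be a main verb rather than a modal.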

10. Which of the following corpora is commonly used for training POS taggers?

A. ImageNet
B. Penn Treebank
C. CIFAR-10
D. COCO dataset

Answer: B
Explanation:

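The Penn Treebank contains sentences annotated with POS tags and syntactic parse trees, most notably in its Wall Street Journal portion, and is widely used in POS tagging research and model training. The other options are image datasets.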