🔍 Introduction to POS Tagging MCQs
Part-of-Speech (POS) tagging is a core component of Natural Language Processing (NLP) where each word in a sentence is labeled with its grammatical function such as noun, verb, adjective, or adverb. It plays a vital role in applications like parsing, named entity recognition (NER), sentiment analysis, text-to-speech, and machine translation.
The following multiple-choice questions (MCQs) will help you practice rule-based, probabilistic, and deep learning-based tagging approaches, including Brill Tagging, Hidden Markov Models, CRF models, and modern transformer-based POS tagging.
📌 Before beginning, you may also explore: What Are Morphemes in NLP?
✔ Scroll down and test yourself; the answer and explanation follow each question.
Q1. What is the primary purpose of POS tagging?
A. To remove stop words
B. To assign grammatical roles to each word
C. To translate text into another language
D. To detect sentence boundaries
Answer: B
Explanation:
POS tagging assigns syntactic categories such as noun, verb, adjective, and adverb to each word so that the grammatical structure of the sentence can be understood by the NLP model.
What is POS tagging?
POS (Part‑of‑Speech) Tagging is the process of assigning each word in a sentence a grammatical category (noun, verb, adjective, etc.) based on its form and context. It’s the “glue” that turns raw text into a structured, linguistically‑annotated form that downstream NLP systems can consume.
Example: The word "book" can function as either a noun ("I read a book") or a verb ("I will book a flight"); POS tagging disambiguates these uses by analyzing the surrounding linguistic context.
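A minimal sketch of this disambiguation using NLTK's off-the-shelf tagger (assuming NLTK is installed; the exact download resource names vary slightly across NLTK versions):

```python
import nltk

# Tokenizer and tagger models (names may differ in newer NLTK
# releases, e.g. "punkt_tab" / "averaged_perceptron_tagger_eng").
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

for sentence in ["I read a book", "I will book a flight"]:
    tokens = nltk.word_tokenize(sentence)
    print(nltk.pos_tag(tokens))

# "book" should come out as NN (noun) in the first sentence
# and VB (verb) in the second.
```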
Why is POS tagging important?
Helps a parser decide which part of a sentence is a subject vs. object. (Example tasks: Named‑entity recognition, sentiment analysis, machine translation, speech‑to‑text, text‑to‑speech.)
Allows the system to understand word ambiguities (e.g., “record” as a noun vs. verb). (Example tasks: Coreference resolution, information extraction, question answering.)
Enables feature engineering (e.g., “the current word is a determiner”). (Example tasks: POS‑based features for many classification tasks.)
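To make the third point concrete, here is a sketch of POS-based feature extraction for a downstream token classifier (the feature names and helper are illustrative, not from any particular library):

```python
def token_features(tokens, tags, i):
    """Features for the i-th token, built from its POS tag and neighbors."""
    return {
        "word": tokens[i].lower(),
        "tag": tags[i],
        "prev_tag": tags[i - 1] if i > 0 else "<S>",  # sentence start marker
        "is_determiner": tags[i] == "DT",
    }

tokens = ["The", "cat", "sat"]
tags = ["DT", "NN", "VBD"]
print(token_features(tokens, tags, 1))
# {'word': 'cat', 'tag': 'NN', 'prev_tag': 'DT', 'is_determiner': False}
```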
Q2. POS tagging is best described as which type of NLP task?
A. Topic modeling
B. Sequence labeling
C. Clustering
D. Machine translation
Answer: B
Explanation:
POS tagging assigns a label to each token in a sentence in order, making it a sequence labeling task similar to Named Entity Recognition (NER) and chunking.
What is Sequence Labeling?
Sequence labeling is an NLP task in which each word in a sentence is assigned a label describing its role, meaning, or category, with each prediction influenced by the surrounding words. It is called sequence labeling because we label not just individual words but a whole sequence, where:
- The order matters
- The label of one token may depend on the previous and next tokens
Sequence labeling underlies many NLP tasks; some of them, with examples, are listed below (a combined sketch follows the list):
- Part-of-Speech (POS) Tagging: Each word is tagged with its part of speech, such as Noun, Verb, or Adjective.
- Named Entity Recognition (NER): Named entities in a sentence are tagged with categories like PERSON, LOCATION, or ORGANIZATION.
- Chunking / Shallow Parsing: Words are grouped into phrases and tagged with chunk labels like NP (noun phrase) and VP (verb phrase).
- Token-Level Sentiment: Individual words are tagged as Positive, Negative, or Neutral.
- Speech Recognition: Frames of the audio signal are labeled with phonemes.
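The following sketch shows the aligned, one-label-per-token structure shared by two of these tasks (the tags are hand-written illustrations, not model output):

```python
tokens   = ["Sarah", "works", "at", "Google", "in", "London"]
pos_tags = ["NNP", "VBZ", "IN", "NNP", "IN", "NNP"]     # POS tagging
ner_tags = ["B-PER", "O", "O", "B-ORG", "O", "B-LOC"]   # NER (BIO scheme)

# Each position in the sequence carries one label per task.
for token, pos, ner in zip(tokens, pos_tags, ner_tags):
    print(f"{token:8} {pos:5} {ner}")
```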
Q3. Which of the following models was among the earliest widely used statistical approaches for POS tagging?
A. Hidden Markov Model (HMM)
B. Recurrent Neural Networks
C. Support Vector Machines
D. Generative Adversarial Networks (GANs)
Answer: A
Explanation:
Hidden Markov Models (HMMs) were among the earliest successful approaches for POS tagging because they model sequential probabilities and tag transitions effectively.
Hidden Markov Model (HMM)
A Hidden Markov Model is a statistical model for systems assumed to follow a Markov process with hidden (unobserved) states. In POS tagging, the tags are the hidden states and the words are the observations: the model combines the probability of each tag given the previous tag (transition probabilities) with the probability of each word given its tag (emission probabilities), and finds the most likely tag sequence, typically with the Viterbi algorithm.
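A sketch of supervised HMM tagging with NLTK's built-in trainer, using the Penn Treebank sample that ships with NLTK (note: `accuracy()` replaced the deprecated `evaluate()` in recent NLTK releases, and a plain MLE-trained HMM handles unseen words poorly):

```python
import nltk
from nltk.corpus import treebank
from nltk.tag import hmm

nltk.download("treebank", quiet=True)

sents = treebank.tagged_sents()          # ~3,900 tagged sentences
train, test = sents[:3000], sents[3000:]

# Estimate transition and emission probabilities from tagged data.
tagger = hmm.HiddenMarkovModelTrainer().train_supervised(train)

print(tagger.tag(["The", "cat", "sat", "on", "the", "mat"]))
print("accuracy:", tagger.accuracy(test))
```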
Q4. In the Penn Treebank tagset, which tag marks a singular proper noun?
A. NNS
B. VBD
C. NNP
D. PRP
Answer: C
Explanation:
The tag NNP refers to a proper noun in singular form (e.g., India, Google, Sarah), whereas NNPS represents plural proper nouns.
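NLTK bundles documentation for the Penn Treebank tagset, which is handy for checking tags like these (the `tagsets` resource name may vary by NLTK version):

```python
import nltk

nltk.download("tagsets", quiet=True)
nltk.help.upenn_tagset("NNP")   # noun, proper, singular
nltk.help.upenn_tagset("NNPS")  # noun, proper, plural
```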
Penn Treebank dataset
The Penn Treebank is a manually annotated corpus of American English; its widely used Wall Street Journal (WSJ) section contains about one million words, each carrying a POS tag and a constituency parse tree. It is the standard benchmark for supervised POS tagging and constituency parsing: small enough to run experiments on a laptop, yet rich enough to challenge modern neural models.
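A quick look at the 10% WSJ sample distributed with NLTK (the full corpus is licensed separately through the LDC):

```python
import nltk
from nltk.corpus import treebank

nltk.download("treebank", quiet=True)

print(len(treebank.tagged_sents()))    # ~3,900 sentences in the sample
print(treebank.tagged_sents()[0][:5])  # first tokens with their POS tags
print(treebank.parsed_sents()[0])      # the constituency parse tree
```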
Q5. Which of the following poses a major challenge for POS tagging systems?
A. Punctuation
B. Ambiguous words
C. Stopwords
D. Numbers
Answer: B
Explanation:
Words like "light", "book", or "play" may function as different parts of speech depending on context, creating ambiguity for tagging systems.
Q6. What do rule-based POS taggers primarily rely on?
A. Handcrafted linguistic rules
B. Word embeddings
C. Subword tokenization
D. Training on large annotated datasets
Answer: A
Explanation:
Rule-based POS taggers rely on human-defined grammar rules and lexicons, rather than training data or machine learning algorithms.
What is rule-based POS tagging?
Rule-based POS tagging assigns part-of-speech labels (noun, verb, adjective, etc.) to words using a set of manually created linguistic rules rather than machine learning. It usually involves two steps (a toy implementation follows the list):
- Lookup in a lexicon (dictionary): identify all possible tags for a given word. Example: "book" can be NOUN or VERB.
- Apply disambiguation rules: a set of handwritten rules chooses the correct tag. Example: "If a word follows a determiner (the, a, an), tag it as NOUN."
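A toy implementation of these two steps (the lexicon and rules are invented for illustration; real systems use far larger lexicons and rule sets):

```python
# Step 1 resource: each word maps to its set of possible tags.
LEXICON = {
    "the": {"DT"}, "a": {"DT"}, "an": {"DT"},
    "i": {"PRP"}, "will": {"MD"},
    "book": {"NN", "VB"}, "flight": {"NN"}, "read": {"VBD"},
}

def rule_based_tag(tokens):
    tags = []
    for tok in tokens:
        candidates = LEXICON.get(tok.lower(), {"NN"})  # unknowns default to NN
        if len(candidates) == 1:
            tags.append(next(iter(candidates)))
        # Step 2: handwritten disambiguation rules.
        elif tags and tags[-1] == "DT":
            tags.append("NN")   # after a determiner, choose NOUN
        elif tags and tags[-1] == "MD":
            tags.append("VB")   # after a modal, choose base VERB
        else:
            tags.append(sorted(candidates)[0])
    return list(zip(tokens, tags))

print(rule_based_tag("I will book a flight".split()))  # book -> VB
print(rule_based_tag("I read a book".split()))         # book -> NN
```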
Q7. Which metric is most commonly used to evaluate POS taggers?
A. BLEU Score
B. RMSE
C. Accuracy
D. Perplexity
Answer: C
Explanation:
The performance of POS taggers is typically evaluated using accuracy, which measures the proportion of correctly predicted tags.
How is accuracy used to measure POS tagging performance?
Accuracy is a fundamental evaluation metric in machine learning that measures the overall correctness of a model's predictions by calculating the proportion of correct predictions out of all predictions made.
In Part-of-Speech (POS) tagging, it measures how many words were tagged correctly compared to the total number of words in the dataset.
Accuracy in POS tagging = (Number of correctly tagged words / Total number of words) × 100
It tells us how often the POS tagger assigns the correct tag; it is computed by comparing system output against a gold-standard reference. Example: take the sentence 'The cat sat on the mat'.
- The correct (gold standard) POS tags are: The/DT cat/NN sat/VBD on/IN the/DT mat/NN.
- Suppose your POS tagger outputs: The/DT cat/NN sat/VBD on/IN the/DT mat/VB.
- Five of the six tags match the gold standard (mat is tagged VB instead of NN), so accuracy = 5/6 × 100 ≈ 83.3%. A short computation follows.
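Scoring that example in code:

```python
# Gold vs. predicted tags for "The cat sat on the mat".
gold = ["DT", "NN", "VBD", "IN", "DT", "NN"]
pred = ["DT", "NN", "VBD", "IN", "DT", "VB"]

correct = sum(g == p for g, p in zip(gold, pred))
print(f"Accuracy = {correct}/{len(gold)} = {correct / len(gold):.1%}")
# Accuracy = 5/6 = 83.3%
```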
Q8. How do transformer-based models such as BERT handle rare or previously unseen words during POS tagging?
A. One-hot encoding
B. Subword embeddings
C. Rule matching
D. Stopword filtering
Answer: B
Explanation:
Models like BERT use subword tokenization (e.g., WordPiece), which helps correctly process previously unseen or rare words.
What are subword embeddings?
Subword embedding is a method for representing words as vectors by breaking them down into smaller units like character n-grams, prefixes, and suffixes, rather than using a single vector for the entire word. This approach allows models to handle rare or new words by combining the embeddings of their subword components, leading to better performance and generalization, especially in languages with rich morphology.
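A fastText-style sketch of the character n-gram idea (the boundary markers `<` and `>` are part of the standard trick; this is a simplified illustration, not the fastText library itself):

```python
def char_ngrams(word, n_min=3, n_max=4):
    """Return all character n-grams of the boundary-padded word."""
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

# Even an unseen word decomposes into subword units the model
# may already have embeddings for.
print(char_ngrams("tagging"))
# ['<ta', 'tag', 'agg', 'ggi', 'gin', 'ing', 'ng>', '<tag', 'tagg', ...]
```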
Q9. Which POS tag should be assigned to the word "can" when it expresses possibility (rather than the noun meaning "container")?
A. NN
B. MD (modal verb)
C. VBD
D. IN
Answer: B
Explanation:
When it expresses possibility, "can" is a modal verb (MD), not a noun meaning "container".
Q10. Which dataset is most widely used for training and evaluating POS taggers?
A. ImageNet
B. COCO dataset
C. CIFAR-10
D. Penn Treebank
Answer: D
Explanation:
The Penn Treebank contains syntactically annotated sentences widely used in POS tagging research and model training.