Advanced POS Tagging with Hidden Markov Models — MCQ Introduction
Master POS tagging with this focused MCQ set on Hidden Markov Models (HMM): second-order & trigram models, Viterbi decoding, Baum-Welch training, smoothing techniques, and practical tips for supervised & unsupervised learning.
Part-of-speech (POS) tagging is a cornerstone task in Natural Language Processing (NLP) that assigns grammatical categories (noun, verb, adjective, etc.) to each token in a sentence. This question set concentrates on classical statistical taggers built with Hidden Markov Models (HMMs), which model tag sequences as hidden states and observed words as emissions.
HMM-based POS taggers remain valuable for their interpretability and efficiency. They are particularly useful when you need:
- Lightweight, fast taggers for resource-constrained systems
- Explainable probabilistic models for linguistic analysis and teaching
- Strong baselines before moving to neural models like BiLSTM or Transformer taggers
This MCQ collection targets advanced HMM concepts — including second-order (trigram) models, the Viterbi decoding algorithm, Forward–Backward / Baum–Welch for unsupervised learning, and various smoothing strategies to handle rare or unseen words. Each question includes a concise explanation to help you understand not only the correct choice but why it matters for real-world POS tagging.
What you’ll learn
- How higher-order HMMs (trigram / second-order) capture broader tag context.
- Why supervised training requires labeled word–tag corpora and how unsupervised EM works.
- The purpose of smoothing (Laplace, Good-Turing) to avoid zero probabilities.
- Trade-offs: model complexity, overfitting, and inference cost when increasing hidden states.
Use these MCQs to prepare for exams, interviews, or to evaluate your grounding before progressing to neural POS taggers. Scroll down to start the questions and test your understanding of HMM-based POS tagging fundamentals and advanced techniques.
Q1. A second-order (trigram) HMM POS tagger conditions the current tag on:
A. One previous tag
B. Two previous tags
C. No previous tag
D. All future tags
Answer: B. Two previous tags
Explanation:
Higher-order HMM models allow P(tᵢ | tᵢ₋₁, tᵢ₋₂), improving context modeling.
A second-order POS HMM (Hidden Markov Model) is also called a trigram HMM.
It means: P(tᵢ | tᵢ₋₁, tᵢ₋₂). This tells us that the probability of the current tag depends on the two previous tags, so the HMM "looks back" two steps in the tag sequence.
Example: Take the simple sentence "dogs chase cats". For the last word "cats", whose tag is t₃, given the previous two tags t₁ = NOUN (for "dogs") and t₂ = VERB (for "chase"), you would write P(t₃ = NOUN | t₂ = VERB, t₁ = NOUN).
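To make the trigram transition probability concrete, here is a minimal sketch (not from the original post) that estimates P(tᵢ | tᵢ₋₁, tᵢ₋₂) from tag counts; the toy corpus and function names are illustrative assumptions.

```python
from collections import defaultdict

# Toy tagged corpus: each sentence is a list of (word, tag) pairs (illustrative data).
corpus = [
    [("dogs", "NOUN"), ("chase", "VERB"), ("cats", "NOUN")],
    [("cats", "NOUN"), ("sleep", "VERB")],
]

trigram_counts = defaultdict(int)   # counts of (t_{i-2}, t_{i-1}, t_i)
bigram_counts = defaultdict(int)    # counts of (t_{i-2}, t_{i-1})

for sentence in corpus:
    # Pad with start symbols so the first real tag also has two predecessors.
    tags = ["<s>", "<s>"] + [tag for _, tag in sentence]
    for i in range(2, len(tags)):
        trigram_counts[(tags[i-2], tags[i-1], tags[i])] += 1
        bigram_counts[(tags[i-2], tags[i-1])] += 1

def trigram_prob(t_prev2, t_prev1, t_curr):
    """Maximum-likelihood estimate of P(t_curr | t_prev1, t_prev2)."""
    denom = bigram_counts[(t_prev2, t_prev1)]
    return trigram_counts[(t_prev2, t_prev1, t_curr)] / denom if denom else 0.0

print(trigram_prob("NOUN", "VERB", "NOUN"))  # P(NOUN | VERB, NOUN) from the toy counts
```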
Q2. What kind of training data does a supervised HMM POS tagger require?
A. Raw sentences only
B. Word-tag annotated sentences
C. Part-of-speech dictionary
D. Dependency trees
Answer: B. Word-tag annotated sentences
Explanation:
Supervised models need labeled corpora to learn emission + transition probabilities.
Supervised POS HMM — why labeled data is required
A supervised POS HMM needs already-labeled data so it can learn two kinds of probabilities:
- Transition probabilities: these require tag sequences.
- Emission probabilities: these require each word paired with its correct tag.
Therefore, supervised training must have sentences where every word already has a POS tag, for example:
Dogs/NOUN chase/VERB cats/NOUN
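As an illustration of what is learned from such data, here is a minimal sketch (not from the original post) that derives maximum-likelihood transition and emission estimates from a toy tagged corpus; the data and variable names are illustrative assumptions.

```python
from collections import Counter

# Toy labeled corpus in word/TAG form, as in the example above (illustrative data).
tagged_sentences = [[("Dogs", "NOUN"), ("chase", "VERB"), ("cats", "NOUN")]]

transition_counts = Counter()  # (previous_tag, current_tag)
emission_counts = Counter()    # (tag, word)
tag_counts = Counter()

for sentence in tagged_sentences:
    prev_tag = "<s>"
    for word, tag in sentence:
        transition_counts[(prev_tag, tag)] += 1
        emission_counts[(tag, word.lower())] += 1
        tag_counts[tag] += 1
        prev_tag = tag

# Maximum-likelihood estimates (no smoothing yet).
p_transition = {  # P(current_tag | previous_tag)
    (prev, curr): c / sum(v for (p, _), v in transition_counts.items() if p == prev)
    for (prev, curr), c in transition_counts.items()
}
p_emission = {  # P(word | tag)
    (tag, word): c / tag_counts[tag]
    for (tag, word), c in emission_counts.items()
}
print(p_emission[("NOUN", "cats")])  # 0.5: NOUN emits "cats" once out of two NOUN tokens
```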
Q3. In an HMM POS tagger, what does "decoding" refer to?
A. Parameter estimation
B. Tokenizing words
C. Selecting most likely tag sequence
D. Expanding vocabulary
Answer: C. Selecting most likely tag sequence
Explanation:
Decoding maps observed words to best hidden tag sequence using Viterbi.
In a POS HMM (Hidden Markov Model), decoding means: Finding the most probable sequence of POS tags for a given sequence of words. This is usually done using the Viterbi algorithm.
So, decoding = tagging = choosing the best tag path.
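Since the Viterbi algorithm is central here, the following is a minimal sketch (not from the original post) of Viterbi decoding in log space; the function name and probability-table format are illustrative assumptions.

```python
import math

def viterbi(words, tags, p_trans, p_emit, p_start):
    """Minimal Viterbi decoder: returns the most probable tag sequence for `words`.

    p_start[t], p_trans[(t_prev, t)], p_emit[(t, w)] are probability lookups;
    missing entries are treated as a tiny floor value to avoid log(0).
    """
    floor = 1e-12
    def logp(table, key):
        return math.log(table.get(key, floor))

    # best[i][t] = best log-probability of any tag path ending in tag t at position i
    best = [{t: logp(p_start, t) + logp(p_emit, (t, words[0])) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev_scores = {tp: best[i-1][tp] + logp(p_trans, (tp, t)) for tp in tags}
            tp_best = max(prev_scores, key=prev_scores.get)
            best[i][t] = prev_scores[tp_best] + logp(p_emit, (t, words[i]))
            back[i][t] = tp_best

    # Follow back-pointers from the best final tag.
    last = max(best[-1], key=best[-1].get)
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))
```

Working in log space with a small floor for missing entries avoids numerical underflow and sidesteps the zero-probability issue discussed in the smoothing questions below.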
Q4. The Forward–Backward algorithm is primarily associated with which task?
A. Viterbi decoding
B. Unsupervised HMM learning
C. POS dictionary building
D. Tokenization
Answer: B. Unsupervised HMM learning
Explanation:
It computes the posterior probabilities (expected counts) used in Baum–Welch EM training.
More information: The Forward–Backward algorithm is the core of the Baum–Welch algorithm, which is used for unsupervised training of Hidden Markov Models (HMMs).
In unsupervised learning, the data has no POS tags, so the model must estimate the transition probabilities and emission probabilities on its own.
The Forward–Backward algorithm computes:
- Forward probabilities α
- Backward probabilities β
- Expected counts for transitions and emissions
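As a rough illustration of these quantities, here is a minimal NumPy sketch (not from the original post) of one Forward–Backward pass; the matrix layout (pi, A, B) is an assumed convention, and a real Baum–Welch implementation would add scaling to avoid underflow.

```python
import numpy as np

def forward_backward(obs, pi, A, B):
    """Minimal Forward–Backward pass for a discrete HMM.

    obs: sequence of observation indices; pi: initial tag distribution (N,);
    A: transition matrix (N, N) with A[i, j] = P(tag j | tag i);
    B: emission matrix (N, V) with B[i, o] = P(word o | tag i).
    Returns alpha, beta, and gamma (posterior tag probabilities per position),
    which supply the expected counts that Baum–Welch re-estimation needs.
    """
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))

    # Forward pass: alpha[t, j] = P(obs[0..t], state_t = j)
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

    # Backward pass: beta[t, i] = P(obs[t+1..T-1] | state_t = i)
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)  # posterior P(state_t = i | obs)
    return alpha, beta, gamma
```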
Q5. Smoothing in an HMM POS tagger is primarily used to prevent which problem?
A. Overtraining
B. Hidden state ambiguity
C. Viterbi path errors
D. Zero-probability transitions
Answer: D. Zero-probability transitions
Explanation:
Probabilities of unseen tag-word or tag-tag pairs must not be zero → smoothing distributes probability mass to them.
In an HMM, probabilities are estimated from counts in the training data. If a transition or a word–tag pair never appears in training data, its probability becomes zero. This is dangerous because:
- A zero probability wipes out entire Viterbi paths
- The model cannot handle unseen words or unseen tag transitions
Smoothing (like Laplace, Good–Turing, Witten–Bell) adds a small nonzero probability to unseen events.
So smoothing prevents: Zero-probability transitions and zero-probability emissions.
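For instance, a minimal add-alpha (Laplace) smoothing sketch for transition probabilities might look like the following; the function, counts, and tagset are illustrative assumptions rather than a complete implementation.

```python
def laplace_transition_prob(counts, prev_tag, curr_tag, tagset, alpha=1.0):
    """Add-alpha (Laplace) smoothed estimate of P(curr_tag | prev_tag).

    counts[(prev, curr)] holds raw bigram tag counts; unseen transitions get a
    small nonzero probability instead of zero, so no Viterbi path is wiped out.
    """
    numer = counts.get((prev_tag, curr_tag), 0) + alpha
    denom = sum(counts.get((prev_tag, t), 0) for t in tagset) + alpha * len(tagset)
    return numer / denom

# Toy counts (illustrative): the transition DET -> VERB never occurs in training.
counts = {("DET", "NOUN"): 9, ("DET", "ADJ"): 1}
tagset = ["DET", "NOUN", "VERB", "ADJ"]
print(laplace_transition_prob(counts, "DET", "VERB", tagset))  # small but nonzero (1/14)
```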
Q6. What is a fundamental limitation of HMM-based POS taggers?
A. Cannot tag new words
B. Assumes Markov + word independence
C. Requires deep networks
D. Needs semantic embeddings
Answer: B. Assumes Markov + word independence
Explanation:
HMM relies only on previous tag and assumes words depend only on tag, limiting context.
Markov + word independence - a limitation of HMMs
A standard POS HMM makes two strong assumptions.
- Markov Assumption (for tags): The current tag depends only on a small number of previous tags.
- This ignores long-range syntactic dependencies (e.g., subject–verb agreement across clauses).
- Output Independence Assumption (for words): Words depend only on their own tag, not surrounding words.
- This ignores context that modern taggers use (e.g., CRFs, BiLSTMs, Transformers).
These assumptions simplify the model, but they also severely limit accuracy compared to modern NLP models.
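To see how these two assumptions show up in the model, here is a minimal sketch (an illustration, not from the original post) of the joint probability a bigram HMM assigns to a tagged sentence; the function name and probability-table format are assumptions.

```python
import math

def hmm_log_joint(words, tags, p_start, p_trans, p_emit):
    """log P(words, tags) under a bigram HMM:
    P(w, t) = P(t1) * P(w1|t1) * prod_i P(t_i | t_{i-1}) * P(w_i | t_i).
    Both independence assumptions are visible: there are no word-word terms,
    and the transition term only looks one tag back."""
    logp = math.log(p_start[tags[0]]) + math.log(p_emit[(tags[0], words[0])])
    for i in range(1, len(words)):
        logp += math.log(p_trans[(tags[i-1], tags[i])])   # Markov assumption
        logp += math.log(p_emit[(tags[i], words[i])])     # output independence
    return logp
```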
Q7. What is the effect of increasing the number of hidden states in a POS HMM?
A. May cause overfitting
B. Guarantees higher accuracy
C. Does nothing to model quality
D. Reduces computation
Answer: A. May cause overfitting
Explanation:
More states = more parameters → risk of overfitting & slower inference.
Increasing the number of hidden states in a POS HMM may cause overfitting
In a Hidden Markov Model (HMM) used for Part-of-Speech (POS) tagging, the "hidden states" correspond to the POS tags (like Noun, Verb, Adjective). Increasing the number of hidden states means using a more granular tagset (e.g., splitting "Noun" into "Singular Noun" and "Plural Noun") or simply increasing the model's capacity in an unsupervised setting.
Effect of increasing hidden states - Discussion
When you increase the number of states N:
- You must estimate many more parameters.
- But your dataset size stays the same.
So the model tries to estimate:
- Many more transition probabilities (N²),
- Many more emission probabilities (N × V).
With limited data, the HMM begins to:
- Fit the quirks/noise of the training data,
- Memorize rare patterns,
- Over-specialize to word sequences it has seen,
- Lose its ability to generalize to unseen text.
This phenomenon is overfitting.
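To put rough numbers on this, here is a tiny back-of-the-envelope sketch; the tagset and vocabulary sizes are illustrative assumptions, not figures from the original post.

```python
# Rough parameter count for a bigram HMM tagger (illustrative numbers).
N = 45        # e.g., a Penn Treebank-sized tagset (assumed for illustration)
V = 50_000    # vocabulary size (assumed for illustration)

transition_params = N * N      # P(t_i | t_{i-1}) for every tag pair
emission_params = N * V        # P(w | t) for every tag-word pair

print(transition_params)       # 2,025
print(emission_params)         # 2,250,000

# Doubling the tagset roughly quadruples the transition table and doubles the
# emission table, while the training corpus stays the same size -> overfitting risk.
print((2 * N) ** 2, 2 * N * V)  # 8,100 and 4,500,000
```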
Q8. How should rare or unseen words be handled in an HMM POS tagger?
A. Laplace / Good-Turing smoothing
B. Discarding them
C. Forcing one tag
D. Ignoring in training
Answer: A. Laplace / Good-Turing smoothing
Explanation:
Smoothing reallocates probability mass → better tagging for unseen/low-freq words.
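As a complement to the Laplace example above, here is a minimal sketch (illustrative, not from the original post) of the simple Good-Turing idea: the total probability mass reserved for unseen words under a tag is estimated as N₁/N, the fraction of tokens whose word type was seen exactly once.

```python
from collections import Counter

def unseen_mass_good_turing(word_counts):
    """Simple Good-Turing estimate of the total probability mass to reserve
    for unseen events: N1 / N, where N1 = number of types seen exactly once
    and N = total number of observed tokens."""
    n_total = sum(word_counts.values())
    n_once = sum(1 for c in word_counts.values() if c == 1)
    return n_once / n_total if n_total else 0.0

# Toy emission counts for a single tag (illustrative data).
noun_counts = Counter({"cats": 3, "dogs": 2, "ideas": 1, "quarks": 1})
print(unseen_mass_good_turing(noun_counts))  # 2/7 of the mass goes to unseen nouns
```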
Q9. What does a trigram HMM condition the current tag on?
A. No transition
B. Two historical tags
C. One future tag
D. Word similarity
Answer: B. Two historical tags
Explanation:
Trigram uses P(tᵢ | tᵢ₋₁, tᵢ₋₂) → better captures context patterns.
Q10. What helps an unsupervised HMM POS tagger achieve better accuracy without labeled data?
A. Random initialization
B. Morphological features + smoothing
C. Deleting rare words
D. Using only transitions
Answer: B. Morphological features + smoothing
Explanation:
Morphology aids tagging without labels → suffix, prefix, capitalization rules.
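To illustrate, here is a minimal sketch (not from the original post) of the kind of morphological cues the explanation mentions; the suffix rules and tag names are illustrative assumptions, not a complete system.

```python
def morphological_tag_guess(word):
    """Guess a coarse POS tag for an unknown word from surface cues
    (capitalization, digits, suffixes). Rules and tags are illustrative only."""
    if word[0].isupper():
        return "PROPN"          # capitalized mid-sentence -> likely proper noun
    if any(ch.isdigit() for ch in word):
        return "NUM"
    if word.endswith(("ing", "ed")):
        return "VERB"
    if word.endswith("ly"):
        return "ADV"
    if word.endswith(("tion", "ness", "ment", "s")):
        return "NOUN"
    return "NOUN"               # default fallback for open-class words

for w in ["Googleplex", "running", "quickly", "automation", "42nd"]:
    print(w, morphological_tag_guess(w))
```

In practice such cues are usually folded into the emission model (e.g., as a back-off distribution for unknown words) rather than applied as hard rules.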