Key Concepts Illustrated in the Figure

  1. Visible states (Observations)
    Visible states are the observed outputs of an HMM, such as the words in a sentence. In the figure above, 'cat', 'purrs', etc. are observations.
  2. Hidden states
    Hidden states are the unobserved underlying states (e.g., POS tags - 'DT', 'N', etc in the figure) that generate the visible observations.
  3. Transition probabilities
    Transition probabilities define the likelihood of moving from one hidden state to another. In the figure, this is represented by the arrows from one POS tag to another. Example: P(N -> V) or P(V | N).
  4. Emission probabilities
    Emission probabilities define the likelihood of a visible observation being generated by a hidden state. In the figure, this is represented by the arrows from POS tags to words. Example: P(cat | N).
  5. POS tagging using HMM
    POS tagging using HMM models tags as hidden states and words as observations to find the most probable tag sequence.
  6. Evaluation problem
    The evaluation problem computes the probability of an observation sequence given an HMM.
  7. Forward algorithm
    The forward algorithm efficiently solves the evaluation problem using dynamic programming (see the code sketch after this list).
  8. Decoding problem
    The decoding problem finds the most probable hidden state sequence for a given observation sequence.
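
The small Python sketch below ties these pieces together for a toy version of the figure's HMM. All tag names, words and probability values are made-up illustrative numbers (they are not taken from the figure), and the forward function solves the evaluation problem for this toy model.

import numpy as np

# Toy HMM for the fragment "the cat purrs" (illustrative numbers only).
tags = ["DT", "N", "V"]                  # hidden states
words = ["the", "cat", "purrs"]          # observation vocabulary

pi = np.array([0.8, 0.1, 0.1])           # P(first tag)
A = np.array([                           # transition probabilities P(tag_j | tag_i)
    [0.1, 0.8, 0.1],                     # from DT
    [0.1, 0.2, 0.7],                     # from N
    [0.3, 0.4, 0.3],                     # from V
])
B = np.array([                           # emission probabilities P(word | tag)
    [0.9, 0.05, 0.05],                   # DT emits: the, cat, purrs
    [0.1, 0.8, 0.1],                     # N  emits: the, cat, purrs
    [0.1, 0.1, 0.8],                     # V  emits: the, cat, purrs
])

def forward(obs, pi, A, B):
    """Evaluation problem: P(observation sequence | HMM) via the forward algorithm."""
    alpha = pi * B[:, obs[0]]            # initialise with the first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # sum over previous states, then emit
    return alpha.sum()

obs = [words.index(w) for w in ["the", "cat", "purrs"]]
print(forward(obs, pi, A, B))            # probability of observing "the cat purrs"
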
1. In POS tagging using HMM, the hidden states represent:






Correct Answer: B

In HMM-based POS tagging, tags are hidden states and words are observed symbols.

2. The most suitable algorithm for decoding the best POS sequence in HMM tagging is:






Correct Answer: D

Viterbi decoding finds the most probable hidden tag sequence.
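
A minimal Viterbi sketch is shown below. It assumes the usual HMM parameterization (initial, transition and emission probabilities as NumPy arrays, as in the toy model after the concept list above); the function and variable names are illustrative, not a standard library API.

import numpy as np

def viterbi(obs, pi, A, B):
    """Decoding problem: most probable hidden tag sequence for the observations.

    obs: list of observation (word) indices
    pi:  initial tag probabilities, shape (K,)
    A:   transition probabilities P(tag_j | tag_i), shape (K, K)
    B:   emission probabilities P(word | tag), shape (K, V)
    """
    T, K = len(obs), len(pi)
    delta = np.zeros((T, K))             # best path probability ending in each tag
    backptr = np.zeros((T, K), dtype=int)
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A           # extend every previous tag
        backptr[t] = scores.argmax(axis=0)           # remember the best predecessor
        delta[t] = scores.max(axis=0) * B[:, obs[t]] # emit the current word
    # Trace the best path backwards from the most probable final tag.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

Calling viterbi(obs, pi, A, B) with the toy parameters from the earlier sketch returns the index sequence of the most probable tags.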

3. Transition probabilities in HMM POS tagging define:






Correct Answer: B

Transition probability models tag-to-tag dependency, that is, the probability of a tag ti given the previous tag ti−1. It is calculated using Maximum Likelihood Estimation (MLE) as follows:

Maximum Likelihood Estimation (MLE)

When the state sequence is known (for example, in POS tagging with labeled training data), the transition probability is estimated using Maximum Likelihood Estimation.

aij = P(tj | ti) = Count(ti → tj) / Count(ti)

Where:

  • Count(ti → tj) is the number of times a POS tag ti is immediately followed by a POS tag tj in the training data.
  • Count(ti) is the total number of appearances of tag ti in the entire training data.

This estimation ensures that the transition probabilities for each state sum to 1.

For example, the transition probability P(Noun | Det) will be 6/10 = 0.6 if, in the training corpus, the tag sequence "Det Noun" occurs 6 times (e.g., as in "The/Det cat/Noun"; this is called tagged training data) and the tag "Det" appears 10 times overall.
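
The following Python sketch shows this MLE computation on a tiny made-up tagged corpus; the sentences and the resulting counts are illustrative only.

from collections import Counter

# Estimate transition probabilities by MLE from a tiny, made-up tagged corpus.
tagged_sentences = [
    [("The", "Det"), ("cat", "Noun"), ("purrs", "Verb")],
    [("The", "Det"), ("dog", "Noun"), ("barks", "Verb")],
]

bigram_counts = Counter()   # Count(t_i -> t_j)
tag_counts = Counter()      # Count(t_i)
for sentence in tagged_sentences:
    sentence_tags = [tag for _, tag in sentence]
    tag_counts.update(sentence_tags)
    bigram_counts.update(zip(sentence_tags, sentence_tags[1:]))

def transition_prob(prev_tag, tag):
    """P(tag | prev_tag) = Count(prev_tag -> tag) / Count(prev_tag)."""
    return bigram_counts[(prev_tag, tag)] / tag_counts[prev_tag]

print(transition_prob("Det", "Noun"))   # 2/2 = 1.0 in this toy corpus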

4. Emission probability in POS tagging refers to:






Correct Answer: C

Emission probability is P(word | tag).

It answers the question: "Given a particular POS tag, how likely is it that this tag generates (emits) a specific word?"

Emission probability calculation: out of the total number of times a tag (e.g., NOUN) appears in the training data, count how many times it is assigned to the given word (e.g., "cat/NOUN"), and take the ratio.
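
A matching Python sketch for emission probabilities, again using a made-up toy corpus with illustrative counts:

from collections import Counter

# Estimate emission probabilities by MLE from a tiny tagged corpus.
tagged_sentences = [
    [("The", "Det"), ("cat", "Noun"), ("purrs", "Verb")],
    [("The", "Det"), ("dog", "Noun"), ("barks", "Verb")],
]

word_tag_counts = Counter()   # Count(word tagged with tag)
tag_counts = Counter()        # Count(tag)
for sentence in tagged_sentences:
    for word, tag in sentence:
        word_tag_counts[(word.lower(), tag)] += 1
        tag_counts[tag] += 1

def emission_prob(word, tag):
    """P(word | tag) = Count(word, tag) / Count(tag)."""
    return word_tag_counts[(word.lower(), tag)] / tag_counts[tag]

print(emission_prob("cat", "Noun"))   # 1/2 = 0.5 in this toy corpus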

5. Which problem does Baum–Welch training solve in HMM POS tagging?






Correct Answer: C

Baum–Welch (EM) learns transition and emission probabilities without labeled data.

Baum–Welch Method

The Baum–Welch method is an algorithm used to train a Hidden Markov Model (HMM) when the true state (tag) sequence is unknown.

What does the Baum–Welch method do?

It estimates (learns) the transition and emission probabilities of an HMM from unlabeled data.

In Simple Terms

  • You are given only observation sequences (e.g., words)
  • You do not know the hidden state sequence (e.g., POS tags)
  • Baum–Welch automatically learns the model parameters that best explain the data

In short, Baum–Welch trains an HMM by estimating transition and emission probabilities from unlabeled observation sequences; it is a special case of the Expectation–Maximization (EM) algorithm.
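
Below is a minimal NumPy sketch of Baum–Welch for a discrete HMM trained on a single observation sequence. It is illustrative only (random initialization, a fixed number of iterations, no convergence check); the function name and arguments are assumptions, not a standard library API.

import numpy as np

def baum_welch(obs, n_states, n_symbols, n_iter=50, seed=0):
    """Minimal Baum-Welch (EM) sketch for a discrete HMM on one observation sequence.

    obs: sequence of integer symbol ids (integer-coded words).
    Returns (pi, A, B): initial, transition and emission probabilities.
    """
    obs = np.asarray(obs)
    rng = np.random.default_rng(seed)
    # Start from random, row-normalized parameter guesses.
    pi = rng.random(n_states); pi /= pi.sum()
    A = rng.random((n_states, n_states)); A /= A.sum(axis=1, keepdims=True)
    B = rng.random((n_states, n_symbols)); B /= B.sum(axis=1, keepdims=True)

    T = len(obs)
    for _ in range(n_iter):
        # E-step: scaled forward and backward passes.
        alpha = np.zeros((T, n_states)); scale = np.zeros(T)
        alpha[0] = pi * B[:, obs[0]]
        scale[0] = alpha[0].sum(); alpha[0] /= scale[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
            scale[t] = alpha[t].sum(); alpha[t] /= scale[t]

        beta = np.zeros((T, n_states)); beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]

        # Expected state occupancies (gamma) and transitions (xi).
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)
        xi = np.zeros((T - 1, n_states, n_states))
        for t in range(T - 1):
            xi[t] = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
            xi[t] /= xi[t].sum()

        # M-step: re-estimate the parameters from the expected counts.
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        for v in range(n_symbols):
            B[:, v] = gamma[obs == v].sum(axis=0)
        B /= B.sum(axis=1, keepdims=True)
    return pi, A, B

# Example: learn a 2-state model from an integer-coded "word" sequence.
pi, A, B = baum_welch([0, 1, 2, 0, 1, 2, 0, 1], n_states=2, n_symbols=3)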

6. If an HMM POS tagger has 50 tags and a 20,000-word vocabulary, the emission matrix size is:






Correct Answer: B

Rows correspond to tags and columns to words.

In an HMM POS tagger, the emission matrix represents:

P(word | tag)

So its dimensions are:

  • Rows = number of tags
  • Columns = vocabulary size

Given:

  • Number of tags = 50
  • Vocabulary size = 20,000

Emission matrix size:

50 × 20,000 (i.e., 1,000,000 entries)

7. A trigram HMM POS tagger models:






Correct Answer: B

Trigram models capture dependency on two previous tags.

Trigram Model

A trigram model assumes that the probability of a tag depends on the two previous tags.

P(ti | ti−2, ti−1)

In POS tagging using an HMM:

  • Transition probabilities are computed using trigrams of tags
  • The model captures more context than unigram or bigram models

Example:

If the previous two tags are DT and NN, the probability of the next tag VB is:

P(VB | DT, NN)

Note: In practice, smoothing and backoff are used because many trigrams are unseen.
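
A small Python sketch of estimating trigram transition probabilities by MLE; the tag sequences are made up, and no smoothing or backoff is applied, so only seen trigrams get a probability.

from collections import Counter

# Estimate trigram transition probabilities P(t | t_prev2, t_prev1) by MLE.
tag_sequences = [
    ["DT", "NN", "VB"],
    ["DT", "NN", "VB", "DT", "NN"],
]

trigram_counts = Counter()   # Count(t_prev2, t_prev1, t)
context_counts = Counter()   # Count(t_prev2, t_prev1) as a trigram context
for tags in tag_sequences:
    for t0, t1, t2 in zip(tags, tags[1:], tags[2:]):
        trigram_counts[(t0, t1, t2)] += 1
        context_counts[(t0, t1)] += 1

def trigram_prob(t_prev2, t_prev1, t):
    """P(t | t_prev2, t_prev1) = Count(t_prev2 t_prev1 t) / Count(t_prev2 t_prev1)."""
    return trigram_counts[(t_prev2, t_prev1, t)] / context_counts[(t_prev2, t_prev1)]

print(trigram_prob("DT", "NN", "VB"))   # 2/2 = 1.0 in this toy data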

8. Data sparsity in emission probabilities mostly occurs due to:






Correct Answer: B

Unseen words lead to zero emission probabilities without smoothing.

Data sparsity in emission probabilities means that many valid word–tag combinations were never seen during training, so their probabilities are zero or unreliable.

Data sparsity may occur due to one or more of the following:
  • Natural language has a very large vocabulary.
  • Training data is finite.
  • New or rare words often appear during test time.
As a result, many words in the test data were never observed with any tag during training.

9. A common solution for unknown words in HMM POS tagging is:






Correct Answer: B

Smoothing assigns non-zero probabilities to unseen events.
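
A small sketch of Laplace (add-k) smoothing applied to emission probabilities; the counts and vocabulary size below are made-up illustrative values.

from collections import Counter

# Laplace (add-one) smoothing for emission probabilities, so unseen
# word-tag pairs (e.g. unknown words at test time) get a small non-zero probability.
word_tag_counts = Counter({("cat", "NOUN"): 6, ("dog", "NOUN"): 4})  # toy counts
tag_counts = Counter({"NOUN": 10})
vocab_size = 20_000   # size of the vocabulary, including a slot for unknown words

def smoothed_emission_prob(word, tag, k=1.0):
    """P(word | tag) with add-k smoothing: (Count(word, tag) + k) / (Count(tag) + k * V)."""
    return (word_tag_counts[(word, tag)] + k) / (tag_counts[tag] + k * vocab_size)

print(smoothed_emission_prob("cat", "NOUN"))       # seen word: count boosted by k
print(smoothed_emission_prob("quokka", "NOUN"))    # unseen word: small but non-zero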


Refer here for more information about Laplace smoothing.

10. POS tagging is considered a:






Correct Answer: C

Each token is labeled sequentially → classic sequence labeling.

POS Tagging as a Sequence Labeling Task

POS tagging is a sequence labeling task because the goal is to assign a label (POS tag) to each element in a sequence (words in a sentence) while considering their order and context.


What is Sequence Labeling?

In sequence labeling, we:

  • Take an input sequence: w1, w2, …, wn
  • Produce an output label sequence: t1, t2, …, tn

Each input item receives one corresponding label, and the labels are not independent of each other.


POS Tagging as Sequence Labeling

Input sequence → words in a sentence

The / cat / sleeps

Output sequence → POS tags

DT / NN / VBZ

Each word must receive exactly one POS tag (a small worked example follows the list below), and the choice of tag depends on:

  • The current word (emission probability)
  • The neighboring tags (context / transition probability)
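
The tiny Python sketch below scores the tag sequence DT NN VBZ for "The cat sleeps" by multiplying one transition and one emission probability per word. All probability values and the "<s>" sentence-start marker are made-up assumptions for illustration.

# Scoring the tag sequence DT NN VBZ for "The cat sleeps" in an HMM tagger.
transition = {("<s>", "DT"): 0.6, ("DT", "NN"): 0.7, ("NN", "VBZ"): 0.4}
emission = {("The", "DT"): 0.3, ("cat", "NN"): 0.01, ("sleeps", "VBZ"): 0.05}

words = ["The", "cat", "sleeps"]
tags = ["DT", "NN", "VBZ"]

score = 1.0
prev = "<s>"                           # sentence-start marker
for word, tag in zip(words, tags):
    score *= transition[(prev, tag)]   # tag depends on the previous tag
    score *= emission[(word, tag)]     # word depends on its own tag
    prev = tag

print(score)   # joint probability P(tags, words) under this toy model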