Hidden Markov Model - MCQs - Problem-based Practice Questions
HMM-Based POS Tagging Practice
These questions explore key aspects of Hidden Markov Model (HMM) based Part-of-Speech (POS) tagging. Some questions explicitly provide prior (initial) probabilities, while others focus only on transition and emission probabilities. You will practice:
- Calculating posterior probabilities for individual words.
- Evaluating sequence likelihoods using transitions and emissions.
- Handling unseen words with smoothing techniques.
- Determining most likely tag sequences based on high-probability transitions.
| Emission Probabilities | Transition Probabilities |
|---|---|
| P(dog | Noun) = 0.6 | P(next = Noun | current = Noun) = 0.4 |
| P(dog | Verb) = 0.1 | P(next = Verb | current = Noun) = 0.6 |
| P(runs | Noun) = 0.1 | P(next = Noun | current = Verb) = 0.5 |
| P(runs | Verb) = 0.7 | P(next = Verb | current = Verb) = 0.5 |
In the table, 'next' and 'current' in the probability P(next = Noun | current = Noun) refer to 'POS tag of next word' and 'POS tag of current word' respectively.
Which is the most likely tag sequence for the sentence “dog runs” using the HMM?
A. Noun → Noun
B. Noun → Verb
C. Verb → Noun
D. Verb → Verb
Explanation:
P(Noun → Verb) = P(dog | Noun) × P(Verb | Noun) × P(runs | Verb) = 0.6 × 0.6 × 0.7 = 0.252, the highest of the four candidate sequences, so the most likely tagging is Noun → Verb.
Step-by-Step Probability Computation
We score each possible tag sequence t₁ → t₂ using emissions and transitions only, since no initial tag probabilities are given (equivalently, we assume equal priors):
P(dog | t₁) × P(t₂ | t₁) × P(runs | t₂)
1. Sequence: Noun → Noun
- P(dog | Noun) = 0.6
- P(Noun | Noun) = 0.4
- P(runs | Noun) = 0.1
0.6 × 0.4 × 0.1 = 0.024
2. Sequence: Noun → Verb
- P(dog | Noun) = 0.6
- P(Verb | Noun) = 0.6
- P(runs | Verb) = 0.7
0.6 × 0.6 × 0.7 = 0.252
3. Sequence: Verb → Noun
- P(dog | Verb) = 0.1
- P(Noun | Verb) = 0.5
- P(runs | Noun) = 0.1
0.1 × 0.5 × 0.1 = 0.005
4. Sequence: Verb → Verb
- P(dog | Verb) = 0.1
- P(Verb | Verb) = 0.5
- P(runs | Verb) = 0.7
0.1 × 0.5 × 0.7 = 0.035
Highest Probability = 0.252
Most likely tag sequence:
B. Noun → Verb
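The same enumeration can be scripted. Below is a minimal Python sketch (the probabilities are taken from the tables above; variable names are illustrative, and priors are treated as equal since none are given):

```python
from itertools import product

# Emission and transition probabilities from the tables above
emission = {
    ("dog", "Noun"): 0.6, ("dog", "Verb"): 0.1,
    ("runs", "Noun"): 0.1, ("runs", "Verb"): 0.7,
}
transition = {
    ("Noun", "Noun"): 0.4, ("Noun", "Verb"): 0.6,
    ("Verb", "Noun"): 0.5, ("Verb", "Verb"): 0.5,
}

words = ["dog", "runs"]
scores = {}
for t1, t2 in product(["Noun", "Verb"], repeat=2):
    # P(dog | t1) * P(t2 | t1) * P(runs | t2)
    scores[(t1, t2)] = emission[(words[0], t1)] * transition[(t1, t2)] * emission[(words[1], t2)]

best = max(scores, key=scores.get)
print(best, scores[best])  # ('Noun', 'Verb') 0.252
```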
Initial tag probabilities:
• P(Noun) = 0.7
• P(Adj) = 0.3
Emission probabilities for the word "red":
• P("red" | Noun) = 0.2
• P("red" | Adj) = 0.8
Calculate the normalized probability that 'red' is tagged as an Adjective.
A. 0.14
B. 0.56
C. 0.63
D. 0.24
Explanation:
Compute unnormalized scores:
• Adj = 0.3 × 0.8 = 0.24
• Noun = 0.7 × 0.2 = 0.14
Normalize to get posterior:
Adj = 0.24 / (0.24 + 0.14) ≈ 0.63.
Calculate the probability of a specific tag assignment for a single observed word
In a Hidden Markov Model (HMM), the probability of a specific tag assignment for a single observed word is calculated from the joint probability of the tag and the word. For the first (and here only) word, this joint probability is exactly the initial Viterbi score for that tag.
The formula for the joint probability of a single state (tag) and observation (word) is:
P(Tag,Word) = P(Tag) × P(Word∣Tag)
Where:
- P(Tag) is the Initial tag probability (Prior).
- P(Word∣Tag) is the Emission probability (Likelihood).
Step 1: Calculate the Joint Probabilities for All Tags
For each possible tag, compute the product of the initial probability and emission probability:
For tag = Adjective (Adj):
P(Adj,"red") = P(Adj) × P("red"∣Adj) = 0.3 × 0.8 = 0.24
For tag = Noun:
P(Noun,"red") = P(Noun) × P("red"∣Noun) = 0.7 × 0.2 = 0.14
Step 2: Calculate the Normalizing Constant (Total Probability)
The normalizing constant is the sum of all joint probabilities:
P("red") = P(Adj,"red") + P(Noun,"red") = 0.24 + 0.14 = 0.38
Step 3: Apply Bayes' Theorem to Get the Posterior Probability
Using the normalization formula:
P(Adj∣"red") = P(Adj,"red") / P("red") = 0.24 / 0.38
Simplifying:
P(Adj∣"red") = 0.24 / 0.38 ≈ 0.6316 or approximately 63.16%
Final Answer
The normalized probability that 'red' is tagged as an Adjective is:
P(Adj∣"red") = 0.24 / 0.38 ≈ 0.632 or 63.2%
| Transition (row = current tag, column = next tag) | Det | Noun |
|---|---|---|
| Det | 0.1 | 0.9 |
| Noun | 0.4 | 0.6 |
| Emission P(word|tag) | Det | Noun |
|---|---|---|
| "the" | 0.8 | 0.05 |
| "cat" | 0.01 | 0.9 |
Most likely tagging for "the cat" is:
A. Det → Det
B. Det → Noun
C. Noun → Det
D. Noun → Noun
Explanation:
We solve this step by step with the HMM. The goal is to find the most likely sequence of tags for the sentence "the cat" using the Viterbi principle.
Step 1: Understand the tables
Transition probabilities (P(tag₂ | tag₁)):
| From\To | Det | Noun |
|---|---|---|
| Det | 0.1 | 0.9 |
| Noun | 0.4 | 0.6 |
For example, if the previous tag is Det, the probability that the next tag is Noun is 0.9.
Emission probabilities (P(word | tag)):
| Word | Det | Noun |
|---|---|---|
| the | 0.8 | 0.05 |
| cat | 0.01 | 0.9 |
For example, the probability that the word "cat" is emitted by a Noun is 0.9.
Step 2: Compute joint probabilities for all sequences
We consider all possible tag sequences for "the cat". Since no initial tag probabilities are given, each sequence is scored using emission and transition probabilities only:
Det → Det
P(Det → Det) = P("the" | Det) × P(Det | Det) × P("cat" | Det)
Step 1 (first word "the" as Det): P("the"|Det) = 0.8
Step 2 (second word "cat" as Det): transition P(Det|Det) = 0.1, emission P("cat"|Det) = 0.01
Total probability = 0.8 × 0.1 × 0.01 = 0.0008
Det → Noun
Step 1 "the" as Det: P("the"|Det) = 0.8
Step 2 "cat" as Noun: transition P(Noun|Det) = 0.9, emission P("cat"|Noun) = 0.9
Total probability = 0.8 × 0.9 × 0.9 = 0.648
Noun → Det
Step 1 "the" as Noun: P("the"|Noun) = 0.05
Step 2 "cat" as Det: transition P(Det|Noun) = 0.4, emission P("cat"|Det) = 0.01
Total probability = 0.05 × 0.4 × 0.01 = 0.0002
Noun → Noun
Step 1 "the" as Noun: P("the"|Noun) = 0.05
Step 2 "cat" as Noun: transition P(Noun|Noun) = 0.6, emission P("cat"|Noun) = 0.9
Total probability = 0.05 × 0.6 × 0.9 = 0.027
Step 3: Compare probabilities
| Sequence | Probability |
|---|---|
| Det → Det | 0.0008 |
| Det → Noun | 0.648 |
| Noun → Det | 0.0002 |
| Noun → Noun | 0.027 |
Step 4: Most likely tagging
The most likely sequence is: Det → Noun (Option B)
"the" is a determiner (Det), and "cat" is a noun (Noun)
| Emission P(word|tag) | Verb | Adv |
|---|---|---|
| "quickly" | 0.2 | 0.7 |
If P(Verb) = 0.5 and P(Adv) = 0.5 initially, what is the probability that the word "quickly" is tagged as Adv?
A. 0.41
B. 0.55
C. 0.78
D. 0.64
Explanation:
P(Tag,Word) = P(Tag) × P(Word∣Tag)
If "quickly" is tagged as Verb: 0.5×0.2=0.10;
If "quickly" is tagged as Adv: 0.5×0.7=0.35.
Highest is Adv. Hence, normalized Adv = 0.35/(0.10+0.35) = 0.35/0.45 ≈ 0.78.
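Reusing the `posterior` helper sketched earlier (an illustrative function, not a library routine), the same calculation looks like this:

```python
priors = {"Verb": 0.5, "Adv": 0.5}
emissions = {("quickly", "Verb"): 0.2, ("quickly", "Adv"): 0.7}
print(posterior(priors, emissions, "quickly"))
# {'Verb': 0.222..., 'Adv': 0.777...}
```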
In a training corpus, a word appears 10 times tagged as Noun and 2 times tagged as Verb. Using raw counts (no smoothing), what is the probability that the word is tagged as a Noun?
A. 0.2
B. 0.5
C. 0.83
D. 0.91
Explanation:
We use raw counts only, with no smoothing. The quantity asked for is the fraction of this word's occurrences that carry the tag Noun, i.e., P(Noun | word).
The word appears 10 times as Noun
The same word appears 2 times as Verb
Total appearances of the word = 10 + 2 = 12
P(Noun | word) = Count(word tagged as Noun) / Total count of the word
Substitute the values:
P(Noun | word) = 10 / 12 ≈ 0.8333
Rounded: 0.83 (Option C)
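The same raw-count estimate in a couple of lines of Python (variable names are illustrative):

```python
counts = {"Noun": 10, "Verb": 2}  # how often this word appears under each tag
p_noun_given_word = counts["Noun"] / sum(counts.values())
print(round(p_noun_given_word, 2))  # 0.83
```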
For the tag Noun, the training corpus contains:
- Total noun-tagged word tokens = 50
- Count of "glorf" = 0
- Vocabulary size (unique words) = 10
Using Add-1 (Laplace) smoothing, what is P("glorf" | Noun)?
A. 1/60
B. 1/51
C. 1/61
D. 51/61
Explanation:
Laplace smoothing → (0+1)/(50 + 10) = 1/60.
Understanding the question
We have an unseen word: "glorf"
That means in the training data count("glorf" | Noun) = 0.
We want to compute P("glorf" | Noun) using Add-1 (Laplace) smoothing.
✅ Given
| Quantity | Value |
|---|---|
| Total noun tokens | 50 |
| Count of "glorf" under Noun | 0 |
| Vocabulary size (V) | 10 |
Add-1 smoothing formula
P(w | tag) = (count(w, tag) + 1) / (total tokens under tag + V)
Step-by-step calculation
P("glorf" | Noun) = (0 + 1) / (50 + 10) = 1 / 60
In English, the tag transition Verb → Noun has a high probability under the HMM, while Noun → Verb has a low probability. Which two-word phrase does the model consider more likely?
A. eat food
B. food eat
Explanation:
"food eat" requires Noun→Verb, which may be low and less natural under English HMM statistics. Because its tag sequence (Noun → Verb) does NOT match the high-probability Verb → Noun transition that the HMM expects.
"eat food" (Verb -> Noun) has HIGH HMM likelihood
"food eat" (Noun -> Verb) has LOW HMM likelihood
| t | word | best tag | prob |
|---|---|---|---|
| 1 | fish | Noun | 0.52 |
| 2 | swim | Verb | 0.46 |
Assume the HMM has a strong Verb → Noun transition (i.e., P(Noun|Verb) is high).
Which tag is the model most likely to predict next (at t = 3)?
A. Noun
B. Verb
C. Both equal
D. Cannot determine
Explanation:
Since the best tag at t=2 is Verb, the predicted next tag depends mainly on the transition probabilities from Verb. The question explicitly states that Verb → Noun transition is strong. Therefore, the HMM expects the next tag to be Noun with highest probability.
Why the Viterbi algorithm predicts Noun as the next tag
The Viterbi algorithm will predict Noun as the most likely next tag because:
- High transition probability: P(Noun | Verb) is high, which significantly increases the probability of the Noun path.
- Natural language patterns: verbs commonly take noun objects in English (for example, "swim laps"), so Verb → Noun sequences are frequent.
- Viterbi maximization: The algorithm selects the tag sequence that produces the maximum accumulated probability. With a strong Verb→Noun transition, the Noun path will typically have a higher accumulated probability than alternatives.
The strong transition probability from Verb to Noun makes this the most likely prediction for the next tag in the sequence.
- An adjective is followed by a noun with probability 0.75
- An adjective is followed by another adjective with probability 0.10
Using only these transition probabilities, which 2-word phrase does the HMM consider more likely?
A. beautiful red
B. beautiful flower
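Using only the two transition probabilities above, and assuming the natural tag assignments ("beautiful" = Adj, "red" = Adj, "flower" = Noun), the comparison can be scripted as follows:

```python
trans = {("Adj", "Noun"): 0.75, ("Adj", "Adj"): 0.10}

# "beautiful red"    -> Adj -> Adj
# "beautiful flower" -> Adj -> Noun
scores = {"beautiful red": trans[("Adj", "Adj")],
          "beautiful flower": trans[("Adj", "Noun")]}
print(max(scores, key=scores.get))  # beautiful flower (Option B)
```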
| Tag Transition | Probability |
|---|---|
| DT → NN | 0.8 |
| DT → VB | 0.2 |
| Emission | "cat" |
|---|---|
| NN emits "cat" | 0.7 |
| VB emits "cat" | 0.1 |
The previous word is tagged DT and the current word is "cat". Which tag will the HMM most likely assign to "cat"?
A. DT
B. NN
C. VB
D. Cannot determine
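Combining the transition from DT with each candidate tag's emission of "cat" gives a simple comparison (a minimal Python sketch; names are illustrative):

```python
trans_from_dt = {"NN": 0.8, "VB": 0.2}  # P(next tag | DT)
emit_cat = {"NN": 0.7, "VB": 0.1}       # P("cat" | tag)

scores = {tag: trans_from_dt[tag] * emit_cat[tag] for tag in trans_from_dt}
print(scores)                       # {'NN': 0.56, 'VB': 0.02}
print(max(scores, key=scores.get))  # NN (Option B)
```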