
Tuesday, December 2, 2025

Master HMM with MCQs – Hidden Markov Model Tagging Explained

✔ Scroll down and test yourself — answers are hidden under the “View Answer” button.

Hidden Markov Model - MCQs - Problem-based Practice Questions

HMM-Based POS Tagging Practice

These questions explore key aspects of Hidden Markov Model (HMM) based Part-of-Speech (POS) tagging. Some questions explicitly provide prior (initial) probabilities, while others focus only on transition and emission probabilities. You will practice:

  • Calculating posterior probabilities for individual words.
  • Evaluating sequence likelihoods using transitions and emissions.
  • Handling unseen words with smoothing techniques.
  • Determining most likely tag sequences based on high-probability transitions.
1. Consider the following HMM for POS tagging:

Emission Probabilities:
  • P(dog | Noun) = 0.6
  • P(dog | Verb) = 0.1
  • P(runs | Noun) = 0.1
  • P(runs | Verb) = 0.7

Transition Probabilities:
  • P(next = Noun | current = Noun) = 0.4
  • P(next = Verb | current = Noun) = 0.6
  • P(next = Noun | current = Verb) = 0.5
  • P(next = Verb | current = Verb) = 0.5

In the table, 'next' and 'current' in the probability P(next = Noun | current = Noun) refer to 'POS tag of next word' and 'POS tag of current word' respectively.
Which is the most likely tag sequence for the sentence “dog runs” using the HMM?

A. Noun → Noun
B. Noun → Verb
C. Verb → Noun
D. Verb → Verb

Answer: B
Explanation:

P(Noun → Verb) = P(dog | Noun) × P(Verb | Noun) × P(runs | Verb) = 0.6 × 0.6 × 0.7 = 0.252, the highest of the four candidate sequences.

Step-by-Step Probability Computation

We compute the probability for each possible tag sequence using:

P(t₂ | t₁) × P(dog | t₁) × P(runs | t₂)


1. Sequence: Noun → Noun
  • P(dog | Noun) = 0.6
  • P(Noun | Noun) = 0.4
  • P(runs | Noun) = 0.1

0.6 × 0.4 × 0.1 = 0.024


2. Sequence: Noun → Verb
  • P(dog | Noun) = 0.6
  • P(Verb | Noun) = 0.6
  • P(runs | Verb) = 0.7

0.6 × 0.6 × 0.7 = 0.252


3. Sequence: Verb → Noun
  • P(dog | Verb) = 0.1
  • P(Noun | Verb) = 0.5
  • P(runs | Noun) = 0.1

0.1 × 0.5 × 0.1 = 0.005


4. Sequence: Verb → Verb
  • P(dog | Verb) = 0.1
  • P(Verb | Verb) = 0.5
  • P(runs | Verb) = 0.7

0.1 × 0.5 × 0.7 = 0.035

Highest Probability = 0.252

Most likely tag sequence:

B. Noun → Verb
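
The same result can be reproduced in a few lines of code. Below is a minimal Python sketch; the probability tables are copied from the question, and initial tag probabilities are ignored because the question does not provide them, exactly as in the worked solution above.

emit  = {"Noun": {"dog": 0.6, "runs": 0.1}, "Verb": {"dog": 0.1, "runs": 0.7}}
trans = {"Noun": {"Noun": 0.4, "Verb": 0.6}, "Verb": {"Noun": 0.5, "Verb": 0.5}}

scores = {}
for t1 in ("Noun", "Verb"):
    for t2 in ("Noun", "Verb"):
        # P(dog | t1) * P(t2 | t1) * P(runs | t2)
        scores[(t1, t2)] = emit[t1]["dog"] * trans[t1][t2] * emit[t2]["runs"]

for seq, p in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(seq, round(p, 3))   # ('Noun', 'Verb') 0.252 comes out on top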

2. For the HMM below:

Initial tag probabilities:
• P(Noun) = 0.7
• P(Adj) = 0.3

Emission probabilities for the word "red":
• P("red" | Noun) = 0.2
• P("red" | Adj) = 0.8

Calculate the normalized probability that 'red' is tagged as an Adjective.

A. 0.14
B. 0.56
C. 0.63
D. 0.24

Answer: C
Explanation:

Compute unnormalized scores:
• Adj = 0.3 × 0.8 = 0.24
• Noun = 0.7 × 0.2 = 0.14

Normalize to get posterior:
Adj = 0.24 / (0.24 + 0.14) ≈ 0.63.

Calculate the probability of a specific tag assignment for a single observed word

In a Hidden Markov Model (HMM), the probability of a specific tag assignment for a single observed word is computed from the joint probability of the tag and the word. For a one-word sentence, this joint probability is also the Viterbi score (path probability) for that tag.

The formula for the joint probability of a single state (tag) and observation (word) is:

P(Tag,Word) = P(Tag) × P(Word∣Tag)

Where:

  • P(Tag) is the Initial tag probability (Prior).
  • P(Word∣Tag) is the Emission probability (Likelihood).

Step 1: Calculate the Joint Probabilities for All Tags

For each possible tag, compute the product of the initial probability and emission probability:

For tag = Adjective (Adj):


P(Adj,"red") = P(Adj) × P("red"∣Adj) = 0.3 × 0.8 = 0.24

For tag = Noun:


P(Noun,"red") = P(Noun) × P("red"∣Noun) = 0.7 × 0.2 = 0.14


Step 2: Calculate the Normalizing Constant (Total Probability)

The normalizing constant is the sum of all joint probabilities:

P("red") = P(Adj,"red") + P(Noun,"red") = 0.24 + 0.14 = 0.38


Step 3: Apply Bayes' Theorem to Get the Posterior Probability

Using the normalization formula:

P(Adj∣"red") = P(Adj,"red") / P("red") = 0.24 / 0.38

Simplifying:


P(Adj∣"red") = 0.24 / 0.38 ≈ 0.6316 or approximately 63.16%

Final Answer

The normalized probability that 'red' is tagged as an Adjective is:

P(Adj∣"red") = 0.24 / 0.38 ≈ 0.632 or 63.2%

3. Using the HMM:

Transition P(next tag | current tag):

From \ To    Det     Noun
Det          0.1     0.9
Noun         0.4     0.6

Emission P(word | tag):

Word         Det     Noun
"the"        0.8     0.05
"cat"        0.01    0.9

Most likely tagging for "the cat" is:

A. Det → Det
B. Det → Noun
C. Noun → Det
D. Noun → Noun

Answer: B
Explanation:

Solution: Let's solve this step by step using the Hidden Markov Model (HMM).

The goal is to find the most likely sequence of tags for the sentence "the cat" using the Viterbi principle.

Step 1: Understand the tables

Transition probabilities (P(tag₂ | tag₁)):

From \ To    Det     Noun
Det          0.1     0.9
Noun         0.4     0.6

For example, if the previous tag is Det, the probability that the next tag is Noun is 0.9.

Emission probabilities (P(word | tag)):

Word     Det     Noun
the      0.8     0.05
cat      0.01    0.9

For example, the probability that the word "cat" is emitted by a Noun is 0.9.

Step 2: Compute joint probabilities for all sequences

We consider all possible tag sequences for "the cat":

Det → Det

Probability = P("the" | Det) × P(Det | Det) × P("cat" | Det)
Step 1 (first word "the" as Det): P("the"|Det) = 0.8
Step 2 (second word "cat" as Det): transition P(Det|Det) = 0.1, emission P("cat"|Det) = 0.01

Total probability = 0.8 × 0.1 × 0.01 = 0.0008

Det → Noun

Step 1 "the" as Det: P("the"|Det) = 0.8
Step 2 "cat" as Noun: transition P(Noun|Det) = 0.9, emission P("cat"|Noun) = 0.9

Total probability = 0.8 × 0.9 × 0.9 = 0.648

Noun → Det

Step 1 "the" as Noun: P("the"|Noun) = 0.05
Step 2 "cat" as Det: transition P(Det|Noun) = 0.4, emission P("cat"|Det) = 0.01

Total probability = 0.05 × 0.4 × 0.01 = 0.0002

Noun → Noun

Step 1 "the" as Noun: P("the"|Noun) = 0.05
Step 2 "cat" as Noun: transition P(Noun|Noun) = 0.6, emission P("cat"|Noun) = 0.9

Total probability = 0.05 × 0.6 × 0.9 = 0.027

Step 3: Compare probabilities

Sequence       Probability
Det → Det      0.0008
Det → Noun     0.648
Noun → Det     0.0002
Noun → Noun    0.027

Step 4: Most likely tagging

The most likely sequence is: Det → Noun (Option B)

"the" is a determiner (Det), and "cat" is a noun (Noun)

4. Given the emission matrix:

Emission P(word | tag)    Verb    Adv
"quickly"                 0.2     0.7

If P(Verb) = 0.5 and P(Adv) = 0.5 initially, what is the normalized probability that the word "quickly" is tagged Adv?

A. 0.41
B. 0.55
C. 0.78
D. 0.64

Answer: C
Explanation:

P(Tag,Word) = P(Tag) × P(Word∣Tag)

If "quickly" is tagged as Verb: 0.5×0.2=0.10;

If "quickly" is tagged as Adv: 0.5×0.7=0.35.

The Adv score is higher. Normalizing: P(Adv | "quickly") = 0.35 / (0.10 + 0.35) = 0.35 / 0.45 ≈ 0.78.

5. A word appears 10 times as Noun and 2 times as Verb in training. Without smoothing P(word|Noun)= ?

A. 0.2
B. 0.5
C. 0.83
D. 0.91

Answer: C
Explanation:

We are not doing any smoothing, just using raw counts. As the question intends, P(word | Noun) is read as: out of all the times the word occurs, how many times did it occur with the tag Noun? (Strictly, this ratio is the relative frequency of the Noun tag for this word.)

The word appears 10 times as Noun

The same word appears 2 times as Verb

Total appearances of the word = 10 + 2 = 12


Since we want P(word | Noun):

P(word | Noun) = Count(word with Noun) / Total count of the word

Substitute the values

P(word | Noun) = 10 / 12 = 0.8333

Rounded: 0.83

6. In an HMM POS-tagger, we want to estimate the emission probability of an unseen word. Consider the word "glorf", which never occurred in the training data.
For the tag Noun, the training corpus contains:
  • Total noun-tagged word tokens = 50
  • Count of "glorf" = 0
  • Vocabulary size (unique words) = 10
Using Add-1 (Laplace) smoothing, compute 𝑃("glorf" ∣ Noun).

A. 1/60
B. 1/51
C. 1/61
D. 51/61

Answer: A
Explanation:

Laplace smoothing → (0+1)/(50 + 10) = 1/60.

Understanding the question

We have an unseen word: "glorf"
That means in the training data count("glorf" | Noun) = 0.
We want to compute P("glorf" | Noun) using Add-1 (Laplace) smoothing.

✅ Given

  • Total noun-tagged tokens = 50
  • Count of "glorf" under Noun = 0
  • Vocabulary size (V) = 10

Add-1 smoothing formula

P(w | tag) = (count(w, tag) + 1) / (total tokens under tag + V)


Step-by-step calculation

P("glorf" | Noun) = (0 + 1) / (50 + 10) = 1 / 60

7. Which sentence has lower HMM likelihood given high Verb→Noun transition?

A. eat food
B. food eat

Answer: B
Explanation:

"food eat" requires Noun→Verb, which may be low and less natural under English HMM statistics. Because its tag sequence (Noun → Verb) does NOT match the high-probability Verb → Noun transition that the HMM expects.

"eat food" (Verb -> Noun) has HIGH HMM likelihood


"food eat" (Noun -> Verb) has LOW HMM likelihood

8. Given partial Viterbi table:

t    word    best tag    prob
1    fish    Noun        0.52
2    swim    Verb        0.46

Assume the HMM has a strong Verb → Noun transition (i.e., P(Noun|Verb) is high).

Model predicts next tag likely:

A. Noun
B. Verb
C. Both equal
D. Cannot determine

Answer: A
Explanation:

Since the best tag at t=2 is Verb, the predicted next tag depends mainly on the transition probabilities from Verb. The question explicitly states that Verb → Noun transition is strong. Therefore, the HMM expects the next tag to be Noun with highest probability.

Why the Viterbi algorithm predicts Noun as the next tag

The Viterbi algorithm will predict Noun as the most likely next tag because:

  • High transition probability boost: P(Noun|Verb) is high, which significantly increases the probability of the Noun path.
  • Natural language patterns: Verbs commonly take noun objects in English (for example, "swim laps", "eat food"), so Verb → Noun sequences are frequent.
  • Viterbi maximization: The algorithm selects the tag sequence that produces the maximum accumulated probability. With a strong Verb→Noun transition, the Noun path will typically have a higher accumulated probability than alternatives.

The strong transition probability from Verb to Noun makes this the most likely prediction for the next tag in the sequence.

9. In an HMM for POS tagging, you are given the following transition probabilities for adjectives:

  • An adjective is followed by a noun with probability 0.75
  • An adjective is followed by another adjective with probability 0.10
These probabilities tell us which tags usually come after an adjective in the training data.

Using only these transition probabilities, which 2-word phrase does the HMM consider more likely?

A. beautiful red
B. beautiful flower

10. In an HMM POS tagger, you observe the single word "cat". The model gives you the following probabilities:

Tag Transition       Probability
DT → NN              0.8
DT → VB              0.2

Emission of "cat"    Probability
NN emits "cat"       0.7
VB emits "cat"       0.1

For this one-word sentence, the tag is chosen mainly based on the emission probability of the word. Based on these values, which tag is the HMM most likely to assign to the word "cat"?

A. DT
B. NN
C. VB
D. Cannot determine

Monday, December 1, 2025

HMM-Based POS Tagging MCQs | Viterbi, Emission & Transition Explained

✔ Scroll down and test yourself — answers are hidden under the “View Answer” button.

1. In an HMM-based POS tagger, hidden states typically represent:

A. Words in the text
B. POS tags
C. Syntactic chunks
D. Sentence categories

Answer: B
Explanation:

In POS tagging using Hidden Markov Models, hidden states correspond to POS tags like NN, VB, DT, etc., while observations are the actual words.

________________________________________
2. The emission probability in an HMM for POS tagging represents:

A. P(tag | word)
B. P(word | tag)
C. P(tag | sentence length)
D. P(sentence | tags)

Answer: B
Explanation:

Emission probability in HMM defines the likelihood of generating (emitting) a word from a particular POS tag: P(word | tag).

What is emission probability?

In the context of Hidden Markov Models (HMMs), the emission probability refers to the likelihood of observing a particular output (observation) from a given hidden state.

HMM Components

An HMM consists of:

  • Hidden states (S): not directly observable.
  • Observations (O): visible outputs generated by the hidden states.
  • Transition probabilities (A): probability of moving from one hidden state to another.
  • Emission probabilities (B): probability of a hidden state generating a particular observation.


Emission Probability Formula

If s is a hidden state and o is an observation:

Emission probability = P(observation | state) = P(o | s)

In POS tagging, this is the probability that a tag emits a particular word.

Example: P("dog" | NN) = 0.005

This means that the word "dog" is generated by the NN (noun) tag with probability 0.005.

Intuition

  • Hidden state: “POS tag of the current word”
  • Observation: “Actual word in the sentence”

Emission probability answers:

“Given that the current word has tag NN, how likely is it to be this particular word?”
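
As an illustration, emission probabilities are typically estimated by relative frequency from a tagged corpus. The sketch below uses a tiny made-up corpus; the words, tags, and counts are purely illustrative.

from collections import Counter

tagged = [("the", "DT"), ("dog", "NN"), ("barks", "VBZ"),
          ("the", "DT"), ("cat", "NN"), ("runs", "VBZ")]   # toy tagged corpus

tag_counts  = Counter(tag for _, tag in tagged)
pair_counts = Counter(tagged)

def emission(word, tag):
    # Maximum-likelihood estimate: P(word | tag) = count(word, tag) / count(tag)
    return pair_counts[(word, tag)] / tag_counts[tag]

print(emission("dog", "NN"))    # 1/2 = 0.5 in this toy corpus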

________________________________________
3. Transition probabilities in POS HMM tagging capture:

A. Probability of a word given tag
B. Probability of current tag given previous tag
C. Probability of unknown word generation
D. Probability of sentence boundary

Answer: B
Explanation:

Transition probability expresses tag-to-tag dependency, e.g., P(NN | DT), which is high because determiners commonly precede nouns.

What is transition probability in HMM context?

In Hidden Markov Models (HMMs), the transition probability represents the likelihood of moving from one hidden state to another in a sequence.

Formal Definition

If sₜ is the current hidden state and sₜ₋₁ is the previous hidden state, then:

Transition Probability = P(sₜ | sₜ₋₁)
  

It answers the question:

Given the previous hidden state, what is the probability of transitioning to the next state?

Where It Applies

In tasks like Part-of-Speech (POS) tagging:

  • Hidden states = POS tags (NN, VB, DT, JJ...)
  • Transition probability models how likely one tag follows another

Example

P(NN | DT) = 0.65
  

Meaning: If the previous tag is DT (determiner), there is a 65% chance that the next tag is NN (noun), as in common phrase patterns like "the cat", "a dog", "this book".
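
Likewise, transition probabilities are usually estimated from tag-bigram counts. Below is a minimal sketch on a made-up tag sequence; the numbers are illustrative only.

from collections import Counter

tag_sequence = ["DT", "NN", "VBZ", "DT", "NN", "VBZ", "DT", "JJ", "NN"]   # toy data

prev_counts   = Counter(tag_sequence[:-1])
bigram_counts = Counter(zip(tag_sequence, tag_sequence[1:]))

def transition(prev_tag, next_tag):
    # Maximum-likelihood estimate: P(next | prev) = count(prev, next) / count(prev)
    return bigram_counts[(prev_tag, next_tag)] / prev_counts[prev_tag]

print(round(transition("DT", "NN"), 2))    # 2/3 ≈ 0.67 in this toy sequence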

________________________________________
4. The algorithm used to find the most probable tag sequence in POS HMM is:

A. Forward algorithm
B. CYK algorithm
C. Viterbi decoding
D. Naive Bayes

Answer: C
Explanation:

Viterbi is a dynamic programming algorithm used for optimal decoding — finding the best tag sequence for a sentence.

Viterbi algorithm

The Viterbi algorithm is a dynamic‑programming method that finds the most probable sequence of hidden states (a path) that could have produced a given observation sequence in a Hidden Markov Model (HMM).
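
For concreteness, here is a minimal Python sketch of Viterbi decoding for a bigram POS HMM. The toy tables reuse the "the cat" example from the practice questions earlier on this page; a uniform start distribution is assumed, since no initial probabilities were given there.

def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words`."""
    # V[i][tag] = (best probability of any path ending in `tag` at position i, previous tag)
    V = [{t: (start_p[t] * emit_p[t].get(words[0], 0.0), None) for t in tags}]
    for i in range(1, len(words)):
        col = {}
        for t in tags:
            prev, score = max(
                ((p, V[i - 1][p][0] * trans_p[p][t] * emit_p[t].get(words[i], 0.0))
                 for p in tags),
                key=lambda x: x[1])
            col[t] = (score, prev)
        V.append(col)
    best = max(V[-1], key=lambda t: V[-1][t][0])     # best final tag
    path = [best]
    for i in range(len(words) - 1, 0, -1):           # backtrack through stored pointers
        path.append(V[i][path[-1]][1])
    return list(reversed(path))

tags    = ["Det", "Noun"]
start_p = {"Det": 0.5, "Noun": 0.5}    # assumed uniform start distribution
trans_p = {"Det": {"Det": 0.1, "Noun": 0.9}, "Noun": {"Det": 0.4, "Noun": 0.6}}
emit_p  = {"Det": {"the": 0.8, "cat": 0.01}, "Noun": {"the": 0.05, "cat": 0.9}}
print(viterbi(["the", "cat"], tags, start_p, trans_p, emit_p))    # ['Det', 'Noun']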

________________________________________
5. In HMM POS tagging, unknown words are usually handled using:

A. Ignoring them during tagging
B. Assigning probability zero
C. Smoothing or suffix-based rules
D. Removing them from corpus

Answer: C
Explanation:

Unknown/rare words are tackled using morphological heuristics, smoothing (Laplace, Good-Turing) or suffix-based tagging methods.

What is smoothing and why is it needed?

In HMM POS tagging, we rely on:

  • Transition probabilities → P(tagₜ | tagₜ₋₁)
  • Emission probabilities → P(word | tag)

If a word never appeared in training data, its emission probability becomes:

P(word | tag) = 0

This is a problem because one zero probability makes the entire sentence probability zero, causing the Viterbi decoding to fail.


What Smoothing Does

Smoothing reassigns small probability to unseen words/events instead of zero. It ensures the model can still tag new sentences even with unknown words.
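
As a rough illustration of the suffix-based option, a tagger can fall back on word endings when the emission table has no entry for a word. The suffix rules below are hypothetical examples, not learned from any corpus.

SUFFIX_GUESSES = {"ing": "VBG", "ed": "VBD", "ly": "RB", "ness": "NN"}   # illustrative rules

def guess_tag(word, known_words):
    if word.lower() in known_words:
        return None                      # known word: use the normal emission probabilities
    for suffix, tag in SUFFIX_GUESSES.items():
        if word.lower().endswith(suffix):
            return tag                   # unseen word: back off to a suffix heuristic
    return "NN"                          # default guess for unknown words

print(guess_tag("glorfing", known_words={"the", "cat"}))    # VBG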

________________________________________
6. If an HMM uses T tags and vocabulary size V, emission matrix dimension is:

A. V × V
B. T × V
C. T × T
D. 1 × V

Answer: B
Explanation:

Every tag can emit any word in the vocabulary, so the emission matrix has one row per tag and one column per word: T × V.
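
For example (the tagset and vocabulary sizes below are illustrative):

import numpy as np

T_tags, V_words = 45, 20000                     # illustrative tagset and vocabulary sizes
emission_matrix = np.zeros((T_tags, V_words))   # one row per tag, one column per word
print(emission_matrix.shape)                    # (45, 20000)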

________________________________________
7. A bigram POS HMM assumes:

A. Tag depends on all previous tags
B. Tag depends only on previous tag
C. Word and tag are independent
D. Tags follow uniform probability

Answer: B
Explanation:

Markov assumption → P(tᵢ | tᵢ₋₁), not dependent on entire tag history.

________________________________________
8. The Baum-Welch algorithm trains POS HMM using:

A. Gradient descent
B. Evolutionary optimization
C. Expectation–Maximization (EM)
D. Manual rules

Answer: C
Explanation:

Baum-Welch is an unsupervised EM algorithm re-estimating transition + emission probabilities.

________________________________________
9. Viterbi differs from Forward algorithm because it:

A. Sums probabilities of all paths
B. Chooses the maximum probability path
C. Works only for continuous observations
D. Does not use dynamic programming

Answer: B
Explanation:

Forward algorithm sums over paths. Viterbi picks best single path (max probability).

________________________________________
10. HMM POS tagging suffers most when:

A. Vocabulary is large
B. Words are highly ambiguous
C. Text is short
D. Emission is continuous

Answer: B
Explanation:

Ambiguous words like bank, can, and light require broader context than an HMM's local transition and emission probabilities can capture.

Why does HMM suffer ambiguous words?

HMMs are probabilistic sequence models built only from transition and emission probabilities. When a word is highly ambiguous, several POS tags receive similar probabilities for that word, so the model becomes uncertain and may choose the wrong tag.

Monday, November 24, 2025

Top 10 Hidden Markov Model (HMM) MCQs with Answers and Explanations (2025 Updated)


1. In a Hidden Markov Model, which component determines how likely an observation is to be generated from a hidden state?

A. Transition probability
B. Initial state probability
C. Emission probability
D. Posterior probability

Answer: C
Explanation:

Emission probabilities define how observations are generated from hidden states, making them critical in mapping hidden behavior to visible outputs.

What is emission probability in HMM?

Emission probability (also called output probability) in a Hidden Markov Model represents the likelihood of observing a particular symbol or observation given that the model is in a specific hidden state at a particular time step.

In an HMM, you have two types of events happening simultaneously: hidden states that are not directly observable and observations (emissions) that are visible. The emission probability defines the relationship between these hidden states and what we actually observe.

Example: Please refer here
2. The Viterbi algorithm is used in HMMs primarily to compute:

A. The likelihood of the observation sequence
B. The most probable hidden state sequence
C. Transition matrix normalization
D. The number of emission symbols

Answer: B
Explanation:

The Viterbi algorithm finds the single most probable sequence of hidden states that could have produced the given observations.

Viterbi algorithm

The Viterbi algorithm is a dynamic programming algorithm that finds the most likely sequence of hidden states that would explain a sequence of observed events in a Hidden Markov Model. It solves the decoding problem in HMMs: given observations and the HMM model, what sequence of hidden states most likely produced those observations?

When would you need Viterbi algorithm?

You need the Viterbi algorithm whenever you have a decoding problem in a Hidden Markov Model—that is, when you need to infer the most likely sequence of hidden states from a sequence of observations. More specifically, the algorithm is essential when you face problems where hidden states influence observable data, but you only have access to the observations and need to determine what the hidden states were. Some example cases are as follows:

  • When you need the single most likely state sequence (e.g., transcribing a spoken word), Viterbi gives the exact MAP (maximum a‑posteriori) path.
  • When the number of states is modest (tens to a few hundred), since the O(N²T) runtime is then manageable.
Refer here for why the Viterbi algorithm is efficient for NLP tasks.
3. Baum-Welch learning algorithm in HMMs is best described as:

A. A supervised algorithm for labeled sequences
B. A greedy optimization algorithm
C. An unsupervised EM-based algorithm for parameter estimation
D. A rule-based decoding algorithm

Answer: C
Explanation:

Baum-Welch is an Expectation–Maximization algorithm that updates transition and emission probabilities based on unlabeled data.

Baum-Welch algorithm (forward-backward algorithm)

The Baum-Welch algorithm is a machine learning algorithm used to solve the learning problem in Hidden Markov Models—estimating the unknown parameters of an HMM from observed data. It is also known as the forward-backward algorithm and is a special case of the Expectation-Maximization (EM) algorithm.

It is a method used to train a Hidden Markov Model (HMM) when you don’t know the correct state sequence in your data.

How does the Baum-Welch algorithm work?

It uses a two-step repeating process called EM (Expectation–Maximization):
  • Expectation Step (E-step): The algorithm guesses the hidden state sequence based on current model parameters. In this step, it uses both forward and backward algorithms.

  • Maximization Step (M-step): Based on that guess, the algorithm updates the model parameters to better fit the data.

Then it repeats these two steps over and over until things stop changing much (convergence).
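
For readers who want to see the E-step and M-step concretely, below is a minimal, unscaled NumPy sketch of Baum-Welch for a single short discrete observation sequence. It is a teaching sketch only: without probability scaling it will underflow on long sequences, and the random initialization and toy observation sequence are arbitrary choices.

import numpy as np

def baum_welch(obs, n_states, n_symbols, n_iter=50, seed=0):
    """Unscaled Baum-Welch (EM) for one short discrete observation sequence."""
    rng = np.random.default_rng(seed)
    A = rng.random((n_states, n_states)); A /= A.sum(axis=1, keepdims=True)
    B = rng.random((n_states, n_symbols)); B /= B.sum(axis=1, keepdims=True)
    pi = np.full(n_states, 1.0 / n_states)
    obs = np.asarray(obs)
    T = len(obs)

    for _ in range(n_iter):
        # E-step: forward (alpha) and backward (beta) passes
        alpha = np.zeros((T, n_states))
        alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        beta = np.zeros((T, n_states))
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        likelihood = alpha[-1].sum()

        # Posterior state probabilities (gamma) and transition posteriors (xi)
        gamma = alpha * beta / likelihood
        xi = np.zeros((T - 1, n_states, n_states))
        for t in range(T - 1):
            xi[t] = (alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1]) / likelihood

        # M-step: re-estimate the parameters from the posteriors
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        for k in range(n_symbols):
            B[:, k] = gamma[obs == k].sum(axis=0)
        B /= gamma.sum(axis=0)[:, None]
    return pi, A, B

pi, A, B = baum_welch([0, 1, 0, 2, 1, 0], n_states=2, n_symbols=3)
print(np.round(A, 2))    # learned transition matrix (values depend on the random init)
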
4. If an HMM has 4 hidden states and 6 observation symbols, the size of the emission matrix is:

A. 6 × 6
B. 1 × 6
C. 4 × 6
D. 6 × 4

Answer: C
Explanation:

Each state must assign probabilities to all observation symbols, so the matrix is defined as: number of states × number of symbols.

Explanation: In a Hidden Markov Model (HMM), the emission matrix (also called the observation probability matrix) represents the probability of emitting each observation symbol from each hidden state.

So its size depends on:

  • Number of hidden states (N) → here: 4
  • Number of observation symbols (M) → here: 6

Therefore: Emission Matrix Size = N × M = 4 × 6
5. In a standard Hidden Markov Model (HMM), it is assumed that the next state depends only on the current state. If we remove this assumption and allow the next state to depend on multiple previous states, what would the model require?

A. Removing hidden states
B. Modeling higher-order dependencies between previous states
C. Using equal (uniform) probabilities for all transitions
D. Allowing continuous observations only

Answer: B
Explanation:

The Markov assumption states that a state depends only on the previous state. If this assumption is removed, the model must incorporate higher-order context. That means a higher-order HMM (second-order, third-order, etc.), in which transitions depend on multiple past states, not just one.

Mathematically:

P(qₜ | qₜ₋₁, qₜ₋₂, …) = P(qₜ | qₜ₋₁)

If we violate this assumption, it means the model must consider more than one previous state — meaning:

P(qₜ) depends on qₜ₋₁, qₜ₋₂, …


6. The forward algorithm computes observation probability using:

A. Maximization over paths
B. Linear rule-based selection
C. Random sampling of hidden sequences
D. Summation over possible hidden paths

Answer: D
Explanation:

The forward algorithm does not find the best path — instead, it computes the total probability of observing the sequence by summing over all possible hidden state paths.

Forward algorithm in HMM

The Forward Algorithm in a Hidden Markov Model (HMM) is a dynamic programming method used to compute the probability of an observation sequence, given the model parameters.

In simple words: It tells us how likely a given sequence of observations is, according to the HMM.

We need it because if an observation sequence has length T and the HMM has N hidden states, then there are:

N^T

possible hidden-state paths that could produce the observations — too many to compute manually. The forward algorithm solves this efficiently by reusing intermediate results instead of recalculating everything.
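
Below is a minimal NumPy sketch of the forward recursion on a toy two-state HMM; all numbers are made up for illustration.

import numpy as np

A  = np.array([[0.7, 0.3],     # transition probabilities P(state_t | state_{t-1})
               [0.4, 0.6]])
B  = np.array([[0.9, 0.1],     # emission probabilities P(symbol | state)
               [0.2, 0.8]])
pi = np.array([0.6, 0.4])      # initial state probabilities
obs = [0, 1, 0]                # observed symbol indices

alpha = pi * B[:, obs[0]]                  # initialization
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]          # sum over previous states, then emit
print(alpha.sum())                         # total probability of the observation sequence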

7. A strong sign of overfitting in an HMM model is:

A. Uniform state transition probabilities
B. High accuracy on training data but poor performance on new data
C. Low number of hidden states
D. Use of discrete emission probabilities only

Answer: B
Explanation:

Overfitting occurs when an HMM learns noise and memorizes transitions instead of generalizing sequence structure.

HMM can overfit?

An HMM becomes overfitted when it learns the training sequences too specifically, rather than learning general patterns. This often happens when:

  • The model has too many hidden states

  • The emission/transition probabilities become too precise for the training data

  • The dataset is small, but the model is complex

  • The parameters are estimated without regularization

In such cases, the HMM starts modeling noise or rare patterns in the training data, instead of meaningful structure.

8. The key difference between Forward and Viterbi algorithm is:

A. Forward sums path probabilities, Viterbi finds the maximum-probability path
B. Forward maximizes likelihood, Viterbi sums paths
C. Forward ignores emissions, Viterbi uses emissions
D. Forward is supervised, Viterbi is unsupervised

Answer: A
Explanation:

The forward algorithm computes total likelihood using summation, while Viterbi finds the best hidden sequence using maximization.

Difference between Forward algorithm and Viterbi algorithm in HMM

The Forward Algorithm and Viterbi Algorithm are two fundamental dynamic programming techniques used in Hidden Markov Models, but they solve different problems and employ different mathematical operations.

The Forward Algorithm 

  • computes the probability of observing a sequence, considering all possible hidden state paths that could have generated that sequence. It answers the question: "What is the likelihood of seeing this observation sequence?"
  • used for evaluation problem in HMM.
  • uses summation.
  • Analogy: Given all possible routes from city A to city B, the forward algorithm answers: "What is the total chance of reaching city B by any route?"

The Viterbi Algorithm, by contrast, 

  • finds the single most probable hidden state sequence that could have generated the observations. It answers: "What is the best sequence of hidden states that explains these observations?" 
  • used for decoding problem in HMM.
  • uses maximization.
  • Analogy: Given all possible routes from city A to city B, the Viterbi algorithm answers: "Which single route is the most likely?"
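
The contrast is easy to see in code. The sketch below runs both recursions on the same toy HMM (made-up numbers): the forward pass sums over previous states, while Viterbi takes the maximum.

import numpy as np

A  = np.array([[0.7, 0.3], [0.4, 0.6]])    # transitions (toy values)
B  = np.array([[0.9, 0.1], [0.2, 0.8]])    # emissions (toy values)
pi = np.array([0.6, 0.4])                  # initial distribution
obs = [0, 1, 0]

alpha = pi * B[:, obs[0]]     # forward variable
delta = pi * B[:, obs[0]]     # Viterbi variable
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]                        # SUM over previous states
    delta = (delta[:, None] * A).max(axis=0) * B[:, o]   # MAX over previous states
print("P(observations)     =", alpha.sum())   # evaluation (forward)
print("P(best hidden path) =", delta.max())   # decoding score (Viterbi)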

9. Continuous observation models like Gaussian Mixtures are preferred in speech HMMs because:

A. They eliminate decoding steps
B. Speech signals are continuous-valued
C. They simplify transition probability computation
D. They require no training data

Answer: B
Explanation:

Speech data consists of real-valued acoustic features, making continuous modeling more natural than discrete symbol assignments.

10. In HMM smoothing, the probability of being in state S at time t given observed data and model parameters is called:

A. Prior probability
B. Posterior state probability
C. Forward likelihood
D. Emission certainty factor

Answer: B
Explanation:

Posterior state probability represents confidence in each state at a specific time, calculated using the Forward-Backward algorithm.

Smoothing in HMMs means estimating the probability of hidden states using all observations — past, present, and future — to make the most accurate prediction.

Example:

Smoothing means deciding what part-of-speech a word most likely is, using the entire sentence — not just the words before it.
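
Below is a minimal sketch of how that posterior is computed with the Forward-Backward algorithm, using the same style of toy two-state HMM as the earlier sketches; all numbers are made up.

import numpy as np

A  = np.array([[0.7, 0.3], [0.4, 0.6]])    # transitions (toy values)
B  = np.array([[0.9, 0.1], [0.2, 0.8]])    # emissions (toy values)
pi = np.array([0.6, 0.4])
obs = [0, 1, 0]
T = len(obs)

alpha = np.zeros((T, 2))
beta  = np.zeros((T, 2))
alpha[0] = pi * B[:, obs[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]     # forward pass
beta[-1] = 1.0
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])   # backward pass

posterior = alpha * beta
posterior /= posterior.sum(axis=1, keepdims=True)    # P(state_t | all observations)
print(np.round(posterior, 3))    # one row per time step; each row sums to 1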

Visit here for more information about Hidden Markov Model (HMM)
