1. In an HMM-based POS tagger, hidden states typically represent:

A. Words in the text
B. POS tags
C. Syntactic chunks
D. Sentence categories

Answer: B
Explanation:

In POS tagging using Hidden Markov Models, hidden states correspond to POS tags like NN, VB, DT, etc., while observations are the actual words.
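
As a toy illustration (hypothetical sentence and tags), the observed sequence is the words and the hidden sequence is the tags the tagger must recover:

observations = ["the", "dog", "barks"]   # visible: words
hidden_states = ["DT", "NN", "VBZ"]      # hidden: POS tags

for word, tag in zip(observations, hidden_states):
    print(f"{word:>6}  <-  emitted by hidden state {tag}")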

________________________________________
2. The emission probability in an HMM for POS tagging represents:

A. P(tag | word)
B. P(word | tag)
C. P(tag | sentence length)
D. P(sentence | tags)

Answer: B
Explanation:

Emission probability in HMM defines the likelihood of generating (emitting) a word from a particular POS tag: P(word | tag).

What is emission probability?

In the context of Hidden Markov Models (HMMs), the emission probability refers to the likelihood of observing a particular output (observation) from a given hidden state.

HMM Components

An HMM consists of:

  • Hidden states (S): not directly observable
  • Observations (O): visible outputs generated by the hidden states
  • Transition probabilities (A): the probability of moving from one hidden state to another
  • Emission probabilities (B): the probability of a hidden state generating a particular observation
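
A minimal sketch of these four components for a tiny, hypothetical tag set, written as Python dictionaries; all probability values are made up for illustration.

states = ["DT", "NN", "VB"]              # hidden states: POS tags
vocabulary = ["the", "dog", "barks"]     # possible observations: words

# A: transition probabilities, P(next tag | current tag)
A = {"DT": {"DT": 0.05, "NN": 0.90, "VB": 0.05},
     "NN": {"DT": 0.10, "NN": 0.20, "VB": 0.70},
     "VB": {"DT": 0.50, "NN": 0.40, "VB": 0.10}}

# B: emission probabilities, P(word | tag)
B = {"DT": {"the": 0.90, "dog": 0.05, "barks": 0.05},
     "NN": {"the": 0.05, "dog": 0.80, "barks": 0.15},
     "VB": {"the": 0.05, "dog": 0.15, "barks": 0.80}}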


Emission Probability Formula

If s is a hidden state and o is an observation:

Emission probability = P(observation | state) = P(o | s)

In POS tagging, this is the probability that a tag emits a particular word.

Example: P("dog" | NN) = 0.005

This means that the word "dog" is generated by the NN (noun) tag with probability 0.005.

Intuition

  • Hidden state: “POS tag of the current word”
  • Observation: “Actual word in the sentence”

Emission probability answers:

“Given that the current word has tag NN, how likely is it to be this particular word?”
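
A minimal sketch of how emission probabilities could be estimated by maximum likelihood from a tiny, hypothetical tagged corpus (the counts are illustrative only):

from collections import Counter, defaultdict

# Hypothetical (word, tag) pairs from a toy tagged corpus.
tagged_corpus = [
    ("the", "DT"), ("dog", "NN"), ("barks", "VBZ"),
    ("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ"),
    ("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ"),
]

tag_counts = Counter(tag for _, tag in tagged_corpus)
emission_counts = defaultdict(Counter)
for word, tag in tagged_corpus:
    emission_counts[tag][word] += 1

def emission_prob(word, tag):
    """Maximum-likelihood estimate of P(word | tag) = count(tag, word) / count(tag)."""
    return emission_counts[tag][word] / tag_counts[tag]

print(emission_prob("dog", "NN"))   # 2/3 on this toy corpus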

________________________________________
3. Transition probabilities in POS HMM tagging capture:

A. Probability of a word given tag
B. Probability of current tag given previous tag
C. Probability of unknown word generation
D. Probability of sentence boundary

Answer: B
Explanation:

Transition probability expresses tag-to-tag dependency, e.g., P(NN | DT), which is relatively high because determiners commonly precede nouns.

What is transition probability in HMM context?

In Hidden Markov Models (HMMs), the transition probability represents the likelihood of moving from one hidden state to another in a sequence.

Formal Definition

If sₜ is the current hidden state and sₜ₋₁ is the previous hidden state, then:

Transition Probability = P(sₜ | sₜ₋₁)
  

It answers the question:

Given the previous hidden state, what is the probability of transitioning to the next state?

Where It Applies

In tasks like Part-of-Speech (POS) tagging:

  • Hidden states = POS tags (NN, VB, DT, JJ...)
  • Transition probability models how likely one tag follows another

Example

P(NN | DT) = 0.65
  

Meaning: If the previous tag is DT (determiner), there is a 65% chance the next tag is NN (noun) (common phrase pattern: the cat, a dog, this book).
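
A minimal sketch of estimating transition probabilities from tag-bigram counts on a tiny, hypothetical set of tag sequences (numbers are illustrative only):

from collections import Counter, defaultdict

# Hypothetical tag sequences from a toy corpus.
tag_sequences = [
    ["DT", "NN", "VBZ"],
    ["DT", "JJ", "NN"],
    ["DT", "NN", "VBZ", "DT", "NN"],
]

prev_counts = Counter()
bigram_counts = defaultdict(Counter)
for tags in tag_sequences:
    for prev, curr in zip(tags, tags[1:]):
        prev_counts[prev] += 1
        bigram_counts[prev][curr] += 1

def transition_prob(curr, prev):
    """MLE of P(current tag | previous tag) = count(prev, curr) / count(prev)."""
    return bigram_counts[prev][curr] / prev_counts[prev]

print(transition_prob("NN", "DT"))  # 0.75 on this toy data: NN follows DT in 3 of 4 cases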

________________________________________
4. The algorithm used to find the most probable tag sequence in POS HMM is:

A. Forward algorithm
B. CYK algorithm
C. Viterbi decoding
D. Naive Bayes

Answer: C
Explanation:

Viterbi is a dynamic programming algorithm used for optimal decoding — finding the best tag sequence for a sentence.

Viterbi algorithm

The Viterbi algorithm is a dynamic‑programming method that finds the most probable sequence of hidden states (a path) that could have produced a given observation sequence in a Hidden Markov Model (HMM).
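
A compact, illustrative sketch of Viterbi decoding for a bigram HMM tagger. The dictionaries start_p, trans_p, and emit_p and all probability values below are hypothetical; log probabilities are used to avoid underflow on longer sentences.

import math

def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words` under a bigram HMM.

    start_p[t]       : P(t at position 0)
    trans_p[prev][t] : P(t | prev)
    emit_p[t][w]     : P(w | t)
    """
    def logp(p):
        return math.log(p) if p > 0 else float("-inf")

    # V[i][t] = best log-probability of any tag path ending in tag t at word i
    V = [{t: logp(start_p.get(t, 0)) + logp(emit_p[t].get(words[0], 0)) for t in tags}]
    back = [{}]

    for i in range(1, len(words)):
        V.append({})
        back.append({})
        for t in tags:
            best_prev, best_score = max(
                ((p, V[i - 1][p] + logp(trans_p[p].get(t, 0))) for p in tags),
                key=lambda x: x[1],
            )
            V[i][t] = best_score + logp(emit_p[t].get(words[i], 0))
            back[i][t] = best_prev

    # Backtrack from the best final tag to recover the full path.
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

On a toy model (made-up probabilities), this recovers the expected tag path:

tags = ["DT", "NN", "VB"]
start_p = {"DT": 0.8, "NN": 0.1, "VB": 0.1}
trans_p = {"DT": {"DT": 0.05, "NN": 0.90, "VB": 0.05},
           "NN": {"DT": 0.10, "NN": 0.20, "VB": 0.70},
           "VB": {"DT": 0.50, "NN": 0.40, "VB": 0.10}}
emit_p  = {"DT": {"the": 0.9},
           "NN": {"dog": 0.5, "barks": 0.1},
           "VB": {"barks": 0.6}}
print(viterbi(["the", "dog", "barks"], tags, start_p, trans_p, emit_p))  # ['DT', 'NN', 'VB']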

________________________________________
5. In HMM POS tagging, unknown words are usually handled using:

A. Ignoring them during tagging
B. Assigning probability zero
C. Smoothing or suffix-based rules
D. Removing them from corpus

Answer: C
Explanation:

Unknown/rare words are handled using morphological heuristics, smoothing (Laplace, Good-Turing), or suffix-based tagging methods.

What is smoothing and why is it needed?

In HMM POS tagging, we rely on:

  • Transition probabilities → P(tagₜ | tagₜ₋₁)
  • Emission probabilities → P(word | tag)

If a word never appeared in training data, its emission probability becomes:

P(word | tag) = 0

This is a problem because one zero probability makes the entire sentence probability zero, causing the Viterbi decoding to fail.


What Smoothing Does

Smoothing assigns a small, non-zero probability to unseen words/events instead of zero. It ensures the model can still tag new sentences even when they contain unknown words.
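
A minimal sketch of add-one (Laplace) smoothing applied to emission probabilities, so that an unseen word still receives a small non-zero probability; the counts and vocabulary size below are hypothetical.

def smoothed_emission_prob(word, tag, emission_counts, tag_counts, vocab_size):
    """P(word | tag) ~ (count(tag, word) + 1) / (count(tag) + vocab_size)."""
    return (emission_counts[tag].get(word, 0) + 1) / (tag_counts[tag] + vocab_size)

tag_counts = {"NN": 3}
emission_counts = {"NN": {"dog": 2, "cat": 1}}
vocab_size = 10_000  # assumed vocabulary size

print(smoothed_emission_prob("dog", "NN", emission_counts, tag_counts, vocab_size))  # seen word
print(smoothed_emission_prob("wug", "NN", emission_counts, tag_counts, vocab_size))  # unseen, but non-zero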

________________________________________
6. If an HMM uses T tags and vocabulary size V, emission matrix dimension is:

A. V × V
B. T × V
C. T × T
D. 1 × V

Answer: B
Explanation:

Each tag can emit any word in the vocabulary, hence the emission matrix has dimension #Tags × #Words, i.e., T × V.
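
A quick illustration of the shapes, assuming NumPy and toy sizes (12 tags, 5000 vocabulary words; the uniform values are placeholders):

import numpy as np

T, V = 12, 5000
emission = np.full((T, V), 1.0 / V)    # B[t, w] = P(word w | tag t); each row sums to 1
transition = np.full((T, T), 1.0 / T)  # A[t_prev, t] = P(t | t_prev), shown for contrast

print(emission.shape)    # (12, 5000) -> T x V
print(transition.shape)  # (12, 12)   -> T x T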

________________________________________
7. A bigram POS HMM assumes:

A. Tag depends on all previous tags
B. Tag depends only on previous tag
C. Word and tag are independent
D. Tags follow uniform probability

Answer: B
Explanation:

Markov assumption → P(tᵢ | tᵢ₋₁): the current tag depends only on the previous tag, not on the entire tag history.
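
Under this assumption, the joint probability of a tag sequence and a word sequence factorizes into per-position terms, as in the hedged sketch below (the probability tables start_p, trans_p, and emit_p are assumed given; in practice they would be estimated from data):

import math

# P(tags, words) = P(t1) * prod P(t_i | t_{i-1}) * prod P(w_i | t_i)
def joint_log_prob(words, tag_seq, start_p, trans_p, emit_p):
    lp = math.log(start_p[tag_seq[0]]) + math.log(emit_p[tag_seq[0]][words[0]])
    for i in range(1, len(words)):
        lp += math.log(trans_p[tag_seq[i - 1]][tag_seq[i]])  # tag depends only on the previous tag
        lp += math.log(emit_p[tag_seq[i]][words[i]])         # word depends only on its own tag
    return lp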

________________________________________
8. The Baum-Welch algorithm trains POS HMM using:

A. Gradient descent
B. Evolutionary optimization
C. Expectation–Maximization (EM)
D. Manual rules

Answer: C
Explanation:

Baum-Welch is an unsupervised Expectation–Maximization (EM) algorithm that iteratively re-estimates the transition and emission probabilities from unlabeled data.
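
A single EM re-estimation step can be sketched with the forward-backward (E-step) and count-normalization (M-step) computations below. This is an illustrative, unscaled version suitable only for short toy sequences, not a production implementation.

import numpy as np

def baum_welch_step(obs, A, B, pi):
    """One EM re-estimation step of Baum-Welch for a discrete HMM.

    obs : sequence of observation indices (e.g. word ids)
    A   : (T, T) transition matrix, A[i, j] = P(state j | state i)
    B   : (T, V) emission matrix,  B[i, k] = P(observation k | state i)
    pi  : (T,)  initial state distribution
    """
    obs = np.asarray(obs)
    N, T = len(obs), A.shape[0]

    # E-step: forward (alpha) and backward (beta) probabilities (no scaling).
    alpha = np.zeros((N, T))
    beta = np.zeros((N, T))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, N):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0
    for t in range(N - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)       # P(state at time t | obs)
    xi = np.zeros((N - 1, T, T))
    for t in range(N - 1):
        xi[t] = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
        xi[t] /= xi[t].sum()                        # P(state pair at t, t+1 | obs)

    # M-step: re-estimate parameters from expected counts.
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.array([gamma[obs == k].sum(axis=0) for k in range(B.shape[1])]).T
    new_B /= gamma.sum(axis=0)[:, None]
    return new_A, new_B, new_pi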

________________________________________
9. Viterbi differs from Forward algorithm because it:

A. Sums probabilities of all paths
B. Chooses the maximum probability path
C. Works only for continuous observations
D. Does not use dynamic programming

Answer: B
Explanation:

The Forward algorithm sums probabilities over all tag paths (giving the total probability of the observation sequence); Viterbi keeps only the single best path (maximum probability).
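
The contrast is easy to see when the two recurrences are written side by side. The sketch below uses plain probabilities on a hypothetical toy model (same dictionary format as the Viterbi example above) so the sum-versus-max difference stays visible.

def forward_and_viterbi(words, tags, start_p, trans_p, emit_p):
    """Forward sums over all tag paths; Viterbi keeps only the best-path score."""
    fwd = {t: start_p[t] * emit_p[t].get(words[0], 0) for t in tags}
    best = dict(fwd)
    for w in words[1:]:
        fwd = {t: sum(fwd[p] * trans_p[p].get(t, 0) for p in tags) * emit_p[t].get(w, 0)
               for t in tags}
        best = {t: max(best[p] * trans_p[p].get(t, 0) for p in tags) * emit_p[t].get(w, 0)
                for t in tags}
    return sum(fwd.values()), max(best.values())  # P(words) vs probability of the best path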

________________________________________
10. HMM POS tagging suffers most when:

A. Vocabulary is large
B. Words are highly ambiguous
C. Text is short
D. Emission is continuous

Answer: B
Explanation:

Ambiguous words like bank, can, and light require context that an HMM cannot model deeply.

Why do HMMs struggle with ambiguous words?

An HMM is a probabilistic sequence model that relies only on transition and emission probabilities. When a word is highly ambiguous, several POS tags have similar probabilities for the same word, so the model becomes uncertain and may choose the wrong tag; the local bigram context is often too shallow to resolve the ambiguity.