Key Concepts Illustrated in the Figure

  1. Visible states (Observations)
    Visible states are the observed outputs of an HMM, such as the words in a sentence. In the figure above, 'cat', 'purrs', etc. are observations.
  2. Hidden states
    Hidden states are the unobserved underlying states (e.g., POS tags - 'DT', 'N', etc in the figure) that generate the visible observations.
  3. Transition probabilities
    Transition probabilities define the likelihood of moving from one hidden state to another. In the figure, this is represented by the arrows from one POS tag to another. Example: P(N -> V) or P(V | N).
  4. Emission probabilities
    Emission probabilities define the likelihood of a visible observation being generated by a hidden state. In the figure, this is represented by the arrows from POS tags to words. Example: P(cat | N).
  5. POS tagging using HMM
    POS tagging using HMM models tags as hidden states and words as observations to find the most probable tag sequence.
  6. Evaluation problem
    The evaluation problem computes the probability of an observation sequence given an HMM.
  7. Forward algorithm
    The forward algorithm efficiently solves the evaluation problem using dynamic programming (see the code sketch after this list).
  8. Decoding problem
    The decoding problem finds the most probable hidden state sequence for a given observation sequence.
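
The small Python sketch below ties these pieces together for a toy version of the figure's HMM. All tag names, words and probability values are made-up illustrative numbers (they are not taken from the figure), and the forward function solves the evaluation problem for this toy model.

import numpy as np

# Toy HMM for the fragment "the cat purrs" (illustrative numbers only).
tags = ["DT", "N", "V"]                  # hidden states
words = ["the", "cat", "purrs"]          # observation vocabulary

pi = np.array([0.8, 0.1, 0.1])           # P(first tag)
A = np.array([                           # transition probabilities P(tag_j | tag_i)
    [0.1, 0.8, 0.1],                     # from DT
    [0.1, 0.2, 0.7],                     # from N
    [0.3, 0.4, 0.3],                     # from V
])
B = np.array([                           # emission probabilities P(word | tag)
    [0.9, 0.05, 0.05],                   # DT emits: the, cat, purrs
    [0.1, 0.8, 0.1],                     # N  emits: the, cat, purrs
    [0.1, 0.1, 0.8],                     # V  emits: the, cat, purrs
])

def forward(obs, pi, A, B):
    """Evaluation problem: P(observation sequence | HMM) via the forward algorithm."""
    alpha = pi * B[:, obs[0]]            # initialise with the first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # sum over previous states, then emit
    return alpha.sum()

obs = [words.index(w) for w in ["the", "cat", "purrs"]]
print(forward(obs, pi, A, B))            # probability of observing "the cat purrs"
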
1. In POS tagging using HMM, the hidden states represent:






Correct Answer: B

In HMM-based POS tagging, tags are hidden states and words are observed symbols.

2. The most suitable algorithm for decoding the best POS sequence in HMM tagging is:






Correct Answer: D

Viterbi decoding finds the most probable hidden tag sequence.
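
A minimal Viterbi sketch is shown below. It assumes the usual HMM parameterization (initial, transition and emission probabilities as NumPy arrays, as in the toy model after the concept list above); the function and variable names are illustrative, not a standard library API.

import numpy as np

def viterbi(obs, pi, A, B):
    """Decoding problem: most probable hidden tag sequence for the observations.

    obs: list of observation (word) indices
    pi:  initial tag probabilities, shape (K,)
    A:   transition probabilities P(tag_j | tag_i), shape (K, K)
    B:   emission probabilities P(word | tag), shape (K, V)
    """
    T, K = len(obs), len(pi)
    delta = np.zeros((T, K))             # best path probability ending in each tag
    backptr = np.zeros((T, K), dtype=int)
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A           # extend every previous tag
        backptr[t] = scores.argmax(axis=0)           # remember the best predecessor
        delta[t] = scores.max(axis=0) * B[:, obs[t]] # emit the current word
    # Trace the best path backwards from the most probable final tag.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

Calling viterbi(obs, pi, A, B) with the toy parameters from the earlier sketch returns the index sequence of the most probable tags.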

3. Transition probabilities in HMM POS tagging define:






Correct Answer: B

Transition probability models tag-to-tag dependency, that is, the probability of a tag ti given the previous tag ti−1. It is calculated using Maximum Likelihood Estimation (MLE) as follows:

Maximum Likelihood Estimation (MLE)

When the state sequence is known (for example, in POS tagging with labeled training data), the transition probability is estimated using Maximum Likelihood Estimation.

aij = P(tj | ti) = Count(ti → tj) / Count(ti)

Where:

  • Count(ti → tj) is the number of times a POS tag ti is immediately followed by a POS tag tj in the training data.
  • Count(ti) is the total number of appearances of tag ti in the entire training data.

This estimation ensures that the transition probabilities for each state sum to 1.

For example, the transition probability P(Noun | Det) will be 6/10 = 0.6 if, in the training corpus, the tag sequence "Det Noun" occurs 6 times (e.g., as in "The/Det cat/Noun"; this is called tagged training data) and the tag "Det" appears 10 times overall.
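
The following Python sketch shows this MLE computation on a tiny made-up tagged corpus; the sentences and the resulting counts are illustrative only.

from collections import Counter

# Estimate transition probabilities by MLE from a tiny, made-up tagged corpus.
tagged_sentences = [
    [("The", "Det"), ("cat", "Noun"), ("purrs", "Verb")],
    [("The", "Det"), ("dog", "Noun"), ("barks", "Verb")],
]

bigram_counts = Counter()   # Count(t_i -> t_j)
tag_counts = Counter()      # Count(t_i)
for sentence in tagged_sentences:
    sentence_tags = [tag for _, tag in sentence]
    tag_counts.update(sentence_tags)
    bigram_counts.update(zip(sentence_tags, sentence_tags[1:]))

def transition_prob(prev_tag, tag):
    """P(tag | prev_tag) = Count(prev_tag -> tag) / Count(prev_tag)."""
    return bigram_counts[(prev_tag, tag)] / tag_counts[prev_tag]

print(transition_prob("Det", "Noun"))   # 2/2 = 1.0 in this toy corpus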

4. Emission probability in POS tagging refers to:






Correct Answer: C

Emission probability is P(word | tag).

It answers the question: "Given a particular POS tag, how likely is it that this tag generates (emits) a specific word?"

Emission probability calculation: out of the total number of times a tag (e.g., NOUN) appears in the training data, count how many times it is assigned to the given word (e.g., "cat/NOUN"), and take the ratio.
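
A matching Python sketch for emission probabilities, again using a made-up toy corpus with illustrative counts:

from collections import Counter

# Estimate emission probabilities by MLE from a tiny tagged corpus.
tagged_sentences = [
    [("The", "Det"), ("cat", "Noun"), ("purrs", "Verb")],
    [("The", "Det"), ("dog", "Noun"), ("barks", "Verb")],
]

word_tag_counts = Counter()   # Count(word tagged with tag)
tag_counts = Counter()        # Count(tag)
for sentence in tagged_sentences:
    for word, tag in sentence:
        word_tag_counts[(word.lower(), tag)] += 1
        tag_counts[tag] += 1

def emission_prob(word, tag):
    """P(word | tag) = Count(word, tag) / Count(tag)."""
    return word_tag_counts[(word.lower(), tag)] / tag_counts[tag]

print(emission_prob("cat", "Noun"))   # 1/2 = 0.5 in this toy corpus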

5. Which problem does Baum–Welch training solve in HMM POS tagging?






Correct Answer: C

Baum–Welch (EM) learns transition and emission probabilities without labeled data.

Baum–Welch Method

The Baum–Welch method is an algorithm used to train a Hidden Markov Model (HMM) when the true state (tag) sequence is unknown.

What does the Baum–Welch method do?

It estimates (learns) the transition and emission probabilities of an HMM from unlabeled data.

In Simple Terms

  • You are given only observation sequences (e.g., words)
  • You do not know the hidden state sequence (e.g., POS tags)
  • Baum–Welch automatically learns the model parameters that best explain the data

In short, Baum–Welch trains an HMM by estimating transition and emission probabilities from unlabeled observation sequences; it is a special case of the Expectation–Maximization (EM) algorithm.
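
Below is a minimal NumPy sketch of Baum–Welch for a discrete HMM trained on a single observation sequence. It is illustrative only (random initialization, a fixed number of iterations, no convergence check); the function name and arguments are assumptions, not a standard library API.

import numpy as np

def baum_welch(obs, n_states, n_symbols, n_iter=50, seed=0):
    """Minimal Baum-Welch (EM) sketch for a discrete HMM on one observation sequence.

    obs: sequence of integer symbol ids (integer-coded words).
    Returns (pi, A, B): initial, transition and emission probabilities.
    """
    obs = np.asarray(obs)
    rng = np.random.default_rng(seed)
    # Start from random, row-normalized parameter guesses.
    pi = rng.random(n_states); pi /= pi.sum()
    A = rng.random((n_states, n_states)); A /= A.sum(axis=1, keepdims=True)
    B = rng.random((n_states, n_symbols)); B /= B.sum(axis=1, keepdims=True)

    T = len(obs)
    for _ in range(n_iter):
        # E-step: scaled forward and backward passes.
        alpha = np.zeros((T, n_states)); scale = np.zeros(T)
        alpha[0] = pi * B[:, obs[0]]
        scale[0] = alpha[0].sum(); alpha[0] /= scale[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
            scale[t] = alpha[t].sum(); alpha[t] /= scale[t]

        beta = np.zeros((T, n_states)); beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]

        # Expected state occupancies (gamma) and transitions (xi).
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)
        xi = np.zeros((T - 1, n_states, n_states))
        for t in range(T - 1):
            xi[t] = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
            xi[t] /= xi[t].sum()

        # M-step: re-estimate the parameters from the expected counts.
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        for v in range(n_symbols):
            B[:, v] = gamma[obs == v].sum(axis=0)
        B /= B.sum(axis=1, keepdims=True)
    return pi, A, B

# Example: learn a 2-state model from an integer-coded "word" sequence.
pi, A, B = baum_welch([0, 1, 2, 0, 1, 2, 0, 1], n_states=2, n_symbols=3)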

6. If an HMM POS tagger has 50 tags and a 20,000-word vocabulary, the emission matrix size is:






Correct Answer: B

Rows correspond to tags and columns to words.

In an HMM POS tagger, the emission matrix represents:

P(word | tag)

So its dimensions are:

  • Rows = number of tags
  • Columns = vocabulary size

Given:

  • Number of tags = 50
  • Vocabulary size = 20,000

Emission matrix size:

50 × 20,000 (i.e., 1,000,000 entries)

7. A trigram HMM POS tagger models:






Correct Answer: B

Trigram models capture dependency on two previous tags.

Trigram Model

A trigram model assumes that the probability of a tag depends on the two previous tags.

P(ti | ti−2, ti−1)

In POS tagging using an HMM:

  • Transition probabilities are computed using trigrams of tags
  • The model captures more context than unigram or bigram models

Example:

If the previous two tags are DT and NN, the probability of the next tag VB is:

P(VB | DT, NN)

Note: In practice, smoothing and backoff are used because many trigrams are unseen.
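
A small Python sketch of estimating trigram transition probabilities by MLE; the tag sequences are made up, and no smoothing or backoff is applied, so only seen trigrams get a probability.

from collections import Counter

# Estimate trigram transition probabilities P(t | t_prev2, t_prev1) by MLE.
tag_sequences = [
    ["DT", "NN", "VB"],
    ["DT", "NN", "VB", "DT", "NN"],
]

trigram_counts = Counter()   # Count(t_prev2, t_prev1, t)
context_counts = Counter()   # Count(t_prev2, t_prev1) as a trigram context
for tags in tag_sequences:
    for t0, t1, t2 in zip(tags, tags[1:], tags[2:]):
        trigram_counts[(t0, t1, t2)] += 1
        context_counts[(t0, t1)] += 1

def trigram_prob(t_prev2, t_prev1, t):
    """P(t | t_prev2, t_prev1) = Count(t_prev2 t_prev1 t) / Count(t_prev2 t_prev1)."""
    return trigram_counts[(t_prev2, t_prev1, t)] / context_counts[(t_prev2, t_prev1)]

print(trigram_prob("DT", "NN", "VB"))   # 2/2 = 1.0 in this toy data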

8. Data sparsity in emission probabilities mostly occurs due to:






Correct Answer: B

Unseen words lead to zero emission probabilities without smoothing.

Data sparsity in emission probabilities means that many valid word–tag combinations were never seen during training, so their probabilities are zero or unreliable.

Data sparsity may occur due to one or more of the following:
  • Natural language has a very large vocabulary.
  • Training data is finite.
  • New or rare words often appear during test time.
As a result, many words in the test data were never observed with any tag during training.

9. A common solution for unknown words in HMM POS tagging is:






Correct Answer: B

Smoothing assigns non-zero probabilities to unseen events.
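
A small sketch of Laplace (add-k) smoothing applied to emission probabilities; the counts and vocabulary size below are made-up illustrative values.

from collections import Counter

# Laplace (add-one) smoothing for emission probabilities, so unseen
# word-tag pairs (e.g. unknown words at test time) get a small non-zero probability.
word_tag_counts = Counter({("cat", "NOUN"): 6, ("dog", "NOUN"): 4})  # toy counts
tag_counts = Counter({"NOUN": 10})
vocab_size = 20_000   # size of the vocabulary, including a slot for unknown words

def smoothed_emission_prob(word, tag, k=1.0):
    """P(word | tag) with add-k smoothing: (Count(word, tag) + k) / (Count(tag) + k * V)."""
    return (word_tag_counts[(word, tag)] + k) / (tag_counts[tag] + k * vocab_size)

print(smoothed_emission_prob("cat", "NOUN"))       # seen word: count boosted by k
print(smoothed_emission_prob("quokka", "NOUN"))    # unseen word: small but non-zero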


Refer here for more information about Laplace smoothing.

10. POS tagging is considered a:






Correct Answer: C

Each token is labeled sequentially → classic sequence labeling.

POS Tagging as a Sequence Labeling Task

POS tagging is a sequence labeling task because the goal is to assign a label (POS tag) to each element in a sequence (words in a sentence) while considering their order and context.


What is Sequence Labeling?

In sequence labeling, we:

  • Take an input sequence: w1, w2, …, wn
  • Produce an output label sequence: t1, t2, …, tn

Each input item receives one corresponding label, and the labels are not independent of each other.


POS Tagging as Sequence Labeling

Input sequence → words in a sentence

The / cat / sleeps

Output sequence → POS tags

DT / NN / VBZ

Each word must receive exactly one POS tag (a small worked example follows the list below), and the choice of tag depends on:

  • The current word (emission probability)
  • The neighboring tags (context / transition probability)
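
The tiny Python sketch below scores the tag sequence DT NN VBZ for "The cat sleeps" by multiplying one transition and one emission probability per word. All probability values and the "<s>" sentence-start marker are made-up assumptions for illustration.

# Scoring the tag sequence DT NN VBZ for "The cat sleeps" in an HMM tagger.
transition = {("<s>", "DT"): 0.6, ("DT", "NN"): 0.7, ("NN", "VBZ"): 0.4}
emission = {("The", "DT"): 0.3, ("cat", "NN"): 0.01, ("sleeps", "VBZ"): 0.05}

words = ["The", "cat", "sleeps"]
tags = ["DT", "NN", "VBZ"]

score = 1.0
prev = "<s>"                           # sentence-start marker
for word, tag in zip(words, tags):
    score *= transition[(prev, tag)]   # tag depends on the previous tag
    score *= emission[(word, tag)]     # word depends on its own tag
    prev = tag

print(score)   # joint probability P(tags, words) under this toy model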