
Wednesday, December 17, 2025

Hidden Markov Model (HMM) – MCQs, Notes & Practice Questions | ExploreDatabase


Key Concepts Illustrated in the Figure

  1. Visible states (Observations)
    Visible states are the observed outputs of an HMM, such as words in a sentence. In the figure above, 'cat', 'purrs', etc. are observations.
  2. Hidden states
    Hidden states are the unobserved underlying states (e.g., POS tags - 'DT', 'N', etc in the figure) that generate the visible observations.
  3. Transition probabilities
    Transition probabilities define the likelihood of moving from one hidden state to another. In the figure, this is represented by the arrows from one POS tag to the other. Example: P(N -> V) or P(V | N).
  4. Emission probabilities
    Emission probabilities define the likelihood of a visible observation being generated by a hidden state. In the figure, this is represented by the arrows from POS tags to words. Example: P(cat | N).
  5. POS tagging using HMM
    POS tagging using HMM models tags as hidden states and words as observations to find the most probable tag sequence.
  6. Evaluation problem
    The evaluation problem computes the probability of an observation sequence given an HMM.
  7. Forward algorithm
    The forward algorithm efficiently solves the evaluation problem using dynamic programming (see the NumPy sketch just after this list).
  8. Decoding problem
    The decoding problem finds the most probable hidden state sequence for a given observation sequence.
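
To make the evaluation problem concrete, here is a minimal NumPy sketch of the forward algorithm. The two-tag HMM and its probability values are made-up toy numbers for illustration only, not taken from the figure.

```python
import numpy as np

# Minimal sketch of the forward algorithm (evaluation problem), assuming a
# tiny two-tag HMM with made-up probabilities. States: 0 = N, 1 = V.
pi = np.array([0.6, 0.4])                  # initial tag probabilities
A  = np.array([[0.3, 0.7],                 # A[i, j] = P(tag_j | tag_i)
               [0.5, 0.5]])
B  = np.array([[0.7, 0.1],                 # B[i, k] = P(word_k | tag_i)
               [0.1, 0.6]])                # word ids: 0 = "cat", 1 = "purrs"

def forward(obs, pi, A, B):
    """Return P(observation sequence) by summing over all hidden tag paths."""
    alpha = pi * B[:, obs[0]]              # initialisation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]      # recursion: sum over previous tags
    return alpha.sum()                     # termination

print(forward([0, 1], pi, A, B))           # P("cat purrs") under this toy HMM
```
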
1. In POS tagging using HMM, the hidden states represent:






Correct Answer: B

In HMM-based POS tagging, tags are hidden states and words are observed symbols.

2. The most suitable algorithm for decoding the best POS sequence in HMM tagging is:






Correct Answer: D

Viterbi decoding finds the most probable hidden tag sequence.

3. Transition probabilities in HMM POS tagging define:






Correct Answer: B

Transition probability models tag-to-tag dependency, that is, the probability of a tag ti given the previous tag ti−1. It is calculated using Maximum Likelihood Estimation (MLE) as follows:

Maximum Likelihood Estimation (MLE)

When the state sequence is known (for example, in POS tagging with labeled training data), the transition probability is estimated using Maximum Likelihood Estimation.

aij = Count(ti → tj) / Count(ti)

Where:

  • Count(ti → tj) is the number of times a POS tag ti is immediately followed by a POS tag tj in the training data.
  • Count(ti) is the total number of appearances of tag ti in the entire training data.

This estimation ensures that the transition probabilities for each state sum to 1.

For example, the transition probability P(Noun | Det) is 6/10 = 0.6 if, in the training corpus, the tag sequence "Det Noun" (e.g., as in the tagged training data "The/Det cat/Noun") occurs 6 times and the tag "Det" appears 10 times overall.
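
As a quick illustration, the following Python sketch estimates transition probabilities by MLE from a tiny, made-up tagged corpus (the two sentences below are illustrative, not real training data).

```python
from collections import Counter

# Minimal sketch of MLE transition estimation from a tagged corpus.
tagged_sentences = [
    [("The", "Det"), ("cat", "Noun"), ("sleeps", "Verb")],
    [("A", "Det"), ("dog", "Noun"), ("runs", "Verb")],
]

tag_counts = Counter()
bigram_counts = Counter()
for sentence in tagged_sentences:
    tags = [tag for _, tag in sentence]
    tag_counts.update(tags)
    bigram_counts.update(zip(tags, tags[1:]))   # counts of (t_i, t_j) pairs

def transition_prob(prev_tag, tag):
    """a_ij = Count(t_i -> t_j) / Count(t_i)."""
    return bigram_counts[(prev_tag, tag)] / tag_counts[prev_tag]

print(transition_prob("Det", "Noun"))   # 2/2 = 1.0 in this toy corpus
```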

4. Emission probability in POS tagging refers to:






Correct Answer: C

Emission probability is P(word | tag).

It answers the question: "Given a particular POS tag, how likely is it that this tag generates (emits) a specific word?"

Emission probability calculation: out of the total number of times a tag (e.g., NOUN) appears in the training data, how many times does it appear as the tag of a given word (e.g., "cat/NOUN")?
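
A small Python sketch of this calculation is shown below; the counts are made-up illustrative values, not taken from a real corpus.

```python
from collections import Counter

# Minimal sketch of MLE emission estimation, P(word | tag).
word_tag_counts = Counter({("cat", "NOUN"): 8, ("runs", "VERB"): 5,
                           ("dog", "NOUN"): 7})
tag_counts = Counter({"NOUN": 15, "VERB": 5})

def emission_prob(word, tag):
    """P(word | tag) = Count(word, tag) / Count(tag)."""
    return word_tag_counts[(word, tag)] / tag_counts[tag]

print(emission_prob("cat", "NOUN"))   # 8/15 ≈ 0.53
```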

5. Which problem does Baum–Welch training solve in HMM POS tagging?






Correct Answer: C

Baum–Welch (EM) learns transition and emission probabilities without labeled data.

Baum–Welch Method

The Baum–Welch method is an algorithm used to train a Hidden Markov Model (HMM) when the true state (tag) sequence is unknown.

What does the Baum–Welch method do?

It estimates (learns) the transition and emission probabilities of an HMM from unlabeled data.

In Simple Terms

  • You are given only observation sequences (e.g., words)
  • You do not know the hidden state sequence (e.g., POS tags)
  • Baum–Welch automatically learns the model parameters that best explain the data

In summary, the Baum–Welch method trains an HMM by estimating transition and emission probabilities from unlabeled observation sequences; it is a special case of the Expectation–Maximization (EM) algorithm.
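
For readers who want to try this, here is a minimal unsupervised-training sketch. It assumes the third-party hmmlearn library (a recent version, where the discrete-emission model is called CategoricalHMM); the integer word ids and sequence lengths are made-up illustrative values.

```python
import numpy as np
from hmmlearn import hmm   # assumption: a recent hmmlearn with CategoricalHMM

# Unsupervised training sketch: only word sequences are given (encoded as
# integer ids), no tags. Baum-Welch (EM) estimates the parameters.
# word ids (illustrative): 0="the", 1="cat", 2="dog", 3="sleeps", 4="runs"
X = np.array([[0], [1], [3], [0], [2], [4]])   # concatenated observations
lengths = [3, 3]                               # two sentences of length 3

model = hmm.CategoricalHMM(n_components=2, n_iter=100, random_state=0)
model.fit(X, lengths)                          # Baum-Welch / EM training

print(model.startprob_)      # learned initial state distribution
print(model.transmat_)       # learned transition probabilities
print(model.emissionprob_)   # learned emission probabilities
```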

6. If an HMM POS tagger has 50 tags and a 20,000-word vocabulary, the emission matrix size is:






Correct Answer: B

Rows correspond to tags and columns to words.

In an HMM POS tagger, the emission matrix represents:

P(word | tag)

So its dimensions are:

  • Rows = number of tags
  • Columns = vocabulary size

Given:

  • Number of tags = 50
  • Vocabulary size = 20,000

Emission matrix size:

50 × 20,000
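
The following tiny NumPy snippet simply confirms the shape and parameter count for the dimensions given in the question.

```python
import numpy as np

# Emission matrix for 50 tags and a 20,000-word vocabulary.
n_tags, vocab_size = 50, 20_000
emission_matrix = np.zeros((n_tags, vocab_size))   # rows = tags, columns = words

print(emission_matrix.shape)   # (50, 20000)
print(emission_matrix.size)    # 1000000 entries
```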

7. A trigram HMM POS tagger models:






Correct Answer: B

Trigram models capture dependency on two previous tags.

Trigram Model

A trigram model assumes that the probability of a tag (or word) depends on the previous two tags.

P(ti | ti−1, ti−2)

In POS tagging using an HMM:

  • Transition probabilities are computed using trigrams of tags
  • The model captures more context than unigram or bigram models

Example:

If the previous two tags are DT and NN, the probability of the next tag VB is:

P(VB | DT, NN)

Note: In practice, smoothing and backoff are used because many trigrams are unseen.
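
The sketch below shows one way to estimate trigram transition probabilities by MLE from tag sequences; the toy tag sequences are made up, and no smoothing or backoff is applied.

```python
from collections import Counter

# Minimal sketch of trigram transition estimation, P(t_i | t_{i-2}, t_{i-1}).
tag_sequences = [["DT", "NN", "VB"], ["DT", "NN", "NN", "VB"]]

pair_counts, triple_counts = Counter(), Counter()
for tags in tag_sequences:
    pair_counts.update(zip(tags, tags[1:]))
    triple_counts.update(zip(tags, tags[1:], tags[2:]))

def trigram_prob(t_prev2, t_prev1, t):
    """P(t | t_prev2, t_prev1) = Count(t_prev2, t_prev1, t) / Count(t_prev2, t_prev1)."""
    return triple_counts[(t_prev2, t_prev1, t)] / pair_counts[(t_prev2, t_prev1)]

print(trigram_prob("DT", "NN", "VB"))   # 1/2 = 0.5 in this toy data
```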

8. Data sparsity in emission probabilities mostly occurs due to:






Correct Answer: B

Unseen words lead to zero emission probabilities without smoothing.

Data sparsity in emission probabilities means that many valid word–tag combinations were never seen during training, so their probabilities are zero or unreliable.

Data sparsity may occur due to one or more of the following:

  • Natural language has a very large vocabulary.
  • Training data is finite.
  • New or rare words often appear during test time.

As a result, many words in the test data were never observed with any tag during training.

9. A common solution for unknown words in HMM POS tagging is:






Correct Answer: B

Smoothing assigns non-zero probabilities to unseen events.


Refer here for more information about Laplace smoothing.
10. POS tagging is considered a:






Correct Answer: C

Each token is labeled sequentially → classic sequence labeling.

POS Tagging as a Sequence Labeling Task

POS tagging is a sequence labeling task because the goal is to assign a label (POS tag) to each element in a sequence (words in a sentence) while considering their order and context.


What is Sequence Labeling?

In sequence labeling, we:

  • Take an input sequence: w1, w2, …, wn
  • Produce an output label sequence: t1, t2, …, tn

Each input item receives one corresponding label, and the labels are not independent of each other.


POS Tagging as Sequence Labeling

Input sequence → words in a sentence

The / cat / sleeps

Output sequence → POS tags

DT / NN / VBZ

Each word must receive exactly one POS tag, and the choice of tag depends on:

  • The current word (emission probability)
  • The neighboring tags (context / transition probability)

Monday, December 15, 2025

POS Tagging using HMM Solved exercises 02 – Multiple Choice Questions (MCQs) with Answers


Important Definitions

Bigram Model

A bigram model is a probabilistic model that assumes each element (such as a POS tag) depends only on the immediately preceding element. In POS tagging, it is based on the first-order Markov assumption.

Mathematically:

P(ti | t1, …, ti−1) ≈ P(ti | ti−1)

Transition Probability

A transition probability is the probability of one POS tag following another POS tag in a sequence.

Mathematically:

P(tj | ti)

It indicates how likely a tag tj occurs after tag ti.

Emission Probability

An emission probability is the probability that a given POS tag generates (emits) a specific word.

Mathematically:

P(w | t)

It represents how likely a word w is produced by a POS tag t.

11. Using the POS-tag HMM below, what is the most likely tag for the word "apple"?

Tags: Noun (N), Verb (V)
Emission Probabilities:
Word  | P(word | N) | P(word | V)
eat   | 0.05        | 0.60
apple | 0.70        | 0.10
Assume equal prior tag probability.

Answer: A
Explanation: P(apple|N)=0.70 ≫ P(apple|V)=0.10. With equal priors, Noun wins.
12. In the following transition matrix, what is the probability of the POS sequence N → V → N?
From \ To | N    | V
N         | 0.30 | 0.70
V         | 0.50 | 0.50
Initial probability P(N)=0.6

Answer: C
Explanation: P(N→V)=0.70, P(V→N)=0.50 ⇒ 0.6 × 0.70 × 0.50 = 0.21.

Given the transition probability matrix and the bigram model, the probability of the POS sequence N → V → N is obtained by multiplying the bigram probabilities P(V | N) and P(N | V) (a bigram transition probability has the form P(tagj | tagi)). The following are two valid calculations:

  • Without START state: P(N → V → N) = P(V | N) × P(N | V) = 0.70 × 0.50 = 0.35
  • With START state: P(N → V → N) = P(N | START) × P(V | N) × P(N | V) = 0.6 × 0.70 × 0.50 = 0.21

In this question, you are given bigram transition probabilities and initial (START) probability.
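
The short Python snippet below reproduces both calculations using the values given in the question.

```python
# Probability of the tag sequence N -> V -> N under the given bigram HMM.
trans = {("N", "V"): 0.70, ("V", "N"): 0.50}
p_start_N = 0.6                                  # initial probability P(N)

without_start = trans[("N", "V")] * trans[("V", "N")]
with_start = p_start_N * without_start

print(without_start)   # 0.35
print(with_start)      # 0.21
```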

13. Using HMM emissions below, what is the most probable tag for "runs"?
Word | P(word | Verb) | P(word | Noun)
runs | 0.65           | 0.20
dog  | 0.10           | 0.75
Prior: P(N)=0.4, P(V)=0.6

Answer: A
Explanation: 0.65×0.6 = 0.39 ≫ 0.20×0.4 = 0.08 ⇒ Verb.

The question asks for the tag (Verb or Noun) that is more likely to have generated the word "runs".

How do we decide the tag?

In Hidden Markov Models (HMMs), for a single word, we compare:

P(tag) × P(word | tag)

Step-by-step calculation

1. Probability that runs is a Verb

P(V) × P(runs | V) = 0.6 × 0.65 = 0.39

2. Probability that runs is a Noun

P(N) × P(runs | N) = 0.4 × 0.20 = 0.08

Compare the values

Since 0.39 > 0.08, Verb is the more probable tag.
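
The comparison can be written as a few lines of Python using the question's values.

```python
# Single-word decision for "runs": compare P(tag) * P(word | tag).
scores = {
    "Verb": 0.6 * 0.65,   # P(V) * P(runs | V) = 0.39
    "Noun": 0.4 * 0.20,   # P(N) * P(runs | N) = 0.08
}
best_tag = max(scores, key=scores.get)
print(best_tag, scores[best_tag])   # Verb 0.39
```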

14. You observe the sequence "cats chase mice". Which tag is most probable for chase?
Word  | Noun | Verb
cats  | 0.65 | 0.05
chase | 0.10 | 0.80
mice  | 0.70 | 0.05

The table can be read as P(cats | Noun) = 0.65, P(cats | Verb) = 0.05, and so on.


Answer: B
Explanation: P(chase|Verb)=0.80 ≫ P(chase|Noun)=0.10.

You have an observation sequence "cats chase mice" and need to determine which part-of-speech tag (Noun or Verb) is most probable for the word "chase".

The table provides emission probabilities — the probability of observing a particular word given a specific POS tag.

For the word "chase", we look at its emission probabilities:

  • P(chase | Noun) = 0.10 (probability of observing "chase" if it's a Noun)
  • P(chase | Verb) = 0.80 (probability of observing "chase" if it's a Verb)

Since 0.80 > 0.10, the word "chase" is much more likely to be a Verb than a Noun.

15. Given transitions and equal priors, what is P(N→N→V)?
From \ To | N    | V
N         | 0.55 | 0.45
V         | 0.30 | 0.70
P(N)=0.5

Answer: A
Explanation: 0.5 × 0.55 × 0.45 = 0.12375.
16. Which tag is more probable for "light" given context? Previous tag = ADJ

Answer: C
Explanation: Insufficient data.

What is required to answer this type of question?

To find the most probable tag for a given word, even when the previous tag is known, you need:

P(ti | ti−1) and P(wi | ti)

  • Transition probability → models tag sequence likelihood. (Example: P(NOUN | ADJ), P(ADJ | ADJ))
  • Emission probability → models word–tag compatibility. (Example: P(light | NOUN), P(light | ADJ))

Both are required for a valid probabilistic decision in an HMM, but neither of these probabilities is provided in the question.

17. In an HMM-based POS tagger, how is the tag of the first word in a sentence primarily determined?

Answer: C
Explanation: In an HMM, the best tag at the start of a sentence is mainly determined by the START → tag transition probability, and determiners most frequently follow the START state in training data.
At the start of a sentence there is no previous real tag; instead, the HMM uses a special START state. So for the first word, the tag is chosen mainly using P(tag | START). This probability is learned from training data by counting which tags most often begin sentences.
18. Compute likelihood of sequence DT → N → V given the initial P(DT) = 0.55.
From \ To | DT   | N    | V
DT        | 0.10 | 0.75 | 0.15
N         | 0.05 | 0.60 | 0.35
V         | 0.10 | 0.20 | 0.70



Answer: A
Explanation: 0.55×0.75×0.35 = 0.144.

What does “likelihood of the sequence DT → N → V” mean?

It means: What is the probability that the HMM generates this tag sequence?

In an HMM, the likelihood of a tag sequence is computed by multiplying the relevant transition probabilities.

Formula to be used

P(DT → N → V) = P(DT | START) × P(N | DT) × P(V | N)


Substituting values

= 0.55 × 0.75 × 0.35 ≈ 0.144

19. Which tag is most probable for the word "book" given emissions P(book | Noun) = 0.45 and P(book | Verb) = 0.40? Transition probabilities for fixed previous tag VERB are as follows: P(N | V) = 0.55 and P(V | V) = 0.45.

Answer: A
Explanation: 0.55 × 0.45 = 0.2475 is greater than 0.45 × 0.40 = 0.18 ⇒ Noun.

What does HMM compute?

For each possible tag t, an HMM compares:

P(t | previous tag) × P(word | t)

Step 1: Probability of Noun

P(N) = P(V → N) × P(book | N) = 0.55 × 0.45 = 0.2475

Step 2: Probability of Verb

P(V) = P(V → V) × P(book | V) = 0.45 × 0.40 = 0.18

Since the probability of NOUN is greater than that of VERB, NOUN is the more probable tag.

20. If emission probability for unknown words is smoothed using Laplace smoothing, what happens?

Answer: B
Explanation: Laplace smoothing prevents zero probabilities.

What is Laplace smoothing?

Laplace smoothing (add-one smoothing) is used to handle unseen events in probabilistic models.

In POS tagging with HMMs, an unknown word is a word that did not appear in the training data.

Without smoothing:

  • P(unknown word | tag) = 0
  • This would make the entire sequence probability zero, even if all other probabilities are high.

What does Laplace smoothing do?

Laplace smoothing adds 1 to all word–tag counts:

P(w | tag) = (count(w, tag) + 1) / (count(tag) + V), where V is the vocabulary size.

As a result:

  • Even unseen words receive a small, non-zero probability
  • No emission probability becomes zero

Therefore, the correct answer is: Option B
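
As a small illustration, the following Python sketch implements add-one smoothed emission probabilities; the counts and vocabulary size are made-up illustrative values.

```python
from collections import Counter

# Minimal sketch of add-one (Laplace) smoothed emission probabilities.
word_tag_counts = Counter({("cat", "NOUN"): 8, ("dog", "NOUN"): 7})
tag_counts = Counter({"NOUN": 50})
vocab_size = 10   # number of distinct words seen in training

def smoothed_emission(word, tag):
    """P(word | tag) = (Count(word, tag) + 1) / (Count(tag) + V)."""
    return (word_tag_counts[(word, tag)] + 1) / (tag_counts[tag] + vocab_size)

print(smoothed_emission("cat", "NOUN"))     # (8 + 1) / (50 + 10) = 0.15
print(smoothed_emission("glorf", "NOUN"))   # unseen word: (0 + 1) / 60 ≈ 0.0167
```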

Thursday, December 11, 2025

Top 10 Advanced HMM for POS Tagging — Important MCQs (2nd Order, Trigram, Smoothing, Viterbi)


Advanced POS Tagging with Hidden Markov Models — MCQ Introduction

Master POS tagging with this focused MCQ set on Hidden Markov Models (HMM): second-order & trigram models, Viterbi decoding, Baum-Welch training, smoothing techniques, and practical tips for supervised & unsupervised learning.

Part-of-speech (POS) tagging is a cornerstone task in Natural Language Processing (NLP) that assigns grammatical categories (noun, verb, adjective, etc.) to each token in a sentence. This question set concentrates on classical statistical taggers built with Hidden Markov Models (HMMs), which model tag sequences as hidden states and observed words as emissions.

HMM-based POS taggers remain valuable for their interpretability and efficiency. They are particularly useful when you need:

  • Lightweight, fast taggers for resource-constrained systems
  • Explainable probabilistic models for linguistic analysis and teaching
  • Strong baselines before moving to neural models like BiLSTM or Transformer taggers

This MCQ collection targets advanced HMM concepts — including second-order (trigram) models, the Viterbi decoding algorithm, Forward–Backward / Baum–Welch for unsupervised learning, and various smoothing strategies to handle rare or unseen words. Each question includes a concise explanation to help you understand not only the correct choice but why it matters for real-world POS tagging.

What you’ll learn

  1. How higher-order HMMs (trigram / second-order) capture broader tag context.
  2. Why supervised training requires labeled word–tag corpora and how unsupervised EM works.
  3. The purpose of smoothing (Laplace, Good-Turing) to avoid zero probabilities.
  4. Trade-offs: model complexity, overfitting, and inference cost when increasing hidden states.

Use these MCQs to prepare for exams, interviews, or to evaluate your grounding before progressing to neural POS taggers. Scroll down to start the questions and test your understanding of HMM-based POS tagging fundamentals and advanced techniques.

11. A second-order POS HMM considers:

Answer: B
Explanation:

Higher-order HMMs allow P(tᵢ | tᵢ₋₁, tᵢ₋₂), improving the use of context.

A second-order POS HMM (Hidden Markov Model) is also called a trigram HMM.

It means: P(tᵢ | tᵢ₋₁, tᵢ₋₂).

This tells us: The probability of the current tag depends on two previous tags. Therefore, the HMM "looks back" two steps in the tag sequence.

Example: Take the simple sentence "dogs chase cats". For the last word "cats", whose tag is t₃, the previous two tags are t₂ = VERB (the tag for "chase") and t₁ = NOUN (the tag for "dogs"), so we write:

P(t₃ | t₂ = VERB, t₁ = NOUN)
12. Input for supervised POS HMM training must contain:

Answer: B
Explanation:

Supervised models need labeled corpora to learn emission + transition probabilities.

Supervised POS HMM — Why labeled data is required?

A supervised POS HMM needs already labeled data so it can learn probabilities:

Transition probabilities
P(tk | tk-1) or P(tk | tk-1, tk-2)

These require tag sequences.

Emission probabilities
P(wk | tk)

These require each word paired with its correct tag.

Therefore, supervised training must have sentences where every word already has a POS tag.

Example training line:
Dogs/NOUN chase/VERB cats/NOUN
13. Decoding in POS HMM refers to:

Answer: C
Explanation:

Decoding maps observed words to best hidden tag sequence using Viterbi.

In a POS HMM (Hidden Markov Model), decoding means: Finding the most probable sequence of POS tags for a given sequence of words. This is usually done using the Viterbi algorithm.

So, decoding = tagging = choosing the best tag path.
14. Forward-Backward algorithm is mainly used for:

Answer: B
Explanation:

It computes expected probabilities used in Baum-Welch EM training.

More information:

The Forward–Backward algorithm is the core of the Baum–Welch algorithm, which is used for: Unsupervised training of Hidden Markov Models (HMMs).

In unsupervised learning, the data has no POS tags, so the model must estimate: Transition probabilities, Emission probabilities

The Forward–Backward algorithm computes:

  • Forward probabilities α
  • Backward probabilities β
  • Expected counts for transitions and emissions
These expected counts are then used to re-estimate HMM parameters. This is EM (Expectation–Maximization).
15. Smoothing in HMM prevents:

Answer: D
Explanation:

Unseen tag-word or tag-tag pairs must not have zero probability → smoothing redistributes probability mass.

In an HMM, probabilities are estimated from counts in the training data. If a transition or a word–tag pair never appears in training data, its probability becomes zero. This is dangerous because:

  • A zero probability wipes out entire Viterbi paths
  • The model cannot handle unseen words or unseen tag transitions

Smoothing (like Laplace, Good–Turing, Witten–Bell) adds a small nonzero probability to unseen events.

So smoothing prevents: Zero-probability transitions and zero-probability emissions.
16. A core limitation of POS HMM is that it:

Answer: B
Explanation:

HMM relies only on previous tag and assumes words depend only on tag, limiting context.

Markov + word independence - a limitation of HMMs

A standard POS HMM makes two strong assumptions.

  • Markov Assumption (for tags): The current tag depends only on a small number of previous tags.
    • This ignores long-range syntactic dependencies (e.g., subject–verb agreement across clauses).
  • Output Independence Assumption (for words): Words depend only on their own tag, not surrounding words.
    • This ignores context that modern taggers use (e.g., CRFs, BiLSTMs, Transformers).

These assumptions simplify the model, but they also severely limit accuracy compared to modern NLP models.

17. Increasing hidden states in POS HMM generally:

Answer: A
Explanation:

More states = more parameters → risk of overfitting & slower inference.


Increasing hidden states in POS HMM may cause overfitting

In a Hidden Markov Model (HMM) used for Part-of-Speech (POS) tagging, the "hidden states" correspond to the POS tags (like Noun, Verb, Adjective). Increasing the number of hidden states means using a more granular tagset (e.g., splitting "Noun" into "Singular Noun" and "Plural Noun") or simply increasing the model's capacity in an unsupervised setting.


Effect of increasing hidden states - Discussion

When you increase the number of states N:

  • You must estimate many more parameters.
  • But your dataset size stays the same.

So the model tries to estimate:

  • Many more transition probabilities (N²),
  • Many more emission probabilities (N × V).

With limited data, the HMM begins to:

  • Fit the quirks/noise of the training data,
  • Memorize rare patterns,
  • Over-specialize to word sequences it has seen,
  • Lose its ability to generalize to unseen text.

This phenomenon is overfitting.

18. Rare words are best handled using:

Answer: A
Explanation:

Smoothing reallocates probability mass → better tagging for unseen/low-freq words.

19. A trigram HMM improves tagging by modeling:

Answer: B
Explanation:

Trigram uses P(tᵢ | tᵢ₋₁, tᵢ₋₂) → better captures context patterns.

20. Unsupervised POS HMM accuracy increases with:

Answer: B
Explanation:

Morphology aids tagging without labels → suffix, prefix, capitalization rules.

Tuesday, December 2, 2025

Master HMM with MCQs – Hidden Markov Model Tagging Explained


Hidden Markov Model - MCQs - Problem-based Practice Questions

HMM-Based POS Tagging Practice

These questions explore key aspects of Hidden Markov Model (HMM) based Part-of-Speech (POS) tagging. Some questions explicitly provide prior (initial) probabilities, while others focus only on transition and emission probabilities. You will practice:

  • Calculating posterior probabilities for individual words.
  • Evaluating sequence likelihoods using transitions and emissions.
  • Handling unseen words with smoothing techniques.
  • Determining most likely tag sequences based on high-probability transitions.
1. Consider the following HMM for POS tagging:

Emission Probabilities | Transition Probabilities
P(dog | Noun) = 0.6    | P(next = Noun | current = Noun) = 0.4
P(dog | Verb) = 0.1    | P(next = Verb | current = Noun) = 0.6
P(runs | Noun) = 0.1   | P(next = Noun | current = Verb) = 0.5
P(runs | Verb) = 0.7   | P(next = Verb | current = Verb) = 0.5

In the table, 'next' and 'current' in the probability P(next = Noun | current = Noun) refer to 'POS tag of next word' and 'POS tag of current word' respectively.
Which is the most likely tag sequence for the sentence “dog runs” using the HMM?

A. Noun → Noun
B. Noun → Verb
C. Verb → Noun
D. Verb → Verb

Answer: B
Explanation:

P(Noun→Verb) = 0.6 × 0.6 × 0.7 = 0.252. Highest likelihood = Noun→Verb.

Step-by-Step Probability Computation

We compute the probability for each possible tag sequence using:

P(t₂ | t₁) × P(dog | t₁) × P(runs | t₂)


1. Sequence: Noun → Noun
  • P(dog | Noun) = 0.6
  • P(Noun | Noun) = 0.4
  • P(runs | Noun) = 0.1

0.6 × 0.4 × 0.1 = 0.024


2. Sequence: Noun → Verb
  • P(dog | Noun) = 0.6
  • P(Verb | Noun) = 0.6
  • P(runs | Verb) = 0.7

0.6 × 0.6 × 0.7 = 0.252


3. Sequence: Verb → Noun
  • P(dog | Verb) = 0.1
  • P(Noun | Verb) = 0.5
  • P(runs | Noun) = 0.1

0.1 × 0.5 × 0.1 = 0.005


4. Sequence: Verb → Verb
  • P(dog | Verb) = 0.1
  • P(Verb | Verb) = 0.5
  • P(runs | Verb) = 0.7

0.1 × 0.5 × 0.7 = 0.035

Highest Probability = 0.252

Most likely tag sequence:

B. Noun → Verb
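
The brute-force scoring above can be reproduced with a few lines of Python, using the emission and transition values from the question.

```python
from itertools import product

# Score every tag sequence for "dog runs" with
# P(dog | t1) * P(t2 | t1) * P(runs | t2).
emission = {("dog", "Noun"): 0.6, ("dog", "Verb"): 0.1,
            ("runs", "Noun"): 0.1, ("runs", "Verb"): 0.7}
transition = {("Noun", "Noun"): 0.4, ("Noun", "Verb"): 0.6,
              ("Verb", "Noun"): 0.5, ("Verb", "Verb"): 0.5}

scores = {}
for t1, t2 in product(["Noun", "Verb"], repeat=2):
    scores[(t1, t2)] = (emission[("dog", t1)] * transition[(t1, t2)]
                        * emission[("runs", t2)])

for seq, p in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(seq, p)   # ('Noun', 'Verb') 0.252 is the highest
```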

2. For the HMM below:

Initial tag probabilities:
• P(Noun) = 0.7
• P(Adj) = 0.3

Emission probabilities for the word "red":
• P("red" | Noun) = 0.2
• P("red" | Adj) = 0.8

Calculate the normalized probability that 'red' is tagged as an Adjective.

A. 0.14
B. 0.56
C. 0.63
D. 0.24

Answer: C
Explanation:

Compute unnormalized scores:
• Adj = 0.3 × 0.8 = 0.24
• Noun = 0.7 × 0.2 = 0.14

Normalize to get posterior:
Adj = 0.24 / (0.24 + 0.14) ≈ 0.63.

Calculate the probability of a specific tag assignment for a single observed word

In a Hidden Markov Model (HMM), the probability of a specific tag assignment for a single observed word is calculated using the Joint Probability of the tag and the word. This is often referred to as the "Viterbi score" or "path probability" for that specific tag.

The formula for the joint probability of a single state (tag) and observation (word) is:

P(Tag,Word) = P(Tag) × P(Word∣Tag)

Where:

  • P(Tag) is the Initial tag probability (Prior).
  • P(Word∣Tag) is the Emission probability (Likelihood).

Step 1: Calculate the Joint Probabilities for All Tags

For each possible tag, compute the product of the initial probability and emission probability:

For tag = Adjective (Adj):


P(Adj,"red") = P(Adj) × P("red"∣Adj) = 0.3 × 0.8 = 0.24

For tag = Noun:


P(Noun,"red") = P(Noun) × P("red"∣Noun) = 0.7 × 0.2 = 0.14


Step 2: Calculate the Normalizing Constant (Total Probability)

The normalizing constant is the sum of all joint probabilities:

P("red") = P(Adj,"red") + P(Noun,"red") = 0.24 + 0.14 = 0.38


Step 3: Apply Bayes' Theorem to Get the Posterior Probability

Using the normalization formula:

P(Adj∣"red") = P(Adj,"red") / P("red") = 0.24 / 0.38

Simplifying:


P(Adj∣"red") = 0.24 / 0.38 ≈ 0.6316 or approximately 63.16%

Final Answer

The normalized probability that 'red' is tagged as an Adjective is:

P(Adj∣"red") = 0.24 / 0.38 ≈ 0.632 or 63.2%

3. Using the HMM:

Transition | Det | Noun
Det        | 0.1 | 0.9
Noun       | 0.4 | 0.6

Emission P(word|tag) | Det  | Noun
"the"                | 0.80 | 0.05
"cat"                | 0.01 | 0.9

Most likely tagging for "the cat" is:

A. Det → Det
B. Det → Noun
C. Noun → Det
D. Noun → Noun

Answer: B
Explanation:

Solution: Let's solve this step by step using the Hidden Markov Model (HMM)

The goal is to find the most likely sequence of tags for the sentence "the cat" using the Viterbi principle.

Step 1: Understand the tables

Transition probabilities (P(tag₂ | tag₁)):

From \ To | Det | Noun
Det       | 0.1 | 0.9
Noun      | 0.4 | 0.6

For example, if the previous tag is Det, the probability that the next tag is Noun is 0.9.

Emission probabilities (P(word | tag)):

Word | Det  | Noun
the  | 0.8  | 0.05
cat  | 0.01 | 0.9

For example, the probability that the word "cat" is emitted by a Noun is 0.9.

Step 2: Compute joint probabilities for all sequences

We consider all possible tag sequences for "the cat":

Det → Det

P(Det → Det) = transition × emission
Step 1 (first word "the" as Det): P("the"|Det) = 0.8
Step 2 (second word "cat" as Det): transition P(Det|Det) = 0.1, emission P("cat"|Det) = 0.01

Total probability = 0.8 × 0.1 × 0.01 = 0.0008

Det → Noun

Step 1 "the" as Det: P("the"|Det) = 0.8
Step 2 "cat" as Noun: transition P(Noun|Det) = 0.9, emission P("cat"|Noun) = 0.9

Total probability = 0.8 × 0.9 × 0.9 = 0.648

Noun → Det

Step 1 "the" as Noun: P("the"|Noun) = 0.05
Step 2 "cat" as Det: transition P(Det|Noun) = 0.4, emission P("cat"|Det) = 0.01

Total probability = 0.05 × 0.4 × 0.01 = 0.0002

Noun → Noun

Step 1 "the" as Noun: P("the"|Noun) = 0.05
Step 2 "cat" as Noun: transition P(Noun|Noun) = 0.6, emission P("cat"|Noun) = 0.9

Total probability = 0.05 × 0.6 × 0.9 = 0.027

Step 3: Compare probabilities

Sequence    | Probability
Det → Det   | 0.0008
Det → Noun  | 0.648
Noun → Det  | 0.0002
Noun → Noun | 0.027

Step 4: Most likely tagging

The most likely sequence is: Det → Noun (Option B)

"the" is a determiner (Det), and "cat" is a noun (Noun)

4. Given the emission matrix:

Word \ Tag | Verb | Adv
"quickly"  | 0.2  | 0.7

If P(Verb)=0.5 and P(Adv)=0.5 initially, probability the word "quickly" is tagged Adv:

A. 0.41
B. 0.55
C. 0.78
D. 0.64

Answer: C
Explanation:

P(Tag,Word) = P(Tag) × P(Word∣Tag)

If "quickly" is tagged as Verb: 0.5×0.2=0.10;

If "quickly" is tagged as Adv: 0.5×0.7=0.35.

Highest is Adv. Hence, normalized Adv = 0.35/(0.10+0.35) = 0.35/0.45 ≈ 0.78.

5. A word appears 10 times as Noun and 2 times as Verb in training. Without smoothing P(word|Noun)= ?

A. 0.2
B. 0.5
C. 0.83
D. 0.91

Answer: C
Explanation:

P(word|noun) means "Out of all times the word occurs, how many times did it occur with the tag Noun?". We are not doing any smoothing—just using raw counts.

The word appears 10 times as Noun

The same word appears 2 times as Verb

Total appearances of the word = 10 + 2 = 12


Since we want P(word | Noun):

P(word | Noun) = Count(word with Noun) / Total count of the word

Substitute the values

P(word | Noun) = 10 / 12 = 0.8333

Rounded: 0.83

6. In an HMM POS-tagger, we want to estimate the emission probability of an unseen word. Consider the word "glorf", which never occurred in the training data.
For the tag Noun, the training corpus contains:
  • Total noun-tagged word tokens = 50
  • Count of "glorf" = 0
  • Vocabulary size (unique words) = 10
Using Add-1 (Laplace) smoothing, compute 𝑃("glorf" ∣ Noun).

A. 1/60
B. 1/51
C. 1/61
D. 51/61

Answer: A
Explanation:

Laplace smoothing → (0+1)/(50 + 10) = 1/60.

Understanding the question

We have an unseen word: "glorf"
That means in the training data count("glorf" | Noun) = 0.
We want to compute P("glorf" | Noun) using Add-1 (Laplace) smoothing.

✅ Given

  • Total noun tokens = 50
  • Count of "glorf" under Noun = 0
  • Vocabulary size (V) = 10

Add-1 smoothing formula

P(w | tag) = (count(w, tag) + 1) / (total tokens under tag + V)


Step-by-step calculation

P("glorf" | Noun) = (0 + 1) / (50 + 10) = 1 / 60

7. Which sentence has lower HMM likelihood given high Verb→Noun transition?

A. eat food
B. food eat

Answer: B
Explanation:

"food eat" requires Noun→Verb, which may be low and less natural under English HMM statistics. Because its tag sequence (Noun → Verb) does NOT match the high-probability Verb → Noun transition that the HMM expects.

"eat food" (Verb -> Noun) has HIGH HMM likelihood


"food eat" (Noun -> Verb) has LOW HMM likelihood

8. Given partial Viterbi table:

t | word | best tag | prob
1 | fish | Noun     | 0.52
2 | swim | Verb     | 0.46

Assume the HMM has a strong Verb → Noun transition (i.e., P(Noun|Verb) is high).

Model predicts next tag likely:

A. Noun
B. Verb
C. Both equal
D. Cannot determine

Answer: A
Explanation:

Since the best tag at t=2 is Verb, the predicted next tag depends mainly on the transition probabilities from Verb. The question explicitly states that Verb → Noun transition is strong. Therefore, the HMM expects the next tag to be Noun with highest probability.

Why the Viterbi algorithm predicts Noun as the next tag

The Viterbi algorithm will predict Noun as the most likely next tag because:

  • High transition probability boost: P(Noun|Verb) is high, which significantly increases the probability of the Noun path.
  • Natural language patterns: Verbs commonly take noun objects in English (for example, "swim laps", "fish upstream"), so Verb → Noun sequences are frequent.
  • Viterbi maximization: The algorithm selects the tag sequence that produces the maximum accumulated probability. With a strong Verb→Noun transition, the Noun path will typically have a higher accumulated probability than alternatives.

The strong transition probability from Verb to Noun makes this the most likely prediction for the next tag in the sequence.

9. In an HMM for POS tagging, you are given the following transition probabilities for adjectives:

  • An adjective is followed by a noun with probability 0.75
  • An adjective is followed by another adjective with probability 0.10
These probabilities tell us which tags usually come after an adjective in the training data.

Using only these transition probabilities, which 2-word phrase does the HMM consider more likely?

A. beautiful red
B. beautiful flower

10. In an HMM POS tagger, you observe the single word "cat". The model gives you the following probabilities:

Tag Transition | Probability
DT → NN        | 0.8
DT → VB        | 0.2

Emission       | "cat"
NN emits "cat" | 0.7
VB emits "cat" | 0.1

For this one-word sentence, the tag is chosen mainly based on the emission probability of the word. Based on these values, which tag is the HMM most likely to assign to the word "cat"?

A. DT
B. NN
C. VB
D. Cannot determine
