ExploreDatabase – Your one-stop study guide for interview and semester exam preparations with solved questions, tutorials, GATE MCQs, online quizzes and notes on DBMS, Data Structures, Operating Systems, AI, Machine Learning and Natural Language Processing.
Morphological Analysis and Generation using Finite State Transducers - NLP Interview Questions
Top 10 HOT MCQs — Morphological Analysis & Generation using FST
Each question is followed by the correct choice and a short explanation.
Morphological analysis and generation using Finite-State Transducers (FST) form one of the most powerful and efficient rule-based methods in Natural Language Processing (NLP). FSTs help machines break words into their meaningful components, generate correct surface forms, and handle complex morphotactics and phonological rules with high accuracy. This post provides a clear, exam-focused explanation of how FSTs work in morphological processing, along with HOT MCQs, examples, and a detailed FAQ section. Whether you're preparing for NLP, AI, ML, UGC NET, GATE, or university-level linguistic exams, this guide will strengthen your understanding of FST-based morphology and help you master one of the foundational concepts in computational linguistics.
1. In an FST-based morphological analyzer, the input tape typically contains:
A. The lemma and POS B. The surface word form C. The root plus affixes separated by “+” D. The semantic features
Answer: B
Explanation: An analyzer reads the surface string (e.g., "running") and maps it to a lexical representation like run + V + PROG.
What is Finite-State Transducer (FST)?
A Finite-State Transducer (FST) is a finite-state machine that maps input strings to output strings. It is like a Finite-State Automaton (FSA), but instead of only accepting or rejecting, it also produces output.
It has two tapes: an input tape (reads input symbols) and an output tape (writes output symbols). Both tapes are processed simultaneously.
Example: FST for plural formation
Input: cat → Output: cats
Transitions might look like:
c : c
a : a
t : t
ε : s (epsilon means no input symbol but output "s")
So the FST reads “cat”, and outputs “cats”.
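The transition list above can be sketched in a few lines of Python. This is a toy illustration, not a real FST library: transitions are a simple list of (input, output) pairs applied left to right, and the empty string stands in for ε.

```python
# Minimal FST sketch for plural formation (cat -> cats).
# Each transition maps an input symbol to an output symbol;
# EPS marks an epsilon transition (no input consumed, output written).
EPS = ""

def run_fst(transitions, word):
    """Apply a linear chain of (input, output) transitions to a word."""
    out = []
    i = 0
    for inp, outp in transitions:
        if inp == EPS:                      # epsilon: emit output, consume nothing
            out.append(outp)
        elif i < len(word) and word[i] == inp:
            out.append(outp)
            i += 1
        else:
            return None                     # no valid path: input rejected
    return "".join(out) if i == len(word) else None

plural = [("c", "c"), ("a", "a"), ("t", "t"), (EPS, "s")]
print(run_fst(plural, "cat"))   # cats
```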
2. What is the main reason FSTs are efficient for morphological generation?
A. They operate with recursive rules B. They encode allomorphy as probabilistic transitions C. They perform lookup and rule-application in a single pass D. They use neural embeddings for transitions
Answer: C
Explanation: The lexicon and the rule transducers are composed offline into one network, so analysis or generation is a single traversal of that network.
How an FST Performs Lookup and Rule Application in a Single Pass
A finite-state transducer (FST) performs both lexical lookup and morphological rule application in a single pass because the stem paths and the rules are pre-compiled into one unified network.
Lexicon and rules are merged: the stem dictionary, affix rules, and orthographic rules are composed into one finite-state graph before runtime.
Each transition already contains the rule: a transition like y : ies or e : ε applies the spelling rule while reading the input.
Processing is linear: the machine reads one symbol and follows one valid transition, so lookup and rule application happen together.
No separate rule engine: rules are “baked into” the transitions, so the FST only needs to traverse the graph.
Deterministic traversal: most morphological FSTs are determinized so that each state has a unique next state for a given input, so no backtracking is needed and one pass suffices.
Short summary: FSTs do lookup and rule application together because every transition already encodes both the lexical symbol and the rule outcome.
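The points above can be illustrated with a toy composed network. The states, arcs, and the +PL symbol here are invented for illustration: the y : ie spelling change and the plural affix are ordinary arcs in one graph, so generating tries from try + PL is a single left-to-right traversal.

```python
# Hypothetical composed network: lexicon + spelling rules pre-compiled into
# one transition graph. Each arc is input_symbol -> (output, next_state);
# the y -> ie spelling change is baked into a single arc.
GRAPH = {
    0: {"t": ("t", 1)},
    1: {"r": ("r", 2)},
    2: {"y": ("ie", 3)},    # spelling rule applied while reading the input
    3: {"+PL": ("s", 4)},   # affix arc from the lexicon
}
FINAL = {4}

def generate(graph, final, symbols):
    """Single left-to-right pass: lookup and rule application together."""
    state, out = 0, []
    for sym in symbols:
        arcs = graph.get(state, {})
        if sym not in arcs:
            return None                 # no valid path
        outp, state = arcs[sym]
        out.append(outp)
    return "".join(out) if state in final else None

print(generate(GRAPH, FINAL, ["t", "r", "y", "+PL"]))  # tries
```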
3. The two-level morphology approach (Koskenniemi) represents:
A. A mapping between lemma and POS tags B. A mapping between surface and lexical levels C. A mapping between syntax and semantics D. A mapping between morphemes and orthographic rules
Answer: B
Explanation: Two-level morphology relates lexical symbols and surface symbols with parallel constraints (lexical ↔ surface).
The two-level morphology represents a parallel mapping between the underlying lexical representation of a word and its surface form, allowing morphological and phonological rules to operate simultaneously.
What is "Two-Level Morphology"?
Two-level morphology (proposed by Kimmo Koskenniemi, 1983) is a model in computational
linguistics in which each word is described simultaneously on two levels:
Lexical (underlying) level — the morphemes as stored in the dictionary.
Surface level — the actual written (surface) form after morphological and
orthographic changes have been applied.
These two levels are linked by parallel constraints (called two-level rules),
which specify allowed correspondences between lexical symbols and surface symbols. Unlike
sequential rewrite systems, two-level morphology enforces these constraints simultaneously,
so rules do not apply in separate ordered phases.
Serial No. | Lexical | Surface
1 | cat + PL | cats
2 | try + PL | tries
3 | run + PAST | ran
4 | make + PAST | made
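A toy sketch of the “parallel constraints” idea, assuming a hand-picked set of allowed lexical:surface correspondences (0 marks an empty symbol, following Koskenniemi's notation): a pair of aligned strings is accepted only if every symbol pair is licensed.

```python
# Toy two-level check: lexical and surface strings are aligned symbol by
# symbol, and every aligned pair must appear in the correspondence set.
ALLOWED = {("c", "c"), ("a", "a"), ("t", "t"), ("r", "r"), ("y", "i"),
           ("+", "0"), ("0", "e"), ("PL", "s")}

def licensed(lexical, surface):
    """Accept only if every aligned pair is an allowed correspondence."""
    return len(lexical) == len(surface) and \
        all((l, s) in ALLOWED for l, s in zip(lexical, surface))

# try + PL -> tries: y:i, +:0 (deleted), 0:e (inserted), PL:s
print(licensed(["t", "r", "y", "+", "0", "PL"],
               ["t", "r", "i", "0", "e", "s"]))  # True
```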
4. Which component handles orthographic rules like “drop-e + ing → dropping” in an FST model?
A. Lexicon transducer B. Rule transducer C. Path minimizer D. Feature unifier
Answer: B
Explanation: Orthographic/phonological alternations are encoded as rule transducers and composed with the lexicon FST.
5. In morphological generation using FSTs, the direction of the transducer operation is:
A. From surface form → lexical form B. From lexical form → surface form C. Always bidirectional by default D. Determined by the POS tagger
Answer: B
Explanation: Generation outputs the surface string from lexical input (e.g., walk + PAST → walked).
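Because an FST encodes a relation between lexical and surface strings, the same set of pairs can be read in either direction: lexical → surface for generation, surface → lexical for analysis. A minimal dictionary-based sketch (the word list is invented for illustration):

```python
# The same lexical<->surface pairs serve both directions.
PAIRS = {
    "walk+PAST": "walked",
    "walk+PROG": "walking",
    "try+PL": "tries",
}
# Inverting the relation turns the generator into an analyzer.
ANALYZE = {surface: lexical for lexical, surface in PAIRS.items()}

print(PAIRS["walk+PAST"])   # walked     (generation)
print(ANALYZE["walked"])    # walk+PAST  (analysis)
```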
6. What ensures that an FST for morphology remains finite even with productive affixation?
A. Looping transitions for affixes B. Morphotactic constraints C. Recursive rule composition D. Neural pruning
Answer: B
Explanation: Morphotactics (legal order & combinations of morphemes) prevent infinite generation by constraining sequences.
7. A key benefit of minimizing an FST for morphological analysis is:
A. It improves semantic accuracy B. It reduces the number of paths and states C. It allows handling non-concatenative morphology D. It produces continuous embeddings
Answer: B
Explanation: Minimization merges equivalent states, producing a smaller, faster transducer without changing accepted mappings.
8. Which of the following is MOST suitable for representing both inflectional and derivational morphology in one model?
A. Deterministic automata B. Weighted FSTs C. Large neural language model D. Rule-based taggers
Answer: B
Explanation: Weighted FSTs can encode alternatives and rank analyses (helpful for ambiguity between derivation/inflection).
Weighted Finite-State Transducers (Weighted FSTs) are a crucial concept in computational linguistics and natural language processing. They extend the idea of finite-state transducers (FSTs) by associating numerical weights with transitions, which allows them to model costs, probabilities, or other quantitative measures alongside the basic input-output mapping.
Core Concepts of Weighted FSTs
A Finite-State Transducer (FST) is a state machine that maps input strings to output strings by traversing a set of states and transitions. A weighted FST augments this model by assigning a weight (often a non-negative real number or a value from a semiring) to each transition.
Structure
A weighted FST typically consists of:
States: The nodes representing different points of the process.
Transitions (Arcs): Edges between states, each labeled with:
Input symbol
Output symbol
Weight
Initial State: Where processing starts.
Final States: Where processing ends, often associated with final weights.
Example
Consider a small weighted FST with states 0, 1, and 2. The initial state is state 0 (there can be only one initial state). The final state is 2, with a final weight of 3.5; any state with a non-infinite final weight is a final state. There is an arc (or transition) from state 0 to state 1 with input label a, output label x, and weight 0.5, and an arc from state 1 to state 2 with input label c, output label z, and weight 2.5. This FST transduces, for instance, the string ac to xz with weight 6.5 (the sum of the arc weights and the final weight: 0.5 + 2.5 + 3.5).
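A small Python sketch of the path-weight computation for an example like the one above, assuming arcs a : x / 0.5 and c : z / 2.5 and a final weight of 3.5 (values chosen so the total comes to 6.5):

```python
# Sketch of a weighted FST with tropical-style weights: path weight is the
# sum of the arc weights plus the final weight of the stopping state.
ARCS = {
    0: {"a": ("x", 0.5, 1)},
    1: {"c": ("z", 2.5, 2)},   # assumed second arc so ac -> xz totals 6.5
}
FINAL = {2: 3.5}               # state 2 is final with final weight 3.5

def transduce(inp):
    state, out, weight = 0, [], 0.0
    for sym in inp:
        if sym not in ARCS.get(state, {}):
            return None                     # no valid path
        outp, arc_w, state = ARCS[state][sym]
        out.append(outp)
        weight += arc_w
    if state not in FINAL:
        return None                         # stopped in a non-final state
    return "".join(out), weight + FINAL[state]

print(transduce("ac"))  # ('xz', 6.5)
```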
9. In FST-based analysis, failing to match an input string to any lexical path indicates:
A. The word is ungrammatical B. The FST is non-deterministic C. The word is out-of-vocabulary (OOV) D. The FST requires stemming
Answer: C
Explanation: No accepted path means the surface form isn’t represented in the lexicon (OOV) unless rules cover it.
What is an out-of-vocabulary (OOV) case?
In a Finite State Transducer (FST)-based morphological analyzer, the system tries to map an input surface form (the word as seen in text) to a lexical entry (lemma + morphological features) stored in its lexicon and rules.
If the FST cannot find any valid path that matches the input word, it means:
The word does not exist in the lexicon, and
None of the morphological rules can generate a matching form.
This situation is known as an Out-of-Vocabulary (OOV) case.
For example, let us assume that the input word (surface form) is 'caught'. If the word 'catch' is not in the lexicon (dictionary), the analyzer cannot map 'caught' to 'catch + PAST', so it returns no successful path.
10. Which statement is TRUE about FST-based morphological analyzers in modern NLP pipelines?
A. They have been fully replaced by neural seq2seq models B. They are still widely used in low-resource languages due to rule efficiency C. They cannot handle phonological alternations D. They cannot be used with POS taggers
Answer: B
Explanation: For many low-resource languages, rule-based FSTs are preferred because they require linguistic expertise rather than large corpora.