RNN vs LSTM: 10 MCQs with Answers & Detailed Explanations
Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks are foundational architectures in sequence modeling, used for tasks such as language modeling, time-series prediction, and speech recognition. While both are designed to handle sequential data, they differ significantly in how they capture long-term dependencies and manage gradient flow during training.
This MCQ set presents ten advanced, higher-order thinking questions on RNN vs LSTM — testing conceptual clarity, mathematical intuition, gradient propagation, and architectural reasoning. These questions are ideal for university exams, competitive tests (like GATE / UGC NET), interviews, and anyone preparing deeply for machine learning and deep learning assessments.
What You Will Learn
- Key architectural differences between vanilla RNN and LSTM networks
- Why RNNs suffer from vanishing gradients and how LSTM addresses this issue
- Role of gates and memory cell in preserving long-term information
- Practical decision-making on when to use RNN vs LSTM
How to Attempt This Quiz
Read each question carefully and try to answer before revealing the solution. Click the “View Answer” button to see the correct choice along with a conceptual explanation designed to strengthen your understanding.
Explanation (Question 1):
In a vanilla RNN, gradients are propagated backward through multiple time steps during Backpropagation Through Time (BPTT). At each step, the gradient is multiplied by the recurrent weight matrix and the derivative of activation functions like tanh or sigmoid.
Since derivatives of tanh and sigmoid are typically less than 1, repeated multiplication across many time steps causes the gradient to shrink exponentially. This phenomenon is known as the vanishing gradient problem.
Mathematically, the backpropagated gradient contains repeated products of the recurrent weight matrix:
∂L/∂hₜ × W × W × W × ...
If ‖W‖ < 1 (that is, the largest singular value of W is below 1), the gradient decays exponentially, preventing the network from learning long-range dependencies.
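The exponential decay can be sketched with a scalar toy model (the weight 0.9 and pre-activation 0.5 below are illustrative values, not from the text): each backward step through BPTT multiplies the gradient by one factor of tanh′(a) · w, which is below 1 in magnitude.

```python
import math

def bptt_gradient_factor(w, pre_activation, steps):
    """Product of per-step factors tanh'(a) * w over `steps` time steps
    (scalar stand-in for the Jacobian product in BPTT)."""
    d_tanh = 1.0 - math.tanh(pre_activation) ** 2  # tanh'(a), always <= 1
    return (d_tanh * w) ** steps

short = bptt_gradient_factor(w=0.9, pre_activation=0.5, steps=5)
long_seq = bptt_gradient_factor(w=0.9, pre_activation=0.5, steps=100)
print(short)     # still a usable gradient after 5 steps
print(long_seq)  # effectively zero after 100 steps
```

Even with a per-step factor of roughly 0.7, one hundred steps drive the gradient down by more than fourteen orders of magnitude.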
Explanation (Question 2):
The LSTM introduces a separate memory pathway called the cell state (Cₜ). Unlike RNN hidden states that are updated multiplicatively, the cell state is updated additively:
Cₜ = fₜCₜ₋₁ + iₜC̃ₜ
Because of this additive structure, gradients can flow backward without being repeatedly multiplied by small numbers. This creates what is often called the constant error carousel, which preserves long-term information.
Explanation (Question 3):
Vanilla RNNs struggle with long-range dependencies due to vanishing gradients. Information from early time steps fades as it propagates through many transformations.
LSTM, however, uses gates (input, forget, output) to regulate information flow. The forget gate decides what to keep, allowing important early information to persist across hundreds of steps.
Thus, for long sequences (like 500 words), LSTM is structurally better suited.
Explanation (Question 4):
Using the LSTM update equation:
Cₜ = fₜCₜ₋₁ + iₜC̃ₜ
If fₜ = 1 and iₜ = 0:
Cₜ = 1·Cₜ₋₁ + 0·C̃ₜ = Cₜ₋₁
Thus, the previous memory is preserved exactly. No new information is added and none is forgotten.
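A quick numeric check of the two gate extremes (the scalar memory values 3.5 and 9.9 below are made up purely for illustration):

```python
def lstm_cell_state(f_t, i_t, c_prev, c_tilde):
    """Additive LSTM cell-state update: C_t = f_t * C_prev + i_t * C_tilde."""
    return f_t * c_prev + i_t * c_tilde

c_prev = 3.5   # previous memory (arbitrary value)
c_tilde = 9.9  # candidate memory (arbitrary value)

# f = 1, i = 0: the old memory is copied through unchanged.
kept = lstm_cell_state(f_t=1.0, i_t=0.0, c_prev=c_prev, c_tilde=c_tilde)

# f = 0, i = 1: the old memory is fully replaced by the candidate.
replaced = lstm_cell_state(f_t=0.0, i_t=1.0, c_prev=c_prev, c_tilde=c_tilde)

print(kept, replaced)  # 3.5 9.9
```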
Explanation (Question 5):
LSTM reduces vanishing gradients by creating an additive memory path. Instead of repeatedly multiplying hidden states (as in RNN), it updates the cell state using controlled addition.
Additive updates prevent exponential shrinkage of gradients, enabling better long-term learning.
Explanation (Question 6):
LSTM contains multiple gates (input, forget, output) and separate weight matrices, increasing computational cost.
For short sequences where long-term dependency is not required, a simple RNN is computationally cheaper and sufficient.
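The cost difference is easy to quantify: an LSTM keeps four weight blocks (input, forget, and output gates plus the candidate) where a vanilla RNN keeps one. A rough parameter count, ignoring any output layer (the layer sizes 128 and 256 below are arbitrary examples):

```python
def rnn_param_count(n_in, n_hid):
    # W (input-to-hidden) + U (hidden-to-hidden) + bias
    return n_in * n_hid + n_hid * n_hid + n_hid

def lstm_param_count(n_in, n_hid):
    # Four blocks (input, forget, output gates + candidate),
    # each with its own W, U, and bias.
    return 4 * (n_in * n_hid + n_hid * n_hid + n_hid)

rnn_p = rnn_param_count(128, 256)
lstm_p = lstm_param_count(128, 256)
print(rnn_p, lstm_p, lstm_p / rnn_p)  # LSTM carries 4x the parameters
```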
Explanation (Question 7):
The vanilla RNN rewrites its hidden state at every step by passing it through a weight matrix and a nonlinearity:
hₜ = tanh(Wxₜ + Uhₜ₋₁)
This repeated multiplication causes gradients to either vanish or explode over time.
LSTM introduces a fundamentally different update rule:
Cₜ = fₜCₜ₋₁ + iₜ C̃ₜ
This additive memory update allows information to flow across time steps without being repeatedly multiplied. The forget gate (fₜ) controls retention, while the input gate (iₜ) regulates new information. This structural change is the key reason LSTM maintains long-term memory.
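The two update rules can be put side by side as scalar sketches. The weights (w = 0.8, u = 0.6) and the sharing of one weight pair across all gates are simplifications for brevity; a real LSTM learns a separate W and U for each gate and for the candidate.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def rnn_step(x, h_prev, w=0.8, u=0.6):
    # h_t = tanh(W x_t + U h_{t-1}): the state is rewritten every step.
    return math.tanh(w * x + u * h_prev)

def lstm_step(x, h_prev, c_prev, w=0.8, u=0.6):
    a = w * x + u * h_prev
    f = sigmoid(a)           # forget gate: how much of C_{t-1} to keep
    i = sigmoid(a)           # input gate: how much new content to admit
    o = sigmoid(a)           # output gate: how much memory to expose
    c_tilde = math.tanh(a)   # candidate memory
    c = f * c_prev + i * c_tilde  # additive cell-state update
    h = o * math.tanh(c)
    return h, c

h_rnn = rnn_step(1.0, 0.0)
h_lstm, c_lstm = lstm_step(1.0, 0.0, 0.0)
```

Note that the RNN overwrites its entire state through tanh, while the LSTM carries `c` forward through addition and only modulates it with gate values.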
Explanation (Question 8):
The inability to learn long dependencies is usually due to the vanishing gradient problem — a structural limitation of vanilla RNNs.
Increasing the hidden size may add capacity but does not solve gradient decay. Training for more epochs simply trains longer without fixing the underlying gradient instability.
Replacing the RNN with LSTM introduces gated memory mechanisms that explicitly preserve information over long sequences. Therefore, switching architectures is the most principled solution.
Explanation (Question 9):
In vanilla RNNs, gradients propagate through repeated matrix multiplications:
∂L/∂hₜ × U × U × U × ...
This purely multiplicative pathway causes exponential decay or explosion.
In LSTM, the cell state provides an additive path:
Cₜ = fₜCₜ₋₁ + iₜC̃ₜ
Because addition preserves magnitude better than repeated multiplication, gradients can flow more stably across long sequences.
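A scalar comparison makes the difference concrete. The per-step factors 0.7 and 0.97 are illustrative: the first plays the role of a recurrent weight below 1, the second a forget gate that has learned to stay open.

```python
T = 100  # sequence length

# RNN path: the gradient shrinks by the recurrent factor at every step.
rnn_grad = 0.7 ** T

# LSTM cell-state path: the gradient shrinks only by the forget-gate value,
# which the network can drive close to 1 to keep the gradient alive.
lstm_grad = 0.97 ** T

print(rnn_grad, lstm_grad)  # ~1e-16 vs ~0.05
```

The multiplicative path is numerically dead after 100 steps, while the gated additive path still delivers a usable learning signal.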
Explanation (Question 10):
LSTM extends the vanilla RNN by adding three gates: input, forget, and output gates. These gates control memory flow and prevent vanishing gradients.
If all gating mechanisms are removed, the architecture loses its controlled memory updates and effectively behaves like a standard recurrent neural network with simple hidden state recurrence.
Thus, LSTM without gates collapses into a vanilla RNN.
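This collapse can be verified directly in a scalar sketch: pin the forget gate to 0 and the input gate to 1, read the output straight from the cell state, and the LSTM step reproduces the vanilla recurrence exactly (the weights below are arbitrary).

```python
import math

def gateless_lstm_step(x, h_prev, w=0.8, u=0.6):
    c_tilde = math.tanh(w * x + u * h_prev)  # candidate memory
    c = c_tilde  # forget gate = 0 drops C_{t-1}; input gate = 1 keeps C~
    return c     # output gate removed: h_t = C_t

def rnn_step(x, h_prev, w=0.8, u=0.6):
    return math.tanh(w * x + u * h_prev)

# Identical trajectories over an arbitrary input sequence.
h_lstm = h_rnn = 0.0
for x in [0.3, -1.2, 0.8]:
    h_lstm = gateless_lstm_step(x, h_lstm)
    h_rnn = rnn_step(x, h_rnn)
print(h_lstm == h_rnn)  # True
```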