10 Advanced Decision Tree MCQs: Splitting, Overfitting & Pruning Concepts
1. When calculating information gain, the weight of each child node is proportional to:
A) Its entropy value
B) Number of samples in that node
C) Number of attributes used
D) Number of pure classes
Answer: B
Explanation: Each child’s influence on information gain depends on how many samples it contains — bigger child = bigger weight.
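To make the weighting concrete, here is a minimal Python sketch with hypothetical class counts (not part of the question): the weighted child entropy uses each child's share of the parent's samples as its weight.

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Hypothetical parent node with 10 positive and 10 negative samples
parent = [10, 10]
# Hypothetical split into two children
children = [[8, 2], [2, 8]]

n = sum(parent)
# Each child's entropy is weighted by its share of the parent's samples
weighted_child_entropy = sum(sum(child) / n * entropy(child) for child in children)
info_gain = entropy(parent) - weighted_child_entropy
print(f"information gain = {info_gain:.3f}")
```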
2. If a decision tree is very deep and fits perfectly to training data, the issue is:
A) Underfitting
B) Overfitting
C) Bias
D) Data leakage
Answer: B
Explanation: Overfitting happens when a decision tree learns the training data too perfectly, including all the little quirks and noise, instead of learning the general pattern. A very deep tree with many branches is a classic sign of overfitting.
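As a quick illustration, here is a sketch assuming scikit-learn and synthetic data (not part of the question): an unrestricted tree typically reaches perfect training accuracy while scoring noticeably lower on held-out data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, slightly noisy data (hypothetical example)
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

deep_tree = DecisionTreeClassifier(random_state=0)   # no depth limit: grows until leaves are pure
deep_tree.fit(X_train, y_train)
print("train accuracy:", deep_tree.score(X_train, y_train))  # typically 1.0
print("test  accuracy:", deep_tree.score(X_test, y_test))    # noticeably lower
```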
3. Post-pruning is applied:
A) Before splitting
B) After the full tree is built
C) During initial data cleaning
D) During feature selection
Answer: B
Explanation: Pruning removes unnecessary branches or nodes from a decision tree to make it simpler and better at generalizing to new data. Post-pruning is done after the full decision tree has been built: grow first, prune later, for better generalization and less overfitting.
We usually remove branches that do not improve accuracy on validation/test data.
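One common way to post-prune in practice is cost-complexity pruning. A minimal sketch, assuming scikit-learn and synthetic data: grow the full tree first, then keep the pruning strength that scores best on a validation split.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data and a train/validation split
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.1, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Grow the full tree first, then prune back with cost-complexity pruning,
# keeping the alpha that does best on the validation split
full_tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
alphas = full_tree.cost_complexity_pruning_path(X_tr, y_tr).ccp_alphas

best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_tr, y_tr) for a in alphas),
    key=lambda tree: tree.score(X_val, y_val),
)
print("pruned depth:", best.get_depth(), "validation accuracy:", best.score(X_val, y_val))
```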
4. Which measure prefers attributes with many distinct values, causing possible overfitting?
A) Information Gain
B) Gain Ratio
C) Gini Index
D) Chi-square
Answer: A
Explanation:
Information Gain (IG): Measures how much an attribute reduces entropy. IG tends to favor attributes with many distinct values (like ID numbers) because they split the data into very small groups, often making each child pure. This can lead to overfitting — the tree memorizes the training data instead of learning patterns.
Gain Ratio: Corrects IG’s bias toward many-value attributes by normalizing it with the intrinsic information of the split.
Gini Index / Chi-square: Do not have the same strong bias as IG toward many distinct values.
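A small hypothetical example in Python: an ID-like attribute (one distinct value per row) gets maximal information gain even though it is useless for prediction, while an imperfect but meaningful attribute gets only a moderate gain.

```python
import math
from collections import defaultdict

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n) for c in set(labels))

def info_gain(values, labels):
    """Parent entropy minus the weighted child entropies of a categorical split."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy data: 8 samples with a binary class
labels  = ["yes", "yes", "no", "no", "no", "no", "yes", "yes"]
weather = ["sun", "sun", "rain", "rain", "sun", "rain", "sun", "rain"]  # imperfect but meaningful
row_id  = list(range(8))                                                # one distinct value per row

print(info_gain(weather, labels))  # moderate gain (about 0.19)
print(info_gain(row_id, labels))   # 1.0: maximal, yet useless for unseen data
```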
5. In decision tree construction, continuous attributes (like “Age”) are handled by:
A) Ignoring them
B) Creating intervals or thresholds
C) Converting to categorical only
D) Rounding off values
Answer: B
Explanation: Continuous attributes are split at optimal cut-off points to convert them into “less-than / greater-than” branches so the tree can handle them effectively.
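A rough sketch of the idea in Python, with a hypothetical Age column: candidate thresholds are taken between sorted values and the cut with the highest information gain is kept.

```python
import math

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n) for c in set(labels))

def best_threshold(ages, labels):
    """Scan midpoints between sorted distinct values; keep the cut with the highest IG."""
    values = sorted(set(ages))
    best_ig, best_t = -1.0, None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2                                  # candidate "Age <= t" cut-off
        left  = [y for a, y in zip(ages, labels) if a <= t]
        right = [y for a, y in zip(ages, labels) if a > t]
        ig = entropy(labels) - (len(left) / len(labels) * entropy(left)
                                + len(right) / len(labels) * entropy(right))
        if ig > best_ig:
            best_ig, best_t = ig, t
    return best_t, best_ig

# Hypothetical "Age" attribute with a binary class label
ages   = [22, 25, 30, 35, 40, 45, 52, 60]
labels = ["no", "no", "no", "yes", "yes", "yes", "yes", "no"]
print(best_threshold(ages, labels))   # picks a cut around 32.5
```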
6. When all attribute values are the same but classes differ, what happens?
A) The tree stops
B) Merge classes
C) Add a new attribute
D) Randomly assign majority class
Answer: D
Explanation: In a decision tree, each split is based on an attribute that can separate the classes. If all attributes have the same value for the remaining samples, no further split is possible. But if the samples have different classes, the node is impure.
What does the algorithm do?
Since it cannot split further, it turns the node into a leaf and assigns it the majority class. This is a practical way to handle situations where the tree can no longer separate the classes.
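A tiny sketch of that fallback in Python, with hypothetical leftover labels:

```python
from collections import Counter

# Hypothetical node where no attribute can split the remaining samples,
# but the class labels still differ
remaining_labels = ["yes", "yes", "yes", "no", "no"]

# The node becomes a leaf that predicts the majority class (ties broken arbitrarily)
majority_class = Counter(remaining_labels).most_common(1)[0][0]
print(majority_class)   # "yes"
```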
7. In C4.5, the gain ratio is used to correct the bias of information gain toward attributes that have:
A) Many missing values
B) Continuous distributions
C) Many distinct values
D) Uniform entropy
Answer: C
Explanation: Information gain favors attributes with many distinct values, because splitting on them often creates very pure child nodes and therefore high IG, even if the attributes are not meaningful for prediction. This leads to overfitting.
C4.5 uses gain ratio to correct this bias. Gain ratio = Information Gain / Split Information. It penalizes attributes with many distinct values, preventing the tree from choosing them just because they split the data perfectly.
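A small Python sketch with a hypothetical ID-like attribute showing the penalty: its information gain is the full 1.0 bit, but dividing by the split information brings the gain ratio down.

```python
import math

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n) for c in set(labels))

def gain_ratio(values, labels):
    """Gain ratio = information gain / split information (the intrinsic info of the split)."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    ig = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return ig / split_info if split_info else 0.0

# Hypothetical ID-like attribute: one distinct value per sample.
# Its IG is 1.0, but split information is log2(8) = 3, so the gain ratio is about 0.33.
labels = ["yes", "yes", "no", "no", "yes", "no", "no", "yes"]
row_id = list(range(8))
print(gain_ratio(row_id, labels))   # ~0.333
```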
8. The entropy of a node with class probabilities [0.25, 0.75] is approximately:
A) 0.25
B) 0.56
C) 0.81
D) 1.0
Answer: C
Calculation: Entropy = −(0.25 × log₂ 0.25) − (0.75 × log₂ 0.75) = 0.25 × 2 + 0.75 × 0.415 ≈ 0.50 + 0.31 ≈ 0.81.
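A quick check of the arithmetic in Python:

```python
import math
# Entropy of class probabilities [0.25, 0.75]
print(-0.25 * math.log2(0.25) - 0.75 * math.log2(0.75))   # ~0.811
```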
9. If a split divides a node into child nodes that have the same entropy as the parent node, what is the resulting information gain?
A) Zero
B) Equal to entropy of parent
C) One
D) Half of parent entropy
Answer: A
Explanation: If splitting a node doesn’t reduce entropy at all (child nodes are just as impure as the parent), the information gain = 0, meaning the split doesn’t improve the tree.
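A small numeric sketch in Python with hypothetical counts: children exactly as impure as the parent yield zero gain.

```python
import math

def entropy(pos, neg):
    total = pos + neg
    return -sum(p / total * math.log2(p / total) for p in (pos, neg) if p > 0)

# Hypothetical parent with 6 yes / 6 no, split into two children with 3 yes / 3 no each
parent_h = entropy(6, 6)                                         # 1.0
weighted_children_h = 0.5 * entropy(3, 3) + 0.5 * entropy(3, 3)  # still 1.0
print("information gain:", parent_h - weighted_children_h)       # 0.0
```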
10. Which of these combinations produces maximum information gain?
A) Parent entropy high, child entropies low
B) Parent entropy low, child entropies high
C) Both high
D) Both low
Answer: A
Explanation: The parent entropy represents the initial uncertainty. The weighted sum of child entropies represents the remaining uncertainty after the split. IG is maximized when the parent is very uncertain (high entropy) and the split produces very pure child nodes (low entropy).
Maximum information gain occurs when a very uncertain parent is split into very pure children, because the split reduces the most uncertainty.
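A small numeric sketch in Python of this best case, again with hypothetical counts:

```python
import math

def entropy(pos, neg):
    total = pos + neg
    return -sum(p / total * math.log2(p / total) for p in (pos, neg) if p > 0)

# Hypothetical best case: a 50/50 parent (entropy 1.0) split into two pure children (entropy 0)
parent_h = entropy(6, 6)                                         # 1.0, maximum uncertainty
weighted_children_h = 0.5 * entropy(6, 0) + 0.5 * entropy(0, 6)  # 0.0, both children pure
print("information gain:", parent_h - weighted_children_h)       # 1.0, the maximum possible here
```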