Machine Learning Exam Interview Questions TRUE or FALSE 02

1. In general, when we are trying to learn an HMM with a small number of states from a large number of observations, we can almost always increase the training data likelihood by permitting more hidden states.

(a) TRUE                                                   (b) FALSE

Answer: TRUE
To model any finite-length sequence, we can set the number of hidden states equal to the number of observations in the sequence and, with appropriate parameter choices, generate the observed sequence with probability 1. Given a fixed number n of finite sequences, we can still assign probability 1/n to generating each sequence. This is not useful in practice, of course, but it shows that the complexity of HMMs is unbounded.
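
The argument can be checked on a toy sequence. The sketch below is only an illustration under stated assumptions: the function names (sequence_likelihood, chain_hmm, single_state_hmm) and the sequence "abba" are made up for this example, and the forward algorithm is written in plain Python rather than taken from any HMM library.

```python
def sequence_likelihood(obs, start, trans, emit):
    """Forward algorithm: probability of `obs` under an HMM.

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i]     : dict mapping symbol -> emission probability in state i
    """
    n_states = len(start)
    alpha = [start[i] * emit[i].get(obs[0], 0.0) for i in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n_states))
            * emit[j].get(symbol, 0.0)
            for j in range(n_states)
        ]
    return sum(alpha)


def chain_hmm(obs):
    """One hidden state per time step: state t emits obs[t] and moves to t+1."""
    T = len(obs)
    start = [1.0] + [0.0] * (T - 1)
    # The last state loops on itself so that every row sums to 1.
    trans = [[1.0 if j == min(i + 1, T - 1) else 0.0 for j in range(T)]
             for i in range(T)]
    emit = [{obs[t]: 1.0} for t in range(T)]
    return start, trans, emit


def single_state_hmm(obs):
    """One hidden state emitting symbols at their empirical frequencies."""
    freq = {s: obs.count(s) / len(obs) for s in set(obs)}
    return [1.0], [[1.0]], [freq]


obs = list("abba")  # illustrative sequence, not from the original question
p_small = sequence_likelihood(obs, *single_state_hmm(obs))
p_large = sequence_likelihood(obs, *chain_hmm(obs))
print(p_small, p_large)
```

With one state the best achievable likelihood for "abba" is 0.5^4 = 0.0625, while the four-state chain reproduces the sequence with probability 1.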

2. Assuming a fixed number of attributes, a Gaussian-based Bayes optimal classifier can be learned in time linear in the number of records in the dataset.

(a) TRUE                                                   (b) FALSE

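Answer: TRUE
Training only needs the class priors and the mean and covariance of each class, all of which are sums over the records. With a fixed number of attributes, each record contributes a constant amount of work.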
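
A minimal sketch of such a fit for question 2, accumulating per-class counts, sums, and sums of outer products in one pass over the records. The function name fit_gaussian_classes, the record layout (a list of features plus a label), and the toy data are assumptions made for illustration; the covariance uses the maximum-likelihood estimate (division by the class count).

```python
def fit_gaussian_classes(records):
    """Estimate prior, mean and covariance per class in a single pass.

    records: iterable of (features, label), features being a list of d floats.
    Work per record is O(d^2), so the total cost is linear in the number
    of records when d is fixed.
    """
    stats = {}
    n = 0
    for x, y in records:
        d = len(x)
        if y not in stats:
            stats[y] = {
                "count": 0,
                "sum": [0.0] * d,
                "outer": [[0.0] * d for _ in range(d)],
            }
        s = stats[y]
        s["count"] += 1
        for i in range(d):
            s["sum"][i] += x[i]
            for j in range(d):
                s["outer"][i][j] += x[i] * x[j]
        n += 1

    model = {}
    for y, s in stats.items():
        c = s["count"]
        d = len(s["sum"])
        mean = [v / c for v in s["sum"]]
        # Cov = E[x x^T] - mean mean^T (maximum-likelihood estimate).
        cov = [[s["outer"][i][j] / c - mean[i] * mean[j] for j in range(d)]
               for i in range(d)]
        model[y] = {"prior": c / n, "mean": mean, "cov": cov}
    return model


# Toy data, invented for this example.
records = [
    ([0.0, 0.0], "a"),
    ([2.0, 2.0], "a"),
    ([1.0, 5.0], "b"),
    ([3.0, 5.0], "b"),
]
model = fit_gaussian_classes(records)
print(model["a"])
```

The sum-of-products formula is used here for brevity; on large or badly scaled data it can lose precision, and an incremental (Welford-style) update is a common alternative with the same linear cost.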

3. Random forests usually perform better than AdaBoost when your dataset has mislabeled data points.

(a) TRUE                                                   (b) FALSE

Answer: TRUE
Random forests are robust to noise and outliers: each tree is trained on a bootstrap sample, and averaging over many trees reduces variance, so a few mislabeled points have limited influence on the final vote.
AdaBoost tends to perform poorly on noisy data, because each round increases the weights of misclassified points, and mislabeled points keep attracting more weight.
Compared to random forests, AdaBoost also performs worse when irrelevant features are included in the model.
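
The AdaBoost side of this comparison can be made concrete with a small sketch. It is an illustration only: the helper names (best_stump, adaboost_weights), the one-dimensional toy data, and the choice of threshold stumps as weak learners are assumptions, and the code tracks only the sample weights rather than building a full classifier. It does not measure random forest performance.

```python
import math


def stump_predict(x, thr, pol):
    """Threshold stump: predicts `pol` above the threshold, `-pol` below."""
    return pol if x > thr else -pol


def best_stump(xs, ys, w):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    pts = sorted(set(xs))
    thresholds = [pts[0] - 0.5] + [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    best = None
    for thr in thresholds:
        for pol in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if stump_predict(xi, thr, pol) != yi)
            if best is None or err < best[0]:
                best = (err, thr, pol)
    return best


def adaboost_weights(xs, ys, rounds):
    """Run the AdaBoost reweighting loop and return the weights per round."""
    n = len(xs)
    w = [1.0 / n] * n
    history = [w[:]]
    for _ in range(rounds):
        err, thr, pol = best_stump(xs, ys, w)
        err = min(max(err, 1e-12), 1 - 1e-12)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        # Misclassified points are scaled up, correct ones scaled down.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, thr, pol))
             for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        history.append(w)
    return history


# Toy data: labels are -1 for x < 5 and +1 otherwise, with x = 2 mislabeled.
xs = list(range(10))
ys = [-1] * 5 + [1] * 5
ys[2] = 1  # the mislabeled point
history = adaboost_weights(xs, ys, rounds=3)
for t, w in enumerate(history):
    print(t, round(w[2], 4))
```

In the first round the best stump misclassifies only the mislabeled point, so its weight jumps from 0.1 to 0.5 after a single update. Later weak learners are then pulled toward fitting that point, which is the mechanism behind AdaBoost's sensitivity to label noise.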


