# *Top 5 Machine Learning Quiz Questions with Answer Explanations, Interview Questions on Machine Learning, Quiz Questions for Data Scientists with Answers Explained, Machine Learning Exam Questions*

## Machine learning MCQ - Set 05


### *1. Which of the following is a clustering algorithm in machine learning?*

a) Expectation Maximization

b) CART

c) Gaussian Naïve Bayes

d) Apriori

**Answer: (a) Expectation Maximization**

Expectation Maximization (EM) is a clustering algorithm that relies on maximizing the likelihood to find the statistical parameters of the underlying sub-populations in the dataset; in its most common clustering form it fits a Gaussian mixture model and assigns each point to the component most likely to have generated it. More generally, EM provides an iterative solution to maximum likelihood estimation with latent variables.

The other options are not clustering algorithms: CART is a decision tree algorithm, Gaussian Naïve Bayes is a Bayesian classification algorithm, and Apriori is an association rule learning algorithm.
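As a minimal sketch of how EM clusters data, the NumPy code below fits a two-component, one-dimensional Gaussian mixture: the E-step computes each point's soft membership (responsibility) in each component, and the M-step re-estimates the mixing weights, means and variances from those memberships. The synthetic data, the quartile-based initialization, the fixed iteration count and the function name `em_gmm_1d` are assumptions made for illustration, not a reference implementation.

```python
import numpy as np

def em_gmm_1d(x, n_iter=100):
    """EM for a two-component 1-D Gaussian mixture (illustrative sketch).
    Returns mixing weights, means, variances and per-point responsibilities."""
    # Initialize the two means from the lower and upper quartiles of the data
    mu = np.percentile(x, [25, 75]).astype(float)
    var = np.array([x.var(), x.var()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: r[i, k] = P(component k | x_i) under the current parameters
        dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        weighted = pi * dens
        r = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: re-estimate the parameters from the soft memberships
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var, r

# Hypothetical data: two sub-populations centered at -2 and 3
rng = np.random.default_rng(42)
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])
true_labels = np.repeat([0, 1], 200)

pi, mu, var, r = em_gmm_1d(x)
labels = r.argmax(axis=1)  # hard clustering: the most responsible component
print("mixing weights:", np.round(pi, 3))
print("means:", np.round(mu, 3))
print("variances:", np.round(var, 3))
```

Taking the most responsible component for each point turns EM's soft memberships into an ordinary hard clustering.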

### *2. The model obtained by applying linear regression on the identified subset of features may differ from the model obtained at the end of the process of identifying the subset during*

a) Best-subset selection

b) Forward stepwise selection

c) Forward stagewise selection

d) All of the above

**Answer: (c) Forward stagewise selection**

Let us assume that the dataset has p features, from which each method is used to select k features, where 0 < k < p. If we take the k features identified by forward stagewise selection and apply linear regression to them, the model we obtain may differ from the model obtained at the end of the forward stagewise process itself. This is due to the way this method builds its coefficients: at each step, the algorithm computes the simple linear regression coefficient of the residual on the variable that has the largest correlation with the residual, and adds it to the current coefficient for that variable. The coefficients are therefore accumulated one variable at a time and are not, in general, the least squares fit on the selected subset. There is no such difference for the other two methods, because in both best-subset selection and forward stepwise selection the coefficients of each candidate model are learned by performing linear regression on the retained subset of features. [source: Introduction to machine learning, IITM]
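To see the effect numerically, here is a minimal NumPy sketch of the stagewise procedure as described above (each step takes a full simple-regression step on the most correlated feature), compared with an ordinary least squares refit on the same selected features. The synthetic data, the correlation between the first two features, `n_steps=3` and the helper name `forward_stagewise` are all hypothetical; the sketch assumes centered, standardized features and a centered response, so no intercept is needed.

```python
import numpy as np

def forward_stagewise(X, y, n_steps):
    """Stagewise fit (illustrative sketch): at each step, regress the current
    residual on the single feature most correlated with it and add that
    coefficient to the running total. Assumes centered, scaled columns and centered y."""
    beta = np.zeros(X.shape[1])
    residual = y.copy()
    for _ in range(n_steps):
        # With standardized columns, |X^T r| is proportional to |correlation| with r
        j = int(np.argmax(np.abs(X.T @ residual)))
        # Simple linear regression coefficient (no intercept) of the residual on feature j
        gamma = (X[:, j] @ residual) / (X[:, j] @ X[:, j])
        beta[j] += gamma
        residual = residual - gamma * X[:, j]
    return beta

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 5))
X[:, 1] += 0.8 * X[:, 0]                     # make features 0 and 1 correlated
X = (X - X.mean(axis=0)) / X.std(axis=0)     # center and scale every column
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 1.0]) + rng.normal(scale=0.5, size=n)
y = y - y.mean()

beta_stage = forward_stagewise(X, y, n_steps=3)
selected = np.flatnonzero(beta_stage)        # the subset the stagewise process picked
# Ordinary least squares refit on exactly the same features
beta_ols, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)

print("selected features:", selected)
print("stagewise coefficients:", np.round(beta_stage[selected], 3))
print("OLS refit coefficients:", np.round(beta_ols, 3))
```

Each stagewise update only makes the residual orthogonal to the feature just chosen, so the accumulated coefficients generally do not satisfy the least squares conditions for the whole selected subset; that is why the two coefficient vectors differ.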

### *3. You trained a binary classifier model which gives very high accuracy on the training data, but much lower accuracy on validation data. Which of the following may be true?*

a) This is an instance of overfitting.

b) This is an instance of underfitting.

c) The training was not well regularized.

d) The training and testing examples are sampled from different distributions.

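**Answer: (a), (c) and (d)**

Any of these three options is a valid explanation for the lower validation accuracy. A large gap between training and validation accuracy is the classic sign of overfitting, insufficient regularization is a common cause of overfitting, and a mismatch between the distributions of the training and validation examples can produce the same gap. Option (b) can be ruled out: an underfitting model would also show low accuracy on the training data.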
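A minimal NumPy sketch of the pattern in this question: a 1-nearest-neighbor classifier memorizes its training set, so it scores 100% on the training data but noticeably less on held-out validation data. The synthetic data, the 20% label noise and the helper `knn_predict` are hypothetical; a larger k is shown only as one simple way of making the model smoother, which typically narrows the gap.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k):
    """k-nearest-neighbor classifier with a majority vote (labels 0/1, k odd)."""
    # Squared Euclidean distance from every query point to every training point
    d = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
flip = rng.random(400) < 0.2          # hypothetical 20% label noise
y[flip] = 1 - y[flip]
X_tr, y_tr, X_val, y_val = X[:300], y[:300], X[300:], y[300:]

for k in (1, 25):
    train_acc = (knn_predict(X_tr, y_tr, X_tr, k) == y_tr).mean()
    val_acc = (knn_predict(X_tr, y_tr, X_val, k) == y_val).mean()
    print(f"k={k:2d}  train acc={train_acc:.2f}  validation acc={val_acc:.2f}")
```

With k = 1 every training point is its own nearest neighbor, so training accuracy is perfect even on the noisy labels, which is exactly the high-train, low-validation pattern described in option (a).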

### *4. What are support vectors?*

a) The examples farthest from the decision boundary.

b) The only examples necessary to compute f(x) in an SVM.

c) The class centroids.

d) All the examples that have a non-zero weight $\alpha_k$ in an SVM.

**Answer: (b) and (d)**

Only the support vectors (the examples on the margin, or "gutters") have nonzero weights $\alpha_k$; for a soft-margin SVM this also includes examples that fall inside the margin or are misclassified. Because the decision function is $f(x) = \sum_k \alpha_k y_k K(x_k, x) + b$, every example with $\alpha_k = 0$ drops out of the sum, so the support vectors are the only examples needed to compute f(x). This makes the solution sparse.

A support vector machine (SVM) performs classification by finding the hyperplane (a line, in two dimensions) that separates the two classes with the largest margin. The examples that lie on this margin define the hyperplane and are the support vectors. They are therefore the examples closest to the decision boundary, not the farthest (a), and they are actual training examples rather than class centroids (c).
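The sparsity argument can be checked on a toy example in plain Python. The six points below are hypothetical, chosen so that the hard-margin linear SVM solution can be worked out by hand: the closest pair, (2, 0) and (-2, 0), gives w = (0.5, 0), b = 0 and $\alpha_k = 1/8$ for those two points, with $\alpha_k = 0$ for the rest. The sketch does not train an SVM; it only evaluates the decision function with these hand-derived weights.

```python
# Toy, linearly separable data (hypothetical): three positive and three negative points
X = [(2.0, 0.0), (3.0, 1.0), (4.0, -1.0), (-2.0, 0.0), (-3.0, 1.0), (-4.0, -1.0)]
y = [1, 1, 1, -1, -1, -1]
# Hard-margin linear SVM dual weights for this data, worked out by hand:
# w = sum_k alpha_k * y_k * x_k = (0.5, 0) and b = 0
alpha = [0.125, 0.0, 0.0, 0.125, 0.0, 0.0]
b = 0.0

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def decision(x, idx):
    """f(x) = sum_k alpha_k * y_k * <x_k, x> + b, summed over the indices in idx."""
    return sum(alpha[k] * y[k] * dot(X[k], x) for k in idx) + b

all_idx = range(len(X))
support = [k for k in all_idx if alpha[k] > 0]  # examples with nonzero alpha_k

print("support vectors:", [X[k] for k in support])
for x in [(1.0, 5.0), (-0.5, 2.0), (3.0, 3.0)]:
    # Same value whether we sum over all examples or only the support vectors
    print(x, decision(x, all_idx), decision(x, support))
for k in all_idx:
    # Functional margin y_k * f(x_k): exactly 1 for support vectors, larger otherwise
    print(X[k], y[k] * decision(X[k], support))
```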

### *5. Which of the following is the joint probability of H, U, P, and W described by the given Bayesian Network? [note: as the product of the conditional probabilities]*

a) P(H, U, P, W) = P(H) * P(W) * P(P) * P(U)

b) P(H, U, P, W) = P(H) * P(W) * P(P | W) * P(W | H, P)

c) P(H, U, P, W) = P(H) * P(W) * P(P | W) * P(U | H, P)

d) None of the above

**Answer: (c) P(H, U, P, W) = P(H) * P(W) * P(P | W) * P(U | H, P)**

In the given Bayesian network, H and W do not depend on any other nodes, so we use their prior probabilities P(H) and P(W). Node P has an incoming edge from W, hence the conditional probability P(P | W). Node U has incoming edges from H and P, hence the conditional probability P(U | H, P). For a Bayesian network, the joint probability is the product of each node's probability conditioned on its parents, which gives the product of the probabilities above.
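The factorization translates directly into code. The sketch below assumes H, U, P and W are binary and fills in made-up conditional probability tables (all numbers are hypothetical); it multiplies the four factors from option (c) and checks that the resulting joint distribution sums to 1 over all 16 assignments.

```python
from itertools import product

# Hypothetical CPTs for binary variables (1 = true); the numbers are made up for illustration
p_H = {1: 0.3, 0: 0.7}
p_W = {1: 0.6, 0: 0.4}
p_P_given_W = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.1, 0: 0.9}}  # p_P_given_W[w][p]
p_U_given_HP = {                                            # p_U_given_HP[(h, p)][u]
    (1, 1): {1: 0.9, 0: 0.1},
    (1, 0): {1: 0.5, 0: 0.5},
    (0, 1): {1: 0.6, 0: 0.4},
    (0, 0): {1: 0.05, 0: 0.95},
}

def joint(h, u, p, w):
    """P(H, U, P, W) = P(H) * P(W) * P(P | W) * P(U | H, P)."""
    return p_H[h] * p_W[w] * p_P_given_W[w][p] * p_U_given_HP[(h, p)][u]

total = sum(joint(h, u, p, w) for h, u, p, w in product((0, 1), repeat=4))
p_u_true = sum(joint(h, 1, p, w) for h, p, w in product((0, 1), repeat=3))
print(f"sum of joint over all 16 assignments: {total:.6f}")
print(f"P(U = 1) = {p_u_true:.4f}")
```

Because every CPT row sums to 1, the product of parent-conditioned factors is automatically a valid joint distribution; no extra normalization is needed.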
