Machine learning MCQ - Set 05
1. Which of the following is a clustering algorithm in machine learning?
a) Expectation Maximization
b) CART
c) Gaussian Naïve Bayes
d) Apriori
Answer: (a) Expectation Maximization

Expectation Maximization (EM) is a clustering algorithm that maximizes the likelihood to find the statistical parameters of the underlying sub-populations in the dataset. It provides an iterative solution to maximum likelihood estimation with latent variables. CART is a decision tree algorithm, Gaussian Naïve Bayes is a Bayesian classification algorithm, and Apriori is an association rule learning algorithm.
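To see EM-based clustering in action, here is a minimal Python sketch using scikit-learn's GaussianMixture, which fits a Gaussian mixture model via Expectation Maximization; the two-blob data set is synthetic and purely illustrative.

```python
# Minimal sketch: EM-based clustering with a Gaussian mixture model.
# The synthetic data has two latent sub-populations with different means.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(100, 2)),
    rng.normal(loc=5.0, scale=1.0, size=(100, 2)),
])

# GaussianMixture runs EM internally to estimate the mixture parameters.
gm = GaussianMixture(n_components=2, random_state=0).fit(data)
labels = gm.predict(data)   # cluster assignment for each point
print(labels[:5])
print(gm.means_)            # estimated means of the two sub-populations
```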
2. The model obtained by applying linear regression on the identified subset of features may differ from the model obtained at the end of the process of identifying the subset during
a) Best-subset selection
b) Forward stepwise selection
c) Forward stagewise selection
d) All of the above
Answer: (c) Forward stagewise selection

Assume the data set has p features, from which each method selects k features, 0 < k < p. If we take the k features identified by forward stagewise selection and apply linear regression to them, the resulting model may differ from the model obtained at the end of the forward stagewise procedure itself. This is due to the manner in which the coefficients are built in this method: at each step, the algorithm computes the simple linear regression coefficient of the residual on the variable having the largest correlation with the residual, and adds it to the current coefficient for that variable. There is no such difference in the other two methods, because in both best-subset and forward stepwise selection, at each step a linear regression is fit on the retained subset of features to learn the coefficients. (See the sketch below.)

[Source: Introduction to Machine Learning, IITM]
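To make the difference concrete, here is a rough numpy sketch of forward stagewise fitting on synthetic, standardized data (the data, step count, and coefficient values are illustrative assumptions, not from the source). Stopping the procedure early and then refitting ordinary least squares on the features it selected generally yields different coefficients:

```python
import numpy as np

def forward_stagewise(X, y, n_steps=50):
    """Forward stagewise fitting: at each step, regress the current
    residual on the single variable most correlated with it, and add
    that simple regression coefficient to the running coefficient."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y - y.mean()                           # centered response as initial residual
    for _ in range(n_steps):
        corr = X.T @ r
        j = np.argmax(np.abs(corr))            # variable most correlated with residual
        delta = corr[j] / (X[:, j] @ X[:, j])  # simple regression coefficient of r on x_j
        beta[j] += delta
        r -= delta * X[:, j]                   # update the residual
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)       # standardized, centered features
y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.5]) + rng.normal(scale=0.5, size=100)

beta_fs = forward_stagewise(X, y, n_steps=5)   # stop early: only a few variables enter
active = np.nonzero(beta_fs)[0]                # the "selected" features
beta_ols, *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
print(beta_fs[active])   # stagewise coefficients on the selected features
print(beta_ols)          # OLS on the same features: generally different
```

Run to convergence, stagewise fitting approaches the least squares solution; stopped early, its coefficients are typically shrunken relative to OLS on the same active set, which is exactly why the two models differ.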
3. You trained a binary classifier model which gives very high accuracy on the training data, but much lower accuracy on validation data. Which of the following may be true?
a) This is an instance of overfitting.
b) This is an instance of underfitting.
c) The training was not well regularized.
d) The training and testing examples are sampled from different distributions.
Answer: (a), (c) and (d)

Any of these three options is a valid reason for much lower accuracy on the validation data: the model may have overfit the training set (a), the training may not have been well regularized (c), or the training and validation examples may be sampled from different distributions (d). Underfitting (b) is ruled out, because an underfit model would also score poorly on the training data. The sketch below demonstrates the overfitting case.
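A small sketch (synthetic data, scikit-learn; all parameter choices are illustrative) of case (a): an unpruned decision tree memorizes its training set, giving near-perfect training accuracy but much lower validation accuracy.

```python
# Overfitting demo: a fully grown (unregularized) decision tree on noisy data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)   # flip_y adds label noise
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)  # no depth limit
print("train accuracy:     ", clf.score(X_tr, y_tr))    # near 1.0
print("validation accuracy:", clf.score(X_val, y_val))  # noticeably lower
```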
4. What are support vectors?
a) The examples farthest from the decision boundary.
b) The only examples necessary to compute f(x) in an SVM.
c) The class centroids.
d) All the examples that have a non-zero weight αk in an SVM.
Answer: (b) and (d)

Only the support vectors (the points lying on the margin) have nonzero weights αk, which reduces the dimensionality of the solution. A Support Vector Machine (SVM) performs classification by finding the hyperplane that maximizes the margin between the two classes; the points that lie on this margin are the support vectors, they define the hyperplane, and they alone are needed to compute the decision function f(x). The sketch below shows this.
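A minimal sketch using scikit-learn's SVC on synthetic blobs: after fitting, only the support vectors are stored, and their dual coefficients (the products αk·yk) are the only nonzero weights in the decision function.

```python
# Support vectors in practice: SVC keeps only the margin-defining points.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
svm = SVC(kernel="linear", C=1.0).fit(X, y)

print(svm.support_vectors_.shape)  # only these points define the hyperplane
print(svm.dual_coef_)              # alpha_k * y_k, nonzero for support vectors only
```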
5. Which of the following is the joint probability of H, U, P, and W described by the given Bayesian network? [Note: express it as the product of the conditional probabilities.]
a) P(H, U, P, W) = P(H) * P(W) * P(P) * P(U)
b) P(H, U, P, W) = P(H) * P(W) * P(P | W) * P(W | H, P)
c) P(H, U, P, W) = P(H) * P(W) * P(P | W) * P(U | H, P)
d) None of the above
Answer: (c) P(H, U, P, W) = P(H) * P(W) * P(P | W) * P(U | H, P)

In the given Bayesian network, H and W do not depend on any other nodes, so we take their prior probabilities P(H) and P(W). Node P has an incoming edge from W, hence the conditional probability P(P | W). Node U has incoming edges from H and P, hence the conditional probability P(U | H, P). For a Bayesian network, the joint probability is the product of these factors, as the toy computation below illustrates.
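A toy Python sketch of this factorization, with all probability values invented purely for illustration:

```python
# Joint probability of the network as a product of its factors:
# P(H, U, P, W) = P(H) * P(W) * P(P | W) * P(U | H, P).
# All numbers below are made up for the example.
P_H = {True: 0.3, False: 0.7}
P_W = {True: 0.6, False: 0.4}
P_P_given_W = {True: {True: 0.8, False: 0.2},    # outer key is w, inner key is p
               False: {True: 0.1, False: 0.9}}
P_U_given_HP = {(True, True): {True: 0.9, False: 0.1},    # keyed by (h, p), inner key u
                (True, False): {True: 0.5, False: 0.5},
                (False, True): {True: 0.4, False: 0.6},
                (False, False): {True: 0.05, False: 0.95}}

def joint(h, u, p, w):
    """Joint probability as the product of the network's factors."""
    return P_H[h] * P_W[w] * P_P_given_W[w][p] * P_U_given_HP[(h, p)][u]

print(joint(h=True, u=True, p=True, w=True))  # 0.3 * 0.6 * 0.8 * 0.9 = 0.1296
```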
**********************
 
