Machine learning MCQ - Set 19
1. Which of the following cross-validation versions may not be suitable for very large datasets with hundreds of thousands of samples?
a) k-fold cross-validation
b) Leave-one-out cross-validation
c) Holdout method
d) All of the above
Answer: (b) Leave-one-out cross-validation

Leave-one-out cross-validation (LOOCV) is not suitable for very large datasets because it requires one model to be trained and evaluated for every sample in the dataset.

Cross-validation
It is a technique to evaluate a machine learning model, and it is the basis for a whole class of model evaluation methods. The goal of cross-validation is to test the model's ability to predict new data that was not used in estimating it. It works on the idea of splitting the dataset into a number of subsets, keeping one subset aside, training the model on the rest, and testing the model on the held-out subset.

Leave-one-out cross-validation
Leave-one-out cross-validation is k-fold cross-validation taken to its logical extreme, with k equal to N, the number of data points in the set. That means that N separate times, the function approximator is trained on all the data except for one point, and a prediction is made for that point. As before, the average error is computed and used to evaluate the model. This makes the evaluation given by leave-one-out cross-validation very expensive to compute, as the sketch below shows.

[For more information on other cross-validation techniques, refer here.]
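To make the cost concrete, here is a minimal sketch of LOOCV using scikit-learn; the synthetic dataset, its size, and the logistic regression model are illustrative assumptions, not part of the original question.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Small synthetic dataset; with N samples, LOOCV fits N separate models.
X, y = make_classification(n_samples=100, random_state=0)
model = LogisticRegression(max_iter=1000)

# cv=LeaveOneOut() holds out exactly one sample per fold: 100 fits here,
# which is why LOOCV does not scale to hundreds of thousands of samples.
scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print("LOOCV accuracy:", scores.mean())
```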
2. Assume that A and B are two events. If P(A, B) increases while P(A) decreases, then which of the following must be true?
a) P(A|B) decreases.
b) P(B|A) increases.
c) P(B) decreases.
d) P(A|B) increases.
Answer: (b) P(B|A) increases

The traditional approach for defining conditional probability is through joint probability, expressed as follows:

P(B|A) = P(A, B) / P(A)

In this equation, if P(A) decreases, then P(A, B) can increase only if P(B|A) increases. The options involving P(A|B) and P(B) cannot be determined, since nothing in the question constrains P(B).
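A quick numeric check (the probability values below are assumed purely for illustration):

```python
# P(B|A) = P(A, B) / P(A)
p_a, p_ab = 0.50, 0.20
print("P(B|A) =", p_ab / p_a)   # 0.4

# Now P(A) decreases while P(A, B) increases:
p_a, p_ab = 0.40, 0.24
print("P(B|A) =", p_ab / p_a)   # 0.6 -- P(B|A) is forced upward
```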
3. Which of the following cross-validation versions provides a quicker evaluation that is suitable for very large datasets with hundreds of thousands of samples?
a) k-fold cross-validation
b) Leave-one-out cross-validation
c) Holdout method
d) All of the above
Answer: (c) Holdout method

The holdout method is suitable for very large datasets because it is the simplest and quickest-to-compute version of cross-validation.

What is cross-validation?
Refer to the answer for question 1 on this page.

Holdout method
In this method, the dataset is divided into two sets, namely the training set and the test set, with the basic property that the training set is bigger than the test set. The model is then trained on the training set and evaluated on the test set, so only one model is ever fitted (see the sketch below).
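A minimal sketch of the holdout method with scikit-learn; the dataset, the 80/20 split ratio, and the model choice are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100_000, random_state=0)

# One split, one fit: 80% training set, 20% test set (ratio assumed).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Holdout accuracy:", model.score(X_test, y_test))
```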
4. Which of the following is a disadvantage of the k-fold cross-validation method?
a) The variance of the resulting estimate is reduced as k is increased.
b) It usually does not take longer to compute.
c) Reduced bias.
d) The training algorithm has to be rerun from scratch k times.
Answer: (d) The training algorithm has to be rerun from scratch k times

In k-fold cross-validation, the dataset is divided into k subsets. As in the holdout method, these subsets are split into training and test sets as follows (see the sketch after this list):
a) One of the subsets is chosen as the test set, and the other subsets put together form the training set.
b) Train a model on the training set and test it using the test set.
c) Keep the score to calculate the average error.
d) Repeat (a) to (c) with each individual subset as the test set.

Here, as the training set changes in every cycle, the training algorithm has to be rerun from scratch k times. Hence, it takes k times as much computation to make an evaluation.
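A minimal sketch of these steps with scikit-learn's KFold; k = 5, the dataset, and the model choice are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=1000, random_state=0)

scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True,
                                 random_state=0).split(X):
    # Steps (a)-(b): a fresh model is trained from scratch on each split.
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    # Step (c): keep the score of this fold.
    scores.append(model.score(X[test_idx], y[test_idx]))

# Step (d) is the loop itself; 5 folds means 5 full training runs.
print("Mean accuracy over 5 folds:", np.mean(scores))
```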
5. Consider that you are analyzing a large collection of fraudulent credit card transactions to discover if there are sub-types of these transactions. Which of the following learning methods best describes the given learning problem?
a) Reinforcement learning
b) Supervised learning
c) Unsupervised learning
d) Semi-supervised learning
Answer: (c) Unsupervised learning

Unsupervised learning is a type of machine learning used to draw inferences from datasets consisting of input data without labeled responses. It can be thought of as a self-learning process, where the algorithm finds previously unknown patterns in datasets that do not carry any labels. k-means clustering and the Apriori algorithm are examples of unsupervised learning techniques. Anomaly detection and clustering are some of the applications of unsupervised learning; discovering sub-types of fraudulent transactions is a clustering problem, since no sub-type labels exist in advance (see the sketch below).
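A minimal sketch of this idea with scikit-learn's k-means; the feature matrix and the choice of 3 clusters are assumptions made purely for illustration, not properties of any real fraud dataset.

```python
import numpy as np
from sklearn.cluster import KMeans

# Each row stands for one fraudulent transaction described by numeric
# features (e.g. amount, hour of day); the data here is random filler.
transactions = np.random.RandomState(0).rand(500, 2)

# No labels are given; k-means groups the transactions into 3 assumed
# sub-types purely from the structure of the input data.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(transactions)
print(kmeans.labels_[:10])  # discovered sub-type of the first 10 rows
```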