Data Warehousing and Data Mining - MCQ Questions and Answers SET 02
1. In non-parametric models
a) There are no parameters
b) The parameters are fixed in advance
c) A type of probability distribution is assumed, then its parameters are inferred
d) The parameters are flexible
Answer: (d) The parameters are flexible
Non-parametric models differ from parametric models in that the model structure is not specified a priori but is instead determined from the data. The term "non-parametric" does not mean that such models completely lack parameters; rather, the number and nature of the parameters are flexible and not fixed in advance, and no particular probability distribution is assumed.
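To make the contrast concrete, here is a minimal sketch (assuming scikit-learn is available; the data and parameter values are illustrative) comparing a parametric model, whose parameter count is fixed before seeing the data, with a non-parametric one, whose effective complexity grows with the training set:

```python
# Parametric vs. non-parametric: a linear model has a fixed set of
# parameters; a k-NN regressor builds predictions from the stored
# training points themselves, so its complexity scales with the data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=200)

# Parametric: exactly two parameters (slope and intercept), fixed a priori.
linear = LinearRegression().fit(X, y)

# Non-parametric: no fixed parameter set; the model "is" the data.
knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)

print(linear.predict([[5.0]]), knn.predict([[5.0]]))
```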
2. The goal of clustering analysis is to:
a) Maximize the inter-cluster similarity
b) Maximize the intra-cluster similarity
c) Maximize the number of clusters
d) Minimize the intra-cluster similarity
Answer: (b) Maximize the intra-cluster similarity
Clustering analysis is an unsupervised classification technique that groups similar observations into clusters based on multiple variables per observed value. A good clustering algorithm produces small intra-cluster distance (high intra-cluster similarity) and large inter-cluster distance (low inter-cluster similarity). Intra-cluster distance is the distance between two objects from the same cluster; inter-cluster distance is the distance between two objects from different clusters.
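A small sketch of these two distances (assuming scikit-learn; the synthetic blob data is illustrative) follows. For a good clustering, the intra-cluster figure should come out much smaller than the inter-cluster one:

```python
# Cluster synthetic points with k-means, then compare the average
# intra-cluster distance with the average inter-cluster centroid distance.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# Intra-cluster: mean distance of each point to its own centroid.
intra = np.mean(np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1))

# Inter-cluster: mean pairwise distance between centroids.
c = km.cluster_centers_
pairs = [np.linalg.norm(c[i] - c[j])
         for i in range(len(c)) for j in range(i + 1, len(c))]
print(f"intra = {intra:.2f}, inter = {np.mean(pairs):.2f}")
```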
3. In decision tree algorithms, attribute selection measures are used to
a) Reduce the dimensionality
b) Select the splitting criteria which best separate the data
c) Reduce the error rate
d) Rank attributes
Answer: (b) Select the splitting criteria which best separate the data
Attribute selection measures in decision tree algorithms are used to select the splitting criterion that best separates a given data partition. During tree induction, the measure picks the attribute that best separates the remaining samples of a node's partition into individual classes. The data set is partitioned according to the splitting criterion into subsets, and this procedure is repeated recursively for each subset until every subset contains only members of the same class or is sufficiently small. Information gain, gain ratio, and the Gini index are popular attribute selection measures.
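As a small illustration, the sketch below computes two of the named measures, entropy (the quantity behind information gain) and the Gini index, directly from their standard formulas; the example class counts are made up:

```python
# Entropy and Gini impurity over the class-label counts of a partition.
import numpy as np

def entropy(counts):
    """Shannon entropy of a class distribution given raw counts."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return -np.sum(p * np.log2(p))

def gini(counts):
    """Gini impurity of a class distribution given raw counts."""
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    return 1.0 - np.sum(p ** 2)

# Splitting an impure parent (10 positives / 10 negatives) into pure
# children drives both measures to zero, maximizing information gain.
print(entropy([10, 10]), gini([10, 10]))  # impure parent: 1.0, 0.5
print(entropy([10, 0]), gini([10, 0]))    # pure child:    0.0, 0.0
```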
4. Pruning a decision tree always
a) Increases the error rate
b) Reduces the size of the tree
c) Provides the partitions with lower entropy
d) Reduces classification accuracy
Answer: (b) Reduces the size of the tree
Pruning simplifies and optimizes a decision tree by removing sections that are uncritical or redundant for classifying instances, which significantly reduces the size of the tree. Decision trees are among the machine learning algorithms most susceptible to overfitting (the undesired induction of noise into the tree), and pruning reduces the likelihood of overfitting.
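One common post-pruning approach is cost-complexity pruning, which scikit-learn exposes through the ccp_alpha parameter of its tree classifiers. A minimal sketch (the dataset and alpha value are illustrative, not a recommendation) showing the pruned tree coming out smaller:

```python
# Cost-complexity pruning: a larger ccp_alpha removes more subtrees,
# shrinking the tree and reducing the risk of overfitting.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

unpruned = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X_tr, y_tr)

# The pruned tree has far fewer nodes and often generalizes better.
print(unpruned.tree_.node_count, unpruned.score(X_te, y_te))
print(pruned.tree_.node_count, pruned.score(X_te, y_te))
```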
5. Which of the following classifiers falls in the category of lazy learners?
a) Decision trees
b) Bayesian classifiers
c) k-NN classifiers
d) Rule-based classifiers
Answer: (c) k-NN classifiers
The k-nearest neighbor (k-NN) classifier is a lazy learner because it does not learn a discriminative function from the training data but instead "memorizes" the training dataset. Lazy learning (e.g., instance-based learning) simply stores the training data, with at most minor processing, and waits until it is given a test tuple; classification is then conducted based on the most related data in the stored training set. Lazy learning is also referred to as "just-in-time learning". The other category of classifiers is eager learners.
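A minimal from-scratch sketch of this laziness (the class and variable names are hypothetical): fit() only memorizes the data, and all distance computation is deferred to prediction time:

```python
# A lazy learner: no model is induced during fit(); all work happens
# at query time ("just-in-time learning").
import numpy as np
from collections import Counter

class LazyKNN:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" is just memorizing the training set.
        self.X, self.y = np.asarray(X), np.asarray(y)
        return self

    def predict(self, x):
        # Compute distances to all stored points only when queried.
        dists = np.linalg.norm(self.X - np.asarray(x), axis=1)
        nearest = self.y[np.argsort(dists)[:self.k]]
        return Counter(nearest).most_common(1)[0][0]

clf = LazyKNN(k=3).fit([[0, 0], [0, 1], [5, 5], [6, 5]], [0, 0, 1, 1])
print(clf.predict([5, 4]))  # -> 1, the majority label among the 3 nearest
```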
************************
Related links:
Machine learning MCQ questions and answers home
Machine learning TRUE/FALSE questions and answers home
What is lazy learning in data mining?
Which data noise problem is reduced through pruning in decision trees?
What is the role of attribute selection measures in data mining?
What are the popular attribute selection measures?
Why are non-parametric models said to be flexible?
Which machine learning algorithm is most susceptible to overfitting?
Define inter-cluster and intra-cluster distance
Machine learning algorithms MCQ with answers
Machine learning question banks and answers