In a data mining task, when it is not clear what type of patterns could be interesting, what should the data mining system do?

When it is not clear what patterns may be interesting, the data mining system should allow interaction with the user to guide the mining process. This helps users apply their intuition and constraints to focus the search on meaningful patterns, a strategy known as constraint-based mining.

To detect fraudulent usage of credit cards, which data mining task should be used?

Outlier analysis should be used to detect fraudulent credit card usage because fraudulent transactions often appear as anomalies that deviate significantly from normal spending behavior.

Why does distance between data points become meaningless in high dimensional spaces?

In high dimensional spaces, it becomes difficult to distinguish between the nearest and farthest neighbors because distances tend to converge to similar values, a phenomenon known as the curse of dimensionality.

Which data mining technique is used to find inherent regularities in data?

Frequent pattern analysis is used to find inherent regularities in data by identifying patterns, itemsets, or sequences that occur frequently within a dataset.

Which dataset is used to provide an unbiased evaluation of a model while tuning hyper-parameters?

The validation dataset is used to evaluate a model while tuning hyper-parameters. It helps in selecting the best model and avoiding overfitting.

In which system are data stored, retrieved, and updated?

OLTP (Online Transaction Processing) systems are used to store, retrieve, and update data. Examples include banking systems and ticket booking systems.

Which type of data is handled by a data warehouse and not found in operational systems?

A data warehouse handles summarized or aggregated data, which is mainly used for analysis and reporting.

Classification is a data mining task that maps data into what?

Classification maps data into predefined groups or classes. Each data item is assigned to a known category.

Which clustering technique starts with one cluster for each data record?

Agglomerative clustering starts with one cluster for each data record and then gradually merges similar clusters.

Which distance measure is similar to the Simple Matching Coefficient (SMC)?

Hamming distance is similar to the Simple Matching Coefficient because both compare corresponding positions of binary vectors to identify similarities and differences.

Prediction differs from classification in which sense?

Prediction differs from classification in the type of outcome value. Classification predicts class labels, while prediction estimates numerical values.

Computer Science and Engineering - Tutorials, Notes, MCQs, Questions and Answers

ExploreDatabase – Your one-stop study guide for interview and semester exam preparations with solved questions, tutorials, GATE MCQs, online quizzes and notes on DBMS, Data Structures, Operating Systems, AI, Machine Learning and Natural Language Processing.


Monday, October 12, 2020

Data warehousing and mining quiz questions and answers set 01


Data Warehousing and Data Mining - MCQ Questions and Answers SET 01

1. In a data mining task, when it is not clear what type of patterns could be interesting, the data mining system should:

A. Perform all possible data mining tasks
B. Handle different granularities of data and patterns
C. Perform both descriptive and predictive tasks
D. Allow interaction with the user to guide the mining process

Answer: (d) Allow interaction with the user to guide the mining process

Users have a good sense of which “direction” of mining may lead to interesting patterns and the “form” of the patterns or rules they want to find. They may also have a sense of “conditions” for the rules, which would eliminate the discovery of certain rules that they know would not be of interest. Thus, a good heuristic is to have the users specify such intuition or expectations as constraints to confine the search space. This strategy is known as constraint-based mining.

2. To detect fraudulent usage of credit cards, the following data mining task should be used:

A. Feature selection
B. Prediction
C. Outlier analysis
D. All of the above

Answer: (c) Outlier analysis

Fraudulent usage of credit cards can be detected using outlier analysis or outlier detection.

Outlier

A data element that stands out from the rest of the data. Values that deviate from the other observations in the data are called outliers; they do not fit the pattern of the data distribution. Sometimes referred to as abnormalities, anomalies, or deviants, outliers can occur by chance in any given distribution.

Outlier analysis

The analysis used to find unusual patterns in a dataset. Many outlier detection algorithms have been proposed; they fall into broad categories such as statistical approaches, distance-based approaches, fuzzy approaches, and kernel-based approaches.
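As a rough illustration of the statistical category, the sketch below flags values whose z-score exceeds a threshold. The function name `zscore_outliers`, the threshold of 2.5, and the transaction amounts are all assumptions made up for this example; real fraud detection uses far richer features.

```python
from statistics import mean, stdev

def zscore_outliers(values, threshold=3.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical card transaction amounts; the last one is deliberately unusual.
amounts = [25.0, 30.5, 27.0, 22.0, 31.0, 28.5, 26.0, 29.0, 24.5, 27.5, 950.0]
suspicious = zscore_outliers(amounts, threshold=2.5)
print(suspicious)
```

Because a single extreme value also inflates the mean and standard deviation, a z-score test on a small sample can mask outliers; median-based variants are usually more robust.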

3. In high dimensional spaces, the distance between data points becomes meaningless because:

A. It becomes difficult to distinguish between the nearest and farthest neighbors
B. The nearest neighbor becomes unreachable
C. The data becomes sparse
D. There are many uncorrelated features

Answer: (a) It becomes difficult to distinguish between the nearest and farthest neighbors

Curse of dimensionality

The curse of dimensionality refers, among other things, to the observation that in high dimensional spaces the distances from a query point to its nearest and farthest neighbors become almost equal. Therefore, nearest neighbor calculations can no longer discriminate between candidate points.

Here, high dimensional means hundreds to thousands of dimensions for a dense vector (sparse vectors are a separate topic). Once the dimensionality is that high, the pairwise distances between all points approach a constant.
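The effect can be observed with a small experiment. The sketch below is an illustration under assumed settings (uniform random points in the unit cube, 200 points, a fixed seed); `relative_contrast` is a hypothetical helper that measures how much farther the farthest point is than the nearest one.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """Return (farthest - nearest) / nearest distance from a random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    return (max(dists) - min(dists)) / min(dists)

low = relative_contrast(dim=2)      # few dimensions: large contrast
high = relative_contrast(dim=1000)  # many dimensions: contrast shrinks
print(low, high)
```

A contrast close to zero means the nearest and farthest neighbors are at almost the same distance, which is the situation described in the answer above.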

4. The difference between supervised learning and unsupervised learning is given by:

A. Unlike unsupervised learning, supervised learning needs labeled data
B. Unlike unsupervised learning, supervised learning can form new classes
C. Unlike unsupervised learning, supervised learning can be used to detect outliers
D. Unlike supervised learning, unsupervised learning can predict the output class from among the known classes

Answer: (a) Unlike unsupervised learning, supervised learning needs labeled data

Supervised learning: Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. In data mining texts it is often treated as a synonym for classification, although it also covers numeric prediction (regression). The supervision in the learning comes from the labeled examples in the training data set.

Unsupervised learning: In data mining texts, unsupervised learning is often treated as a synonym for clustering. The learning process is unsupervised since the input examples are not class labeled. Typically, we may use clustering to discover classes within the data. The goal of unsupervised learning is to model the hidden patterns in the given input data in order to learn about the data.

5. Which of the following is used to find inherent regularities in data?

A. Clustering
B. Frequent pattern analysis
C. Regression analysis
D. Outlier analysis

Answer: (b) Frequent pattern analysis

Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set. It is an intrinsic and important property of datasets.

Basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis are some of the applications of frequent pattern analysis.
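For basket data, the idea can be sketched by counting how often each pair of items is bought together. The baskets, the `frequent_pairs` helper, and the minimum support of 3 are assumptions chosen for illustration only.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Count item pairs and keep those seen in at least min_support baskets."""
    counts = Counter()
    for basket in transactions:
        # Sort so that each pair has one canonical ordering.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Hypothetical market baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
pairs = frequent_pairs(baskets, min_support=3)
print(pairs)
```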


By K Saravanakumar, Vellore Institute of Technology - October 12, 2020


Labels: Data mining quiz questions, Data warehousing quiz questions

Data warehousing and mining quiz questions and answers set 05


Data Warehousing and Data Mining - MCQ Questions and Answers SET 05

1. Which of the following best describes the sample of data used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyper-parameters?

A. Training dataset
B. Test dataset
C. Validation dataset
D. Holdout dataset

Answer: (c) Validation dataset

The validation dataset is the sample of data used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyper-parameters.

It is usually used for parameter selection and to avoid overfitting. It helps in tuning the parameters of the model. For example, in a neural network, it is used to choose the number of hidden units.

The validation dataset is different from the test dataset, which is held back for the final evaluation.

The validation set is also known as the development set.
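A minimal sketch of how the three sets can be carved out of one dataset is shown below. The function name `train_val_test_split`, the 60/20/20 proportions, and the seed are assumptions for illustration, not a prescribed recipe.

```python
import random

def train_val_test_split(records, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle the records and cut them into train, validation and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]                 # touched only once, at the very end
    val = shuffled[n_test:n_test + n_val]    # used to compare hyper-parameter settings
    train = shuffled[n_test + n_val:]        # used to fit the model
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```

Each candidate hyper-parameter setting would be fitted on `train` and scored on `val`; only the chosen setting is then scored on `test`.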

2. In which of the following are data stored, retrieved, and updated?

A. OLAP
B. MOLAP
C. HTTP
D. OLTP

Answer: (d) OLTP

Online Transaction Processing (OLTP) is a type of data processing in information systems that typically facilitates transaction-oriented applications. A supermarket inventory system, a ticket booking system, and financial transaction systems are some examples of OLTP.

OLAP (Online Analytical Processing) is used primarily in data warehouse environments.

3. Data warehouse deals with which type of data that is never found in the operational environment?

A. Normalized
B. Informal
C. Summarized (aggregated)
D. Denormalized

Answer: (c) Summarized

A data warehouse handles summarized (aggregated) data derived from OLTP systems.

A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data.

Data warehouses are large databases that are specifically designed for OLAP and business analytics workloads.

As defined by Ralph Kimball, a data warehouse is “a copy of transaction data specifically structured for query and analysis.”
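The contrast between transactional rows and summarized data can be sketched with Python's built-in `sqlite3` module. The `sales` table, its columns, and the rows are hypothetical, and a single in-memory database stands in for what would normally be two separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, day TEXT, amount REAL)")

# OLTP-style work: individual rows are inserted and updated.
rows = [
    ("north", "2020-10-01", 120.0),
    ("north", "2020-10-01", 80.0),
    ("north", "2020-10-02", 50.0),
    ("south", "2020-10-01", 200.0),
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
conn.execute(
    "UPDATE sales SET amount = 60.0 WHERE store = 'north' AND day = '2020-10-02'"
)

# Warehouse-style view: the same data summarized per store for analysis.
summary = conn.execute(
    "SELECT store, COUNT(*), SUM(amount) FROM sales GROUP BY store ORDER BY store"
).fetchall()
print(summary)
conn.close()
```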

4. Classification is a data mining task that maps the data into _________ .

A. predefined group
B. real valued prediction variable
C. time series
D. clusters

Answer: (a) predefined group

Classification is a data mining function that assigns items in a collection to target categories or classes that are predefined. The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model could be used to identify loan applicants as low, medium, or high credit risks.

k-nearest neighbor (kNN), naïve Bayes, and support vector machine (SVM) are a few of the classification algorithms.
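As one example of these algorithms, a k-nearest neighbor classifier can be sketched in a few lines. The applicant features (age, income in thousands), the two risk labels, and the helper name `knn_predict` are assumptions invented for this illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Assign the majority label among the k training points closest to query."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical labeled applicants: (age, income in thousands) -> credit risk.
applicants = [
    ((25, 20), "high"), ((30, 25), "high"), ((28, 22), "high"),
    ((45, 80), "low"), ((50, 90), "low"), ((48, 85), "low"),
]
print(knn_predict(applicants, (47, 82)))
print(knn_predict(applicants, (27, 21)))
```

The output is always one of the predefined labels seen in the training data, which is what distinguishes classification from clustering.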

5. Which of the following clustering techniques starts with as many clusters as there are records or observations, with each cluster having only one observation at the start?

A. Agglomerative clustering
B. Fuzzy clustering
C. Divisive clustering
D. Model-based clustering

Answer: (a) Agglomerative clustering

This is a "bottom-up" approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.

Agglomerative clustering starts with single object clusters (singletons) and proceeds by progressively merging the most similar clusters, until a stopping criterion (which could be a predefined number of groups k) is reached. In some cases, the procedure ends only when all the clusters are merged into a single one, which is when one aims at investigating the overall granularity of the data structure.
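The bottom-up procedure can be sketched as follows. This is a deliberately simple version that assumes one-dimensional points, single linkage (distance between the closest members), and a target number of clusters k as the stopping criterion; the function name `agglomerative` is hypothetical.

```python
def agglomerative(points, k):
    """Single-linkage agglomerative clustering of 1-D points into k clusters."""
    clusters = [[p] for p in points]  # start with one singleton cluster per record
    while len(clusters) > k:
        best = None
        # Find the pair of clusters whose closest members are nearest.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge the two most similar clusters
        del clusters[j]
    return [sorted(c) for c in clusters]

result = agglomerative([1.0, 1.5, 2.0, 10.0, 10.5, 25.0], k=3)
print(result)
```

Setting k to 1 corresponds to the case where merging continues until all records end up in a single cluster.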


Data warehousing and mining quiz questions and answers set 04


Data Warehousing and Data Mining - MCQ Questions and Answers SET 04

1. Minkowski distance is a function used to find the distance between two _________ .

A. Binary vectors
B. Boolean-valued vectors
C. Real-valued vectors
D. Categorical vectors

Answer: (c) Real-valued vectors

Minkowski distance finds the distance between two real-valued vectors. It is a generalization of the Euclidean and Manhattan distance measures and adds a parameter, called the “order” or “p”, that allows different distance measures to be calculated.

For two vectors x = (x_1, ..., x_n) and y = (y_1, ..., y_n), the Minkowski distance of order p is

$$D(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^{p} \right)^{1/p}$$

If p = 1, this is the L1 distance, which is the Manhattan distance (substitute p = 1 in the equation above).

If p = 2, this is the L2 distance, which is the Euclidean distance (substitute p = 2 in the equation above).
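The two special cases can be checked with a short function. This is a minimal sketch; the name `minkowski` and the sample vectors are chosen for illustration.

```python
def minkowski(x, y, p):
    """Minkowski distance of order p between two real-valued vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

x, y = [0.0, 0.0], [3.0, 4.0]
manhattan = minkowski(x, y, 1)  # |3| + |4|
euclidean = minkowski(x, y, 2)  # sqrt(3^2 + 4^2)
print(manhattan, euclidean)
```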


2. Which of the following distance measures is similar to the Simple Matching Coefficient (SMC)?

A. Euclidean distance
B. Hamming distance
C. Jaccard distance
D. Manhattan distance

Answer: (b) Hamming distance

Hamming distance is the number of bits that are different between two binary vectors.

The Hamming distance is similar to the SMC in that both compare two vectors position by position and record where they agree and where they differ. The Hamming distance gives the number of bits that differ, whereas the SMC gives the ratio of matching bits to the total number of bits. In a nutshell, the Hamming distance counts the mismatches and the SMC measures the matches, so each conveys the complement of the other.

SMC = 1 - (Hamming distance / number of bits)
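The relation between the two measures can be verified numerically: the SMC is the fraction of matching positions, so it equals one minus the Hamming distance divided by the number of bits. The helper names and the sample vectors below are assumptions for illustration.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))

def simple_matching_coefficient(a, b):
    """Fraction of positions at which two equal-length binary vectors agree."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

u = [1, 0, 1, 1, 0, 0, 1, 0]
v = [1, 1, 1, 0, 0, 0, 1, 1]
h = hamming_distance(u, v)
smc = simple_matching_coefficient(u, v)
print(h, smc, 1 - h / len(u))
```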

3. The statement “if an itemset is frequent then all of its subsets must also be frequent” describes _________ .

A. Unique item property
B. Downward closure property
C. Apriori property
D. Contrast set learning

Answer: (b) Downward closure property and (c) Apriori property

The Apriori property states that if an itemset is frequent then all of its subsets must also be frequent.

The Apriori algorithm is a classical data mining algorithm used for mining frequent itemsets and learning relevant association rules over relational databases.

The Apriori property expresses the anti-monotonicity of support: adding items to an itemset can never increase its support, so support can only decrease or stay the same as a pattern grows.

Downward closure property and Apriori property are two names for the same property.
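The property is what lets a level-wise search discard candidates early. The sketch below is a simplified, unoptimized take on that idea, not a reference implementation of the Apriori algorithm; the baskets and the minimum support of 3 are made up for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for all frequent itemsets."""
    baskets = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(itemset <= b for b in baskets)

    items = sorted({i for b in baskets for i in b})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s)
        prev = set(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Downward closure: a candidate with any infrequent (k-1)-subset is dropped
        # without counting its support.
        pruned = [c for c in candidates
                  if all(frozenset(s) in prev for s in combinations(c, k - 1))]
        level = [c for c in pruned if support(c) >= min_support]
    return frequent

# Hypothetical market baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
frequent = apriori(baskets, min_support=3)
print(len(frequent))
```

In this run, a candidate such as {beer, bread, diapers} is rejected without a database scan because its subset {beer, bread} is already known to be infrequent.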

4. Prediction differs from classification in which of the following senses?

A. Not requiring a training phase
B. The type of the outcome value
C. Using unlabeled data instead of labeled data
D. Prediction is about determining a class

Answer: (b) The type of the outcome value

The type of outcome values of prediction differs from that of classification.

Predicting class labels is classification, and predicting values (e.g. using regression techniques) is prediction.

Classification is the process of identifying the category or class label to which a new observation belongs. Prediction is the process of estimating a missing or unavailable numerical value for a new observation.
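The difference in outcome type can be seen by running both tasks on the same input. In the sketch below, a least-squares line returns a number while a nearest-centroid rule returns a label; both helpers, the income figures, the credit limits, and the risk labels are assumptions made up for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input variable: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def nearest_centroid(xs, labels, query):
    """Return the label whose mean input value is closest to the query."""
    groups = {}
    for x, label in zip(xs, labels):
        groups.setdefault(label, []).append(x)
    centroids = {label: sum(v) / len(v) for label, v in groups.items()}
    return min(centroids, key=lambda label: abs(centroids[label] - query))

incomes = [20.0, 40.0, 60.0, 80.0]      # hypothetical incomes (thousands)
limits = [2.0, 4.0, 6.0, 8.0]           # numeric target: credit limit
risks = ["high", "high", "low", "low"]  # categorical target: risk class

slope, intercept = fit_line(incomes, limits)
predicted_limit = slope * 70.0 + intercept            # prediction: a number
predicted_risk = nearest_centroid(incomes, risks, 70.0)  # classification: a label
print(predicted_limit, predicted_risk)
```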

5. The statement “if an itemset is infrequent then its superset must also be an infrequent set” denotes _______.

A. Maximal frequent set
B. Border set
C. Upward closure property
D. Downward closure property

Answer: (c) Upward closure property

Any subset of a frequent itemset must be frequent (downward closure property), and any superset of an infrequent itemset must be infrequent (upward closure property). Both are Apriori properties.

