Choosing the Right Machine Learning Algorithm – Real-World MCQs
Selecting the right machine learning algorithm is a critical step in solving real-world data science problems. The choice depends on factors such as the type of data, the objective of the problem, whether the data is labeled or unlabeled, and the nature of the output.
In this quiz, you will work through scenario-based MCQs built on real-life situations from domains such as real estate, e-commerce, banking, healthcare, recommendation systems, and time-series forecasting. These questions are commonly asked in university exams, ML interviews, and competitive tests.
Topics covered include:
- Regression vs Classification problems
- Supervised vs Unsupervised learning
- Clustering and Customer Segmentation
- Recommendation Systems
- Time-Series Forecasting
- Dimensionality Reduction
Each question includes a difficulty level, the type of data involved, and a clear explanation of why a particular ML algorithm is the best choice.
1. A real-estate company wants to predict house prices in Bangalore using features such as area (sq.ft), number of bedrooms, location, and age of the building. The target value is continuous.
Difficulty: Easy
Data Type: Labeled, Continuous Target
Answer: Linear Regression. It is the standard choice for predicting a continuous numerical target such as price.
Why not the others? Logistic Regression is for classification, K-Means is unsupervised clustering, and Apriori mines association rules.
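As a quick illustration, here is a minimal scikit-learn sketch; the listings and prices below are invented, and a real pipeline would also encode the categorical location feature (for example, one-hot) before fitting:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical listings: [area_sqft, bedrooms, building_age_years]
X = np.array([[1200, 2, 5],
              [1500, 3, 10],
              [1800, 3, 2],
              [2400, 4, 8]])
y = np.array([60.0, 75.0, 95.0, 130.0])  # made-up prices (in lakhs)

model = LinearRegression().fit(X, y)
print(model.predict([[1600, 3, 4]]))  # estimated price for a new listing
```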
2. An email service like Gmail wants to classify emails as Spam or Not Spam using word frequencies and sender information.
Difficulty: Easy
Data Type: Labeled, Text Data
Answer: Naive Bayes. It is a fast, probabilistic classifier that works well on word-frequency features, which is why it is a classic choice for spam filtering.
Why not the others? K-Means is unsupervised, and PCA reduces dimensionality rather than classifying.
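A small sketch using scikit-learn's CountVectorizer and MultinomialNB; the four example emails and their labels are made up purely for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up corpus; a real spam filter trains on far more data
emails = ["win a free prize now", "meeting agenda attached",
          "free money claim now", "lunch tomorrow?"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)      # word-frequency features
clf = MultinomialNB().fit(X, labels)
print(clf.predict(vectorizer.transform(["claim your free prize"])))
```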
3. Amazon wants to group customers based on purchase history, spending behavior, and browsing activity for marketing purposes.
Difficulty: Medium
Data Type: Unlabeled, Numerical Features
Answer: K-Means Clustering. It groups similar customers from unlabeled data, which is exactly the customer-segmentation setting.
Why not the others? Classification algorithms require predefined labels, which this scenario does not have.
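A quick K-Means sketch with scikit-learn; the two features (annual spend, monthly visits) and their values are hypothetical stand-ins for real purchase-history features:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customers: [annual_spend, visits_per_month]
X = np.array([[200, 2], [250, 3], [1200, 15],
              [1100, 14], [600, 7], [650, 8]])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(kmeans.labels_)  # cluster assignment per customer, no labels needed
```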
4. A bank wants to detect fraudulent credit-card transactions where fraud cases are rare compared to normal transactions.
Difficulty: Interview-level
Data Type: Labeled, Imbalanced Dataset
Answer: Random Forest. It captures non-linear feature interactions well, and combined with class weighting or resampling it copes with the rare fraud class.
Why not the others? Linear Regression predicts a continuous value and cannot model classification boundaries.
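A sketch of this approach in scikit-learn; the data is synthetic (make_classification with a roughly 2% positive class standing in for fraud), and class_weight="balanced" is one common way to up-weight the rare class:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Synthetic imbalanced data: ~2% "fraud" class
X, y = make_classification(n_samples=5000, n_features=10,
                           weights=[0.98, 0.02], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" up-weights the rare fraud class during training
clf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                             random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```

On imbalanced data, look at per-class precision and recall (as printed above) rather than plain accuracy, since a model that never predicts "fraud" can still score 98% accuracy here.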
5. Netflix wants to recommend movies based on users’ viewing history and ratings from similar users.
Difficulty: Medium
Data Type: Labeled User–Item Interactions
Answer: Collaborative Filtering. It recommends items by exploiting similarities among users or items based on past interactions.
Why not the others? Plain regression models do not capture preference similarity between users.
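As a rough illustration, here is a minimal user-based collaborative-filtering sketch built on cosine similarity with NumPy and scikit-learn; the 4x4 ratings matrix is made up, and production systems typically use matrix factorization or learned embeddings instead:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Rows = users, columns = movies; 0 means "not yet rated" (toy data)
ratings = np.array([[5, 4, 0, 1],
                    [4, 5, 1, 0],
                    [1, 0, 5, 4],
                    [0, 1, 4, 5]])

sim = cosine_similarity(ratings)   # user-user similarity matrix
scores = sim @ ratings             # neighbours' ratings, weighted by similarity
scores[ratings > 0] = -np.inf      # mask movies the user has already rated
print(scores.argmax(axis=1))       # top recommendation for each user
```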
6. A telecom company wants to predict whether a customer will churn based on usage patterns and complaint history.
Difficulty: Easy
Data Type: Labeled, Binary Target
Answer: Logistic Regression. It is designed for binary classification and outputs a churn probability for each customer.
Why not the others? PCA reduces features but does not classify.
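A minimal logistic-regression sketch in scikit-learn; the two features (monthly minutes, complaint count) and the toy data are invented:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical customers: [monthly_minutes, complaints] -> churned?
X = np.array([[300, 0], [50, 4], [400, 1], [30, 6], [250, 0], [60, 5]])
y = np.array([0, 1, 0, 1, 0, 1])  # 1 = churned

clf = LogisticRegression().fit(X, y)
# Churn probability for a new customer, not just a hard 0/1 label
print(clf.predict_proba([[100, 3]])[0, 1])
```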
7. A system must recognize handwritten digits (0–9) from the MNIST image dataset.
Difficulty: Medium
Data Type: Labeled Image Data
Answer: Convolutional Neural Network (CNN). CNNs learn the spatial features (edges, strokes, shapes) that are crucial for image recognition.
Why not the others? Traditional ML models treat pixels as independent features and do not exploit the 2-D structure of images.
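A compact Keras sketch, assuming TensorFlow is installed; the layer sizes are illustrative choices rather than a tuned architecture, and MNIST is downloaded automatically on first run:

```python
import tensorflow as tf

# A minimal CNN for 28x28 grayscale digit images (MNIST-style input)
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),  # one output per digit 0-9
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0  # add channel dim, scale to [0, 1]
model.fit(x_train, y_train, epochs=1, batch_size=128)
```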
8. Walmart wants to forecast next month’s product sales using historical daily sales data.
Difficulty: Medium
Data Type: Time-Dependent Numerical Data
Answer: ARIMA. It models the temporal dependencies (trend and autocorrelation) in sequential sales data.
Why not the others? K-Means ignores time ordering entirely.
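A small statsmodels sketch; the daily sales series is synthetic, and the ARIMA order (2, 1, 2) is only an example. In practice the order is chosen from ACF/PACF plots or by criteria such as AIC:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic daily sales: upward trend plus noise (made-up numbers)
rng = np.random.default_rng(0)
sales = 100 + 0.5 * np.arange(120) + rng.normal(0, 5, 120)

# ARIMA(p, d, q): d=1 differences the series once to remove the trend
model = ARIMA(sales, order=(2, 1, 2)).fit()
print(model.forecast(steps=30))  # forecast sales for the next 30 days
```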
9. A supermarket wants to identify products that are frequently purchased together.
Difficulty: Easy
Data Type: Transactional Data
Answer: Apriori. It mines association rules such as {bread} → {butter} from transaction records.
Why not the others? Classification models predict labels; they do not discover item associations.
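A sketch using the mlxtend library (an assumption; the quiz does not name a package). The four baskets are made up:

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Made-up baskets; each inner list is one checkout transaction
baskets = [["bread", "milk"], ["bread", "butter", "milk"],
           ["milk", "eggs"], ["bread", "butter"]]

# One-hot encode transactions into an items-as-columns DataFrame
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(baskets).transform(baskets), columns=te.columns_)

frequent = apriori(onehot, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```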
10. A dataset contains 1,000 features, and the goal is to reduce dimensionality before training a model.
Difficulty: Easy
Data Type: High-Dimensional Numerical Data
Answer: Principal Component Analysis (PCA). It projects the data onto fewer components while preserving as much variance as possible.
Why not the others? K-Means clusters data points; it does not reduce the number of features.
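A short scikit-learn sketch; random Gaussian data stands in for the real 1,000-feature dataset, and passing n_components=0.95 asks PCA to keep just enough components to explain 95% of the variance:

```python
import numpy as np
from sklearn.decomposition import PCA

# Random data standing in for a real 1,000-feature dataset
X = np.random.default_rng(0).normal(size=(500, 1000))

# Keep enough principal components to explain 95% of the variance
pca = PCA(n_components=0.95).fit(X)
X_reduced = pca.transform(X)
print(X.shape, "->", X_reduced.shape)  # fewer columns after reduction
```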