*1. In classification, what does high entropy mean?*

*2. Which clustering approach is best for producing clusters of different sizes and shapes?*

*3. Which type of machine learning algorithm would be helpful for predicting the amount of rainfall in a region?*

*4. In linear regression, which regularization penalties can be used to reduce some parameters to zero?*

*5. Why are MLE estimates often considered undesirable?*

*6. If you have an enormous amount of training data, what would be the variance of a model trained on that data?*

*7. In a neural network, if you have more neurons in the hidden layers, what would be the impact?*

*8. What is the problem with a decision tree that is too shallow?*

*9. What do we call a machine learning model that neither models the training data nor generalizes to new, unseen data?*

*10. When can hard margin SVM work?*

*11. What is the objective of K-means clustering?*

*12. How does the polynomial degree affect overfitting and underfitting in polynomial regression?*

*13. When does bias become low or high in polynomial regression?*

*14. How does the use of weak classifiers help prevent overfitting when performing bagging?*

*15. How does averaging the output of multiple decision trees help?*

*16. What are some assumptions made by the k-means algorithm?*

*17. What is an activation function in neural networks?*

*18. How does the kernel width affect the trade-off between underfitting and overfitting in kernel regression?*

*19. Why do we need to prune a decision tree?*

*20. Discuss the bias-variance tradeoff.*

*21. What causes the training loss to increase with the number of epochs?*

*22. Why can the decision tree algorithm achieve zero training error on any linearly separable dataset?*

*23. Why can the perceptron achieve zero training error on any linearly separable dataset?*

*24. List different ways to reduce the overfitting problem.*

*25. Which error function is most suitable for logistic regression trained with gradient descent?*

*26. When do we use a linear kernel when training an SVM?*

*27. What is the impact of increasing the size of layers in a neural network on bias and variance?*

*28. How does the increase in the number of hidden units per layer in a neural network affect bias and variance?*

*29. How does pruning help in the development of a decision tree?*

*30. What is the major weakness of decision trees when compared with logistic regression classifiers?*

*31. Why is a decision tree not an ensemble machine learning method?*

*32. What is a sequential ensemble?*

*33. Differentiate between sequential and parallel ensemble models.*

#### 34. *What effect does increasing the value of k have on the bias and variance of a k-nearest neighbor model?*

#### 35. *What would be the bias-variance tradeoff if we increase the value of k in k-nearest neighbor?*

#### 36. *Will a k-nearest neighbor model underfit or overfit for a large value of k?*

#### 37. *Differentiate between underfitting and overfitting in machine learning.*

#### 38. *What is the impact of multi-way splits in decision tree learning?*

*39. Why do both the perceptron and the linear SVM have the same VC dimension?*

*40. Does the selection of decision tree split that minimizes the classification error guarantee an optimal tree?*

*41. What is the “curse of dimensionality”?*

*42. Models that are too complex tend to have high variance and low bias. Why?*

*43. What are some of the popular discriminative models for solving classification problems?*

*45. How is Hamming distance different from other distance measures such as Euclidean and Manhattan?*

*46. Why is leave-one-out cross validation not suitable for very large datasets?*

*47. What is cross validation used for?*

*48. Which cross validation technique works fastest on very large datasets?*

*50. What are some applications of unsupervised learning?*

*51. Which method is used to find the optimal number of clusters in the k-means clustering algorithm?*

