
Monday, January 11, 2021

Machine Learning TRUE or FALSE Questions with Answers 19

Machine learning exam questions, ML solved quiz questions, Machine Learning TRUE or FALSE questions, TOP 5 machine learning quiz questions with answers

Machine Learning TRUE / FALSE Questions - SET 19

1. Solving a non-linear separation problem with a hard-margin kernelized SVM (Gaussian RBF kernel) might lead to overfitting.

(a) TRUE                                                   (b) FALSE

Answer: TRUE

When there are outliers, a hard-margin SVM with a Gaussian RBF kernel produces an unnecessarily complicated decision boundary that overfits the training noise.

In SVM, to avoid overfitting, we choose a soft margin instead of a hard margin, i.e. we intentionally let some data points enter the margin so that the classifier doesn't overfit the training sample.

SVMs are generally less prone to overfitting than many other methods, but a hard margin combined with a flexible kernel removes that safeguard.

[Refer here for more]
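A minimal sketch of this effect, using scikit-learn (which has no explicit hard-margin mode, so a very large C stands in for it; the dataset and parameters below are illustrative assumptions):

```python
# Sketch: approximating a hard-margin RBF SVM with a very large C.
from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

# Non-linear, noisy data: the noise plays the role of outliers.
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for C in (1e10, 1.0):   # "hard" margin vs. soft margin
    clf = SVC(kernel="rbf", gamma=5.0, C=C).fit(X_tr, y_tr)
    print(f"C={C:g}: train={clf.score(X_tr, y_tr):.2f}, "
          f"test={clf.score(X_te, y_te):.2f}")
```

The near-hard-margin fit typically scores higher on the training set but lower on the test set, which is the overfitting the question describes.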

 

2. Random forests can be used to classify infinite-dimensional data.

(a) TRUE                                                   (b) FALSE

Answer: TRUE

Random forests handle high-dimensional data well because each split considers only a random subset of the features. With random forests there is little harm in keeping columns whose importance is uncertain, and little harm in adding more columns.

That said, random forests do not always perform well on very-high-dimensional data, because the informative features become harder to sample at each split.
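A small illustration (a scikit-learn sketch; the 1000-feature dataset with only 10 informative columns is an illustrative assumption):

```python
# Sketch: a random forest on high-dimensional data with many irrelevant columns.
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

# 1000 features, only 10 of them informative (the rest are noise).
X, y = make_classification(n_samples=500, n_features=1000,
                           n_informative=10, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0)
print("CV accuracy:", cross_val_score(rf, X, y, cv=5).mean())
```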

 

3. The training accuracy increases as the size of the tree grows (assuming no noise).

(a) TRUE                                                   (b) FALSE

Answer: TRUE

The training accuracy increases as the size of the tree grows until the tree fits all the training data.

A decision tree overfits the training data when its accuracy on the training data goes up but its accuracy on unseen data goes down.
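A quick sketch of this behavior (using scikit-learn's breast-cancer dataset purely as an example):

```python
# Sketch: training accuracy of a decision tree grows with tree size
# until the tree fits the training data exactly.
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
for depth in (1, 2, 4, 8, None):   # None = grow until leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X, y)
    print(f"max_depth={depth}: training accuracy = {tree.score(X, y):.3f}")
```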

 

4. Hierarchical clustering methods require a predefined number of clusters, much like k-means.

(a) TRUE                                                   (b) FALSE

Answer: FALSE

We do not need to predefine the number of clusters in hierarchical clustering as we do in k-means. Hierarchical clustering starts with each data point as an individual cluster and repeatedly merges the most similar clusters; the number of clusters is decided afterwards by where the resulting dendrogram is cut.
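A minimal sketch, assuming scikit-learn's agglomerative implementation and an illustrative distance threshold:

```python
# Sketch: agglomerative (hierarchical) clustering without a predefined
# number of clusters -- the dendrogram is cut by a distance threshold instead.
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100, centers=3, random_state=0)
hc = AgglomerativeClustering(n_clusters=None, distance_threshold=10.0).fit(X)
print("clusters found:", hc.n_clusters_)  # falls out of the cut, not fixed in advance
```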

 

5. Suppose that X1, X2, ..., Xm are categorical input attributes and Y is a categorical output attribute. Suppose we plan to learn a decision tree without pruning, using the standard algorithm. The maximum depth of the decision tree must be less than m+1.

(a) TRUE                                                   (b) FALSE

Answer: TRUE

Because the attributes are categorical, each attribute can be tested at most once along any root-to-leaf path. With m attributes, a path therefore contains at most m tests, so the depth of the tree is at most m, which is less than m+1.

 

*********************

Related links:

 

Decision tree

Overfitting in decision tree

Random forest

Support vector machine

Wednesday, December 30, 2020

Machine Learning Multiple Choice Questions and Answers 24

Top 5 Machine Learning Quiz Questions with Answers explanation, Interview questions on machine learning, quiz questions for data scientist answers explained, machine learning exam questions, question bank in machine learning, classification, ridge regression, lasso regression, statistics


Machine learning Quiz Questions - Set 24

1. The classifier’s behavior is determined by the coefficients. These coefficients are usually referred to as ________.

a) Weights

b) Tasks

c) Values

d) Behaviors

Answer: (a) Weights

The classifier’s behavior is determined by the coefficients wi. These are usually called weights.
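A toy sketch of the mechanics (all numbers are hypothetical):

```python
# Sketch: a linear classifier's prediction is determined by its weights w_i.
import numpy as np

w = np.array([0.8, -0.5])   # learned weights (coefficients)
b = 0.1                     # bias / intercept
x = np.array([1.0, 2.0])    # one input example

score = w @ x + b           # weighted sum of the inputs
print("predicted class:", 1 if score >= 0 else 0)
```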

 

2. Null and alternative hypotheses are statements about:

a) population parameters.

b) sample parameters.

c) sample statistics.

d) it depends - sometimes population parameters and sometimes sample statistics.

Answer: (a) Population parameters

The null and alternative hypotheses are two mutually exclusive statements about a population. A hypothesis test uses sample data to determine whether to reject the null hypothesis.

Null hypothesis (H0) - The null hypothesis states that a population parameter (such as the mean, the standard deviation, and so on) is equal to a hypothesized value.

Alternative hypothesis (H1) - The alternative hypothesis states that a population parameter is smaller than, greater than, or different from the hypothesized value in the null hypothesis. [Refer for more]
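A minimal sketch with SciPy (the simulated sample and the hypothesized mean of 5 are illustrative assumptions):

```python
# Sketch: a one-sample t-test. H0 and H1 are statements about the
# population mean mu (H0: mu = 5 vs. H1: mu != 5), not about the sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=5.3, scale=1.0, size=40)   # observed sample

t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")      # small p => evidence against H0
```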

 

3. In hypothesis testing, a Type 2 error occurs when

a) The null hypothesis is not rejected when the null hypothesis is true.

b) The null hypothesis is rejected when the null hypothesis is true.

c) The null hypothesis is not rejected when the alternative hypothesis is true.

d) The null hypothesis is rejected when the alternative hypothesis is true.

Answer: (c) The null hypothesis is not rejected when the alternative hypothesis is true

A Type 2 error occurs when the null hypothesis is false and we fail to reject it.
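A small simulation sketch of this (the effect size and sample size are illustrative assumptions):

```python
# Sketch: estimating the Type 2 error rate by simulation. Here the
# alternative is true (mu = 5.3, not 5.0), and we count how often the
# test still fails to reject H0 at the 0.05 level.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
trials, misses = 2000, 0
for _ in range(trials):
    sample = rng.normal(loc=5.3, scale=1.0, size=20)
    _, p = stats.ttest_1samp(sample, popmean=5.0)
    if p >= 0.05:              # fail to reject H0 although H0 is false
        misses += 1
print("estimated Type 2 error rate:", misses / trials)
```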

 

4. What type of penalty is used on regression weights in Ridge regression?

a) L0

b) L1

c) L2

d) None of the above

Answer: (c) L2

Ridge regression adds the “squared magnitude” of the coefficients as a penalty term to the loss function; this is the L2 penalty.

Ridge regression shrinks the regression coefficients, so that variables with a minor contribution to the outcome have their coefficients close to zero.

The shrinkage of the coefficients is achieved by penalizing the regression model with the L2-norm penalty, the sum of the squared coefficients. L2 regularization is used to avoid overfitting.

When do we use L2 regularization?

L2 regularization is best used when we do not expect a sparse solution, i.e. when no feature selection needs to be done and all features should be kept with shrunken coefficients.
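A minimal scikit-learn sketch of the shrinkage (the dataset and alpha are illustrative):

```python
# Sketch: ridge regression shrinks the coefficient vector compared with
# ordinary least squares, but does not zero coefficients out.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)
ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)   # alpha sets the L2 penalty strength

print("OLS   ||w||:", round(float(np.linalg.norm(ols.coef_)), 2))
print("Ridge ||w||:", round(float(np.linalg.norm(ridge.coef_)), 2))
print("Ridge coefficients exactly zero:", int((ridge.coef_ == 0).sum()))
```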

 

5. Which property of the coefficients is added as the penalty term to the loss function in Lasso regression?

a) Squared magnitude

b) Absolute value of magnitude

c) Number of non-zero entries

d) None of the above

Answer: (b) Absolute value of magnitude

Lasso regression adds the “absolute value of magnitude” of the coefficients as a penalty term to the loss function.

Lasso regression shrinks the regression coefficients toward zero by penalizing the regression model with the L1-norm penalty, the sum of the absolute values of the coefficients. This penalty can set some coefficients exactly to zero, so lasso also performs feature selection.
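A matching sketch for the L1 penalty (same illustrative dataset as the ridge example above):

```python
# Sketch: the L1 penalty in lasso drives many coefficients to exactly zero.
from sklearn.linear_model import Lasso
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)
lasso = Lasso(alpha=1.0).fit(X, y)    # alpha sets the L1 penalty strength
print("zero coefficients:", int((lasso.coef_ == 0).sum()), "of", len(lasso.coef_))
```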

 

**********************

Related links:

Multiple choice quiz questions in machine learning

Differentiate between Lasso and ridge regression

When do we need L2 regularization technique

Define Type 2 error

What does null and alternative hypotheses state about population parameters


Saturday, December 5, 2020

Machine Learning TRUE or FALSE Questions with Answers 18

Machine learning exam questions, ML solved quiz questions, Machine Learning TRUE or FALSE questions, TOP 5 machine learning quiz questions with answers

Machine Learning TRUE / FALSE Questions - SET 18

1. For linearly separable data, a small slack penalty (“C”) can hurt the training accuracy when using a linear SVM without a kernel.

(a) TRUE                                                   (b) FALSE

Answer: TRUE

If the optimal values of the α's (in the dual formulation) are greater than C, we may end up with a sub-optimal decision boundary with respect to the training examples. Alternatively, a small C can allow large slacks, so the resulting classifier will have a small ǁwǁ2 but can have non-zero training error.

 

C is a regularization parameter that controls the trade-off between achieving a low training error and a low testing error, that is, the ability to generalize the classifier to unseen data. If C is too small, the objective is dominated by margin maximization: the optimizer keeps ǁwǁ small and tolerates large slacks, which can lead to a large training error.

The C parameter controls the tolerance for margin violations: a low C allows more violations (outliers inside or across the margin), a high C allows fewer.
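A minimal sketch of a small C hurting training accuracy (the imbalanced blobs are an illustrative assumption that makes the effect easy to see):

```python
# Sketch: on linearly separable (here, deliberately imbalanced) data,
# a tiny C lets the optimizer accept slack rather than fit every point.
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

# Two well-separated blobs, 80 vs. 20 points: perfectly separable.
X, y = make_blobs(n_samples=[80, 20], cluster_std=0.5, random_state=0)

for C in (1e-4, 1.0):
    clf = LinearSVC(C=C, max_iter=10000).fit(X, y)
    print(f"C={C:g}: training accuracy = {clf.score(X, y):.2f}")
```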

 

2. Ridge regression, weight decay, and Gaussian processes use the same regularizer.

(a) TRUE                                                   (b) FALSE

Answer: TRUE

Ridge regression, weight decay, and Gaussian processes use the same regularizer ǁwǁ2.

Regularization

In the context of machine learning, regularization is the process of shrinking the coefficients towards zero. In simple words, regularization discourages learning a more complex or flexible model, to prevent overfitting. [For more, refer here]

Regularization may be defined as any change we make to the training algorithm in order to reduce the generalization error but not the training error.

Ridge regression is like least-square regression with an additional penalty term ǁwǁ2.

Weight decay means decreasing the weights at every learning step.

A Gaussian process is a generative model in which the weights of the target function are drawn according to a Gaussian distribution (for a linear model).
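A one-step numerical sketch of why weight decay and the L2 penalty coincide for plain gradient descent (the learning rate and lambda are illustrative):

```python
# Sketch: one gradient step on an L2-regularized loss equals weight decay.
# Adding (lambda/2)*||w||^2 to the loss contributes lambda*w to the gradient,
# which is the same as multiplying w by (1 - lr*lambda) before the plain step.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)
grad = rng.normal(size=5)          # gradient of the unregularized loss at w
lr, lam = 0.1, 0.01

w_l2 = w - lr * (grad + lam * w)            # step on the regularized loss
w_decay = (1 - lr * lam) * w - lr * grad    # decay the weights, then step

print(np.allclose(w_l2, w_decay))           # True: the two updates coincide
```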

 

3. Linear soft-margin SVM can only be used when training data are linearly separable.

(a) TRUE                                                   (b) FALSE

Answer: FALSE

A hard-margin SVM works only when the data are completely linearly separable without any errors (noise or outliers); in the presence of errors, either the margin shrinks or the hard-margin SVM fails altogether. The soft-margin SVM was proposed to solve this problem by introducing slack variables; it is an extended version of the hard-margin SVM that can also handle non-separable data.
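For reference, the soft-margin primal objective with slack variables ξi, in standard notation:

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_{i}
\qquad \text{s.t.}\qquad
y_{i}\,(w^{\top}x_{i}+b) \ge 1-\xi_{i},\quad \xi_{i}\ge 0 .
```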

 

4. In linear regression, using an L2 regularization penalty term results in sparser solutions than using an L1 regularization penalty term.

(a) TRUE                                                   (b) FALSE

Answer: FALSE

In linear regression, using an L1 regularization penalty term results in sparser solutions than using an L2 regularization penalty term.

 

L1 regularization adds an L1 penalty equal to the absolute value of the magnitude of coefficients. In other words, it limits the size of the coefficients. L1 can yield sparse models (i.e. models with few coefficients).

L2 regularization adds an L2 penalty equal to the square of the magnitude of coefficients. L2 will not yield sparse models and all coefficients are shrunk by the same factor. [For more, please refer here] 
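A minimal sketch counting exactly-zero coefficients under each penalty (the dataset and alpha are illustrative):

```python
# Sketch: comparing sparsity of L1 (lasso) vs. L2 (ridge) solutions.
from sklearn.linear_model import Lasso, Ridge
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=50, n_informative=5,
                       noise=5.0, random_state=0)
for model in (Lasso(alpha=1.0), Ridge(alpha=1.0)):
    coef = model.fit(X, y).coef_
    print(type(model).__name__, "zero coefficients:", int((coef == 0).sum()))
```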

 

5. Maximum likelihood estimation gives us not only a point estimate, but a distribution over the parameters that we are estimating.

(a) TRUE                                                   (b) FALSE

Answer: FALSE

In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of a probability distribution by maximizing a likelihood function, so that under the assumed statistical model the observed data is most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate. [Refer here]

MLE is a method of estimating the parameters of a statistical model by picking the parameters that maximize the likelihood function. It returns only this single point estimate; obtaining a full distribution over the parameters would require a Bayesian treatment instead.
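A minimal sketch for a Gaussian model (the data are simulated for illustration):

```python
# Sketch: the Gaussian MLE returns point estimates (single numbers),
# not a distribution over the parameters.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=1000)

mu_hat = data.mean()                        # MLE of the mean
var_hat = ((data - mu_hat) ** 2).mean()     # MLE of the variance (the 1/n form)
print(f"mu_hat = {mu_hat:.3f}, var_hat = {var_hat:.3f}")
# A Bayesian approach, by contrast, would yield a posterior distribution
# over (mu, sigma^2) rather than these two numbers.
```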

 

*********************

Related links:

 

Maximum Likelihood Estimation

L1 and L2 regularization

Difference between hard-margin and soft-margin SVM

Regularization in ridge regression

What is slack variable

Differentiate between L1 and L2 regularization 
