
Sunday, February 19, 2023

Machine Learning MCQ - Single linkage and complete linkage hierarchical clustering

Multiple choice questions in machine learning. Interview questions on machine learning, quiz questions for data scientists with answers explained, exam questions in machine learning, hierarchical clustering, agglomerative clustering, single linkage cluster distance, complete linkage cluster distance, single-link vs complete-link distance calculations

Machine Learning MCQ - Distance between points in single linkage and complete linkage hierarchical clustering methods


 

1. Considering single-link and complete-link hierarchical clustering, is it possible for a point to be closer to the points in other clusters than to the points in its own cluster? If so, in which approach will this tend to be observed?

a) No

b) Yes, single-link clustering

c) Yes, complete-link clustering

d) Yes, both single-link and complete-link clustering

 

Answer: (d) Yes, both single-link and complete-link clustering

This is possible in both single-link and complete-link clustering. In the single-link case, consider two parallel chains of points: each chain forms its own cluster, yet many points are closer to the nearest point in the other chain than to most points in their own chain. In the complete-link case, this is even easier to see, because the distance between two clusters is measured by the distance between their farthest points, so a cluster can contain points that are farther from each other than from points in another cluster.
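The following is a minimal sketch (my own illustration, not from the original question) of the parallel-chains case, assuming NumPy and scikit-learn are available. It builds two parallel chains, clusters them with single linkage, and then checks how many own-cluster points are farther from a given point than the nearest point in the other cluster.

```python
# Sketch of the "two parallel chains" example under single-linkage clustering.
# The data and the helper logic here are assumptions for illustration only.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Two parallel chains: spacing 1.0 along each chain, gap 1.5 between chains.
xs = np.arange(10, dtype=float)
chain_a = np.column_stack([xs, np.zeros(10)])
chain_b = np.column_stack([xs, np.full(10, 1.5)])
X = np.vstack([chain_a, chain_b])

# Single-linkage agglomerative clustering separates the two chains.
labels = AgglomerativeClustering(n_clusters=2, linkage="single").fit_predict(X)

# Pick the first point of chain A and compare distances.
p = X[0]
d = np.linalg.norm(X - p, axis=1)
own = labels == labels[0]
nearest_other = d[~own].min()                 # closest point in the other cluster
farther_own = np.sum(d[own] > nearest_other)  # own-cluster points farther than that
print(f"nearest other-cluster point: {nearest_other:.2f}")
print(f"own-cluster points farther away: {farther_own} of {own.sum() - 1}")
```

For this toy data, most points in a chain are farther from the chosen point than the point directly across in the other chain, which is exactly the situation the question describes.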

 

What is single link clustering?

Single link clustering is one of the hierarchical clustering methods.

In single linkage (i.e., nearest-neighbor linkage, sometimes referred to as MIN), the dissimilarity between two clusters is the smallest dissimilarity between two points in opposite groups.

In other words, in single linkage clustering, the inter-cluster distance (the distance between two clusters) is represented by the distance between the closest pair of data objects belonging to different clusters.

cluster distance = distance between the two closest members, one from each cluster
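As a small illustration (an assumed sketch, not part of the original post), the single-link inter-cluster distance can be computed directly with NumPy as the minimum over all cross-cluster pairs:

```python
# Assumed example: single-link distance between two clusters of 2-D points.
import numpy as np

def single_link_distance(cluster_a, cluster_b):
    """Smallest Euclidean distance between any point in A and any point in B."""
    a = np.asarray(cluster_a)[:, None, :]    # shape (|A|, 1, d)
    b = np.asarray(cluster_b)[None, :, :]    # shape (1, |B|, d)
    pairwise = np.linalg.norm(a - b, axis=-1)
    return pairwise.min()

A = [[0.0, 0.0], [1.0, 0.0]]
B = [[3.0, 0.0], [4.0, 0.0]]
print(single_link_distance(A, B))  # 2.0, the distance between the closest pair
```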

 

What is complete link clustering?

Complete link clustering is another hierarchical clustering method.

In complete linkage (i.e., furthest-neighbor linkage or MAX), the dissimilarity between two clusters is the largest dissimilarity between two points in opposite groups.

In other words, in complete linkage clustering, the inter-cluster distance (the distance between two clusters) is represented by the distance between the farthest pair of data objects belonging to different clusters.

cluster distance = distance between the two farthest members, one from each cluster
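The same sketch adapted for complete linkage (again an assumed illustration) only changes the minimum to a maximum:

```python
# Assumed example: complete-link distance between two clusters of 2-D points.
import numpy as np

def complete_link_distance(cluster_a, cluster_b):
    """Largest Euclidean distance between any point in A and any point in B."""
    a = np.asarray(cluster_a)[:, None, :]
    b = np.asarray(cluster_b)[None, :, :]
    return np.linalg.norm(a - b, axis=-1).max()

A = [[0.0, 0.0], [1.0, 0.0]]
B = [[3.0, 0.0], [4.0, 0.0]]
print(complete_link_distance(A, B))  # 4.0, the distance between the farthest pair
```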

 

What does the term linkage refer to in hierarchical clustering?

 The choice of linkage determines how we measure dissimilarity between groups of points.
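As a hedged example of that choice in practice, SciPy's hierarchical clustering takes the linkage as a single `method` argument, so switching between single-link and complete-link on the same data is just a parameter change (the toy data below is assumed for illustration):

```python
# Assumed sketch: the same data clustered with single-link vs complete-link linkage.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])

Z_single = linkage(X, method="single")      # MIN: closest pair defines cluster distance
Z_complete = linkage(X, method="complete")  # MAX: farthest pair defines cluster distance

# Cut each dendrogram into two flat clusters.
print(fcluster(Z_single, t=2, criterion="maxclust"))
print(fcluster(Z_complete, t=2, criterion="maxclust"))
```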

 


 

 

************************

Related links:

What is hierarchical clustering?

Difference between single-linkage and complete-linkage hierarchical clustering in ML

What is single link clustering?

What is complete link clustering?

What does linkage refer to in hierarchical agglomerative clustering?

Machine learning solved MCQ


Saturday, April 30, 2022

Machine Learning MCQ - Best method to find optimal number of clusters (k) in k-means algorithm

Multiple choice questions in machine learning. Interview questions on machine learning, quiz questions for data scientists with answers explained. How to find the optimal k value in k-means? Elbow method vs silhouette method. Which is the best method to find the optimal number of clusters in k-means?

Machine Learning MCQ - Which is the best method to find the optimal number of clusters (k value) in the k-means algorithm?


 

1. K-means is an unsupervised learning algorithm. In K-means, k refers to the number of clusters. We have several methods to find the optimal number of clusters in the K-means algorithm. Which of the following methods can give the optimal (best) number of clusters?

a) Manhattan method

b) Elbow method

c) Euclidean method

d) Silhouette method

Answer: (d) Silhouette method

The silhouette method for finding the optimal k value in k-means

The silhouette method is a way to find the optimal number of clusters and to interpret and validate the consistency of the clusters in the data. It computes a silhouette coefficient for each point, which measures how similar the point is to its own cluster compared to other clusters, and it provides a succinct graphical representation of how well each object has been classified.

Compute the silhouette coefficient for each point and average it over all samples to get the silhouette score. [For more, please refer here]
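A minimal sketch of this procedure with scikit-learn (the blob data and the range of k values are assumptions for illustration): fit k-means for several candidate k and keep the k with the highest mean silhouette coefficient.

```python
# Assumed example: choosing k by the silhouette score.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)   # mean silhouette coefficient over all samples

best_k = max(scores, key=scores.get)
print(scores, "best k:", best_k)
```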

 

Why is the elbow method not preferred over the silhouette method for finding the best value of k in k-means?

The elbow-curve method is usually a little ambiguous, because for some datasets the bend (elbow) point is not clearly visible.
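For comparison, here is a short sketch of the elbow method (again with assumed toy data): plot the within-cluster sum of squares (k-means inertia) against k and look for the bend; when the curve decreases smoothly, the elbow is hard to pin down.

```python
# Assumed example: the elbow method, plotting inertia against k.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

ks = range(1, 10)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_ for k in ks]

plt.plot(ks, inertias, marker="o")
plt.xlabel("k (number of clusters)")
plt.ylabel("within-cluster sum of squares (inertia)")
plt.title("Elbow method")
plt.show()
```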

 

Difference between Elbow and Silhouette methods

Metric | Elbow method | Silhouette method
Calculation | Calculates the Euclidean distance (within-cluster sum of squares) | Considers variables such as variance, skewness, etc.
Dataset size | Works well for smaller datasets | A better option for higher-dimensional data
Effect of duplicate data | May not give proper output when duplicate data is present | Works better and can identify duplicate data
Efficacy | Efficiency depends on the nature of the dataset | Does not depend on the nature of the dataset
Finding k | Used to find the "elbow" point, where adding more samples no longer changes cluster membership much [Refer here for more] | Measures whether there are large gaps between each sample and all other samples within the same cluster or across different clusters


 

 

  


 

************************

Related links:

What metric can be used to find optimal number of clusters?

Differentiate between elbow method and silhouette method to find optimal clusters

Find optimal number of clusters using k-means algorithm

Why silhouette method is better than elbow method in finding optimal number of clusters in k-means algorithm?

Among elbow and silhouette methods, which is good if the data is of high-dimensions?

Machine learning solved MCQ
