
# List the issues with Jaccard similarity coefficient, What are the problems with Jaccard index, Define Jaccard index, Can we use Jaccard index to find similarity between two documents

Question:

## List down the issues/problems with Jaccard similarity.

The Jaccard similarity coefficient (also called the Jaccard index) is a statistic used for gauging the similarity and diversity of sample sets. It is calculated as follows:

Jaccard(A, B) = |A ∩ B| / |A ∪ B|

The Jaccard coefficient is the number of members common to the two sets divided by the number of distinct members in their union. Its value lies between 0 and 1, where 0 indicates no overlap and 1 indicates that the two sets are identical.
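As a rough illustration of the definition, here is a minimal Python sketch that treats each document as a set of whitespace-separated words. The function name `jaccard`, the two sample sentences, and the choice to return 1.0 for two empty sets are assumptions made for this example only.

```python
def jaccard(a, b):
    """Jaccard similarity of two iterables, each treated as a set."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Both sets are empty, so the ratio is undefined.
        # Returning 1.0 here is a convention assumed for this sketch.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical sample documents, split into words.
doc1 = "the cat sat on the mat".split()
doc2 = "the cat lay on the rug".split()

# Common words: the, cat, on (3). Distinct words overall: 7.
score = jaccard(doc1, doc2)
print(score)  # 3/7, about 0.43
```

Note that "the" appears twice in each sentence but is counted once, because the documents are reduced to sets before comparison.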

Problems with Jaccard

• It doesn’t consider term frequency (how many times a term occurs in a document). It simply counts the number of terms that are common between two sets.
• Rare terms in a collection are more informative than frequent terms. Jaccard doesn’t consider this information.
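• It normalizes for set size only crudely. Pairs of sets with different sizes but the same number of common members can get the same Jaccard similarity whenever their unions are the same size. For example, sets of sizes 2 and 6 with 2 common members and sets of sizes 4 and 4 with 2 common members both score 2/6.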
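The three problems listed above can be seen in a small Python sketch. The helper `jaccard` and all of the sample word lists and sets below are made up for illustration; they are not taken from any particular collection.

```python
def jaccard(a, b):
    """Jaccard similarity of two iterables, each treated as a set."""
    a, b = set(a), set(b)
    union = a | b
    # Returning 1.0 for two empty sets is a convention assumed here.
    return len(a & b) / len(union) if union else 1.0


# 1. Term frequency is ignored: repeating "jaguar" changes nothing.
query = "jaguar".split()
once = "jaguar speed".split()
many = "jaguar jaguar jaguar speed".split()
tf_ignored = jaccard(query, once) == jaccard(query, many)  # both are 1/2

# 2. Rare and frequent terms weigh the same: sharing the common word
#    "the" scores exactly like sharing the rarer word "jaccard".
base = ["the", "jaccard"]
share_frequent = jaccard(base, ["the", "index"])      # 1/3
share_rare = jaccard(base, ["jaccard", "index"])      # 1/3

# 3. Differently sized pairs can score the same: sizes (2, 6) and (4, 4),
#    each pair with 2 common members and a union of 6.
pair_a = jaccard({1, 2}, {1, 2, 3, 4, 5, 6})          # 2/6
pair_b = jaccard({1, 2, 3, 4}, {1, 2, 5, 6})          # 2/6

print(tf_ignored, share_frequent == share_rare, pair_a == pair_b)
```

In each of the three cases the two scores come out equal, even though the inputs differ in a way that would usually matter for document similarity.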