## List down the issues/problems with Jaccard similarity.

The Jaccard similarity coefficient is a statistic used for gauging the similarity and diversity of sample sets. It is calculated as follows:

Jaccard(A, B) = |A ∩ B| / |A ∪ B|

The Jaccard coefficient is the number of members common to both sets divided by the number of distinct members in their union. It takes a value between 0 and 1, where 0 indicates no overlap and 1 indicates that the two sets are identical.
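As a minimal sketch of the formula, the function below treats each document as a set of whitespace-separated terms. The name `jaccard`, the sample sentences, and the choice to return 1.0 for two empty sets are assumptions made for illustration only.

```python
def jaccard(a, b):
    """Return |A ∩ B| / |A ∪ B| for two iterables treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption for this sketch: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical sample documents, tokenized by whitespace.
doc1 = "the cat sat on the mat".split()
doc2 = "the cat lay on the rug".split()

# 3 shared terms (the, cat, on) out of 7 distinct terms in the union.
score = jaccard(doc1, doc2)
print(score)
```

Converting to `set` discards duplicates, so the second "the" in each sentence has no effect on the score.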

Problems with Jaccard

• It doesn’t consider term frequency (how many times a term occurs in a document). It simply counts the number of terms that are common between two sets.
• Rare terms in a collection are more informative than frequent terms. Jaccard ignores this and gives every term the same weight.
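• It normalizes for set size only crudely. The score depends on the size of the union, so two sets of very different sizes get a low score even when the smaller set is fully contained in the larger one (for example, a short query matched against a long document).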
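The points above can be checked with a small experiment. This is only a sketch: the `jaccard` helper, the toy term lists, and the variable names are hypothetical, and "the" stands in for a frequent term while "jaccard" stands in for a rare one.

```python
def jaccard(a, b):
    """Return |A ∩ B| / |A ∪ B| for two iterables treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# 1. Term frequency is ignored: repeating a term does not change the score.
once = "jaccard similarity".split()
many = "jaccard jaccard jaccard similarity".split()
tf_score = jaccard(once, many)  # both reduce to the same two-term set

# 2. Rare and frequent terms weigh the same.
query = "the jaccard".split()
shares_common = "the weather".split()  # overlaps only on the frequent "the"
shares_rare = "a jaccard".split()      # overlaps only on the rarer "jaccard"
common_score = jaccard(query, shares_common)
rare_score = jaccard(query, shares_rare)

# 3. Set size drives the score: a 2-term query fully contained in a
#    100-term document still scores only 2/100.
short_query = ["jaccard", "similarity"]
long_doc = short_query + [f"term{i}" for i in range(98)]
length_score = jaccard(short_query, long_doc)

print(tf_score, common_score, rare_score, length_score)
```

Here `tf_score` is 1.0 despite the repeated term, `common_score` and `rare_score` are both 1/3, and `length_score` is 0.02 even though every query term appears in the document.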