# Finding the TF-IDF weights of the terms of a document, given a collection of documents

Question:

A document X contains the terms t1, t2 and t3 with the following frequencies (in brackets):

t1(3), t2(2), t3(1)

Assume that the collection contains 10,000 documents and that the document frequencies of these terms are as follows:

t1(50), t2(1300), t3(250)

Then, find the TF-IDF weight of terms t1, t2, and t3 in the document X.

Solution:

TF-IDF (Term Frequency-Inverse Document Frequency) is a measure of how relevant a term is to a given document within a collection of documents.

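$TF_{t,d}$ measures how often a term t occurs in a document d, normalized by the length of the document. It is calculated as follows:

$$TF_{t,d} = \frac{\text{number of times } t \text{ occurs in } d}{\text{total number of words in } d}$$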

For example, if document D1 contains 54 words and the term ‘quick’ occurs 10 times, then $TF_{\text{quick},D1} = 10/54 \approx 0.19$.
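The term-frequency calculation above can be sketched in Python. The function name `term_frequency` and the stand-in token list are illustrative assumptions, not part of the original exercise; the document is assumed to be already split into words.

```python
from collections import Counter


def term_frequency(term, tokens):
    """Return the count of `term` in `tokens` divided by the number of tokens."""
    if not tokens:
        return 0.0  # an empty document has no term frequencies
    counts = Counter(tokens)
    return counts[term] / len(tokens)


# Stand-in for D1: 54 words, 10 of which are 'quick' (the other words are placeholders).
doc = ["quick"] * 10 + ["filler"] * 44
tf_quick = term_frequency("quick", doc)
print(round(tf_quick, 2))  # 0.19
```

A term that does not occur in the document gets a frequency of 0, because `Counter` returns 0 for missing keys.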

$DF_t$ is the number of documents in which the term t occurs.

For example, if 120 documents contain the word ‘quick’, then $DF_{\text{quick}} = 120$.

$IDF_t$ is an inverse measure of the informativeness of a term t: it reflects how common or rare the term is across the entire document collection. The closer it is to 0, the more common the term. It is calculated as follows:

$$IDF_t = \log\frac{N}{DF_t}$$

Here, N is the number of documents in the given collection, $DF_t$ is the document frequency of term t, and log is the natural logarithm.
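As a minimal sketch of the IDF calculation, the snippet below assumes the natural logarithm, which reproduces the rounded IDF values in this solution (for example, about 5.3 for a term that occurs in 50 of 10,000 documents). The function name `inverse_document_frequency` is a hypothetical helper chosen for illustration.

```python
import math


def inverse_document_frequency(n_docs, doc_freq):
    """Return ln(N / DF_t); a term that occurs in no document has no defined IDF."""
    if doc_freq <= 0:
        raise ValueError("doc_freq must be positive")
    return math.log(n_docs / doc_freq)


n_docs = 10_000  # collection size from the question
idf_rare = inverse_document_frequency(n_docs, 50)            # term in 50 documents
idf_common = inverse_document_frequency(n_docs, 1300)        # term in 1300 documents
idf_everywhere = inverse_document_frequency(n_docs, n_docs)  # term in every document
print(round(idf_rare, 2), round(idf_common, 2), idf_everywhere)  # 5.3 2.04 0.0
```

The last value illustrates the remark about commonness: a term that appears in every document has an IDF of exactly 0.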

The TF-IDF weight of a term t in a document d is the product of its TF weight and its IDF weight: $\text{TF-IDF}_{t,d} = TF_{t,d} \times IDF_t$.

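Document X contains 3 + 2 + 1 = 6 words in total, and log denotes the natural logarithm.

TF-IDF for term t1:

$TF_{t1,X}$ = (number of times t1 occurs in X)/(number of words in X) = 3/6 = 0.5

$IDF_{t1}$ = log(number of documents in the collection/number of documents in which t1 appears) = log(10000/50) = 5.30

TF-IDF for t1 = 0.5 × 5.30 = 2.65

TF-IDF for term t2:

$TF_{t2,X}$ = 2/6 = 0.333

$IDF_{t2}$ = log(10000/1300) = 2.04

TF-IDF for t2 = 0.333 × 2.04 = 0.68

TF-IDF for term t3:

$TF_{t3,X}$ = 1/6 = 0.167

$IDF_{t3}$ = log(10000/250) = 3.69

TF-IDF for t3 = 0.167 × 3.69 = 0.61

Term t1 therefore has the highest weight in X: it is the most frequent term in the document and the rarest in the collection.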
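The whole calculation can be checked with a short Python sketch. It rests on two assumptions that should be read as such: document X consists only of the three listed terms, so its length is taken as 3 + 2 + 1 = 6, and the logarithm is natural. The variable names are illustrative.

```python
import math

N_DOCS = 10_000
term_counts = {"t1": 3, "t2": 2, "t3": 1}      # frequencies in document X
doc_freqs = {"t1": 50, "t2": 1300, "t3": 250}  # document frequencies in the collection

# Assumption: X contains only these terms, so its length is the sum of the counts.
doc_length = sum(term_counts.values())

tf_idf = {}
for term, count in term_counts.items():
    tf = count / doc_length
    idf = math.log(N_DOCS / doc_freqs[term])  # natural logarithm
    tf_idf[term] = tf * idf
    print(f"{term}: TF={tf:.3f}  IDF={idf:.3f}  TF-IDF={tf_idf[term]:.3f}")
```

Choosing a different value for `doc_length` rescales all three weights by the same factor, so the IDF values and the ranking of the terms stay the same.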
