# How to calculate the TF-IDF (term frequency-inverse document frequency) of the terms of a document in a collection of documents, and how TF-IDF measures the importance of a term in a document

__Question:__

Given a document X containing the terms t1, t2 and t3 with the following frequencies (in brackets):

t1(3), t2(2), t3(1)

Assume that the collection contains 10,000 documents and that the document frequencies of these terms are as follows:

t1(50), t2(1300), t3(250)

Then, find the TF-IDF weight of terms t1, t2, and t3 in the document X.

__Solution:__

TF-IDF (Term Frequency-Inverse Document Frequency) is a measure of “how relevant a term is in a given document”, relative to a collection of documents.

TF_{t,d} measures how often a term t occurs in a document d, relative to the length of d. It can be calculated as follows:

TF_{t,d} = (number of times t occurs in d)/(number of words in d)

For example, if the document D1 contains the term ‘quick’ 10 times, and it has 54 words in it, then TF_{’quick’, D1} = 10/54 = 0.19.
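The ‘quick’ example can be checked with a few lines of Python. This is only a minimal sketch: the function name `term_frequency` and the placeholder word `"other"` are made up for illustration, and the document is assumed to be already split into words.

```python
from collections import Counter

def term_frequency(term, tokens):
    """Return (occurrences of term) / (number of words), or 0.0 for an empty document."""
    if not tokens:
        return 0.0
    return Counter(tokens)[term] / len(tokens)

# Hypothetical stand-in for D1: 54 words, 10 of which are 'quick'.
d1_tokens = ["quick"] * 10 + ["other"] * 44

tf_quick = term_frequency("quick", d1_tokens)
print(round(tf_quick, 2))  # 0.19
```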

DF_{t} refers to the number of documents in which t appears.

For example, if 120 documents contain the word ‘quick’, then DF_{’quick’} = 120.

IDF_{t} is the inverse measure used to calculate the informativeness of the given term t, that is, how common or rare a word is in the entire document set. The closer it is to 0, the more common the word is. It can be calculated as follows:

IDF_{t} = log(N/DF_{t})

Here, N is the number of documents in the given collection, and DF_{t} is the document frequency of term t. The IDF values worked out below use the natural logarithm.
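To see how IDF behaves for common and rare terms, here is a small sketch using the document frequencies from the question. The function name `inverse_document_frequency` is an illustrative choice, and the natural logarithm is assumed because it matches the IDF values in this solution.

```python
import math

def inverse_document_frequency(n_docs, doc_freq):
    """Return ln(N / DF_t) for a term that appears in doc_freq of n_docs documents."""
    if not 0 < doc_freq <= n_docs:
        raise ValueError("doc_freq must satisfy 0 < doc_freq <= n_docs")
    return math.log(n_docs / doc_freq)

N = 10_000

# A term that appears in every document carries no information.
print(inverse_document_frequency(N, N))  # 0.0

# Document frequencies of t2, t3 and t1: the rarer the term, the larger its IDF.
for df in (1300, 250, 50):
    print(df, f"{inverse_document_frequency(N, df):.2f}")
# 1300 2.04
# 250 3.69
# 50 5.30
```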

The TF-IDF weight of a term is the product of its TF weight and its IDF weight.

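Document X contains 3 + 2 + 1 = 6 words in total, so this is the denominator of every TF value below.

__TF-IDF for term t1:__

TF_{t1} = (number of times t1 occurs in X)/(number of words in X) = 3/6 = 0.5

IDF_{t1} = log(No. of docs in the collection/No. of docs in which t1 appears) = log(10000/50) = 5.298

**TF-IDF for t1 = (3/6) × 5.298 = 2.65**

__TF-IDF for term t2:__

TF_{t2} = 2/6 = 0.33

IDF_{t2} = log(10000/1300) = 2.040

**TF-IDF for t2 = (2/6) × 2.040 = 0.68**

__TF-IDF for term t3:__

TF_{t3} = 1/6 = 0.17

IDF_{t3} = log(10000/250) = 3.689

**TF-IDF for t3 = (1/6) × 3.689 = 0.61**

Term t1 receives the highest weight because it is both the most frequent term in X and the rarest term in the collection.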
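The whole calculation can also be reproduced programmatically. The sketch below rests on two assumptions: X contains no words other than t1, t2 and t3, so its length is taken as the sum of the listed frequencies (3 + 2 + 1 = 6), and the logarithm is the natural logarithm. The variable names are illustrative.

```python
import math

N = 10_000                                        # documents in the collection
term_counts = {"t1": 3, "t2": 2, "t3": 1}         # frequencies in document X
doc_freqs = {"t1": 50, "t2": 1300, "t3": 250}     # document frequencies

# Assumption: X has no other words, so its length is the sum of the counts.
doc_length = sum(term_counts.values())

weights = {}
for term, count in term_counts.items():
    tf = count / doc_length                  # term frequency in X
    idf = math.log(N / doc_freqs[term])      # inverse document frequency (natural log)
    weights[term] = tf * idf

# Print the terms from most to least important in X.
for term, weight in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term}: {weight:.2f}")
```

Under these assumptions t1 comes out on top, since it is the most frequent term in X and appears in the fewest documents of the collection.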
