Monday, June 14, 2021

Calculate the TF-IDF weight for terms of a given document

Question:

Given a document X containing terms t1, t2 and t3 with frequencies (inside brackets) as follows;

t1(3), t2(2), t3(1)

Let us assume that the collection contains 10,000 documents and document frequencies of these terms are as follows;

t1(50), t2(1300), t3(250)

Then, find the TF-IDF weight of terms t1, t2, and t3 in the document X.

 

Solution:

TF-IDF (Term Frequency-Inverse Document Frequency) is a measure of how relevant a term is to a given document within a collection of documents.

TFt,d measures how often a term t occurs in a document d, normalized by the length of d. It can be calculated as follows;

TFt,d = (number of times t occurs in d) / (total number of words in d)

For example, if the document D1 contains the term ‘quick’ 10 times, and it has 54 words in it, then TF‘quick’,D1 = 10/54 = 0.19.
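The ‘quick’ example can be checked with a few lines of Python. This is only a minimal sketch; the function name term_frequency and its parameter names are made up for illustration and are not part of the exercise.

```python
def term_frequency(term_count: int, doc_length: int) -> float:
    """Length-normalized TF: occurrences of the term / total words in the document."""
    if doc_length <= 0:
        raise ValueError("doc_length must be positive")
    return term_count / doc_length


# 'quick' occurs 10 times in a 54-word document D1.
print(round(term_frequency(10, 54), 2))  # 0.19
```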

DFt is the number of documents in the collection in which the term t occurs.

For example, if 120 documents contain the word ‘quick’, then DF‘quick’ = 120.

IDFt is an inverse measure of the informativeness of the term t; it captures how common or rare a word is across the entire document set. The closer it is to 0, the more common the word is. It can be calculated as follows;

IDFt = log(N / DFt)

Here, N is the number of documents in the given collection, and DFt is the document frequency of term t. The values in this solution use the natural logarithm.
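A small Python sketch of the IDF calculation follows. The function name inverse_document_frequency is hypothetical, and the natural logarithm is an assumption, chosen because it reproduces the IDF values used in this solution; other bases (2 or 10) are also common and only rescale the result.

```python
import math


def inverse_document_frequency(n_docs: int, doc_freq: int) -> float:
    """IDF = log(N / DF), using the natural logarithm (an assumption, see above)."""
    if n_docs <= 0 or doc_freq <= 0:
        raise ValueError("n_docs and doc_freq must be positive")
    return math.log(n_docs / doc_freq)


# Term t1 from the question: N = 10,000 documents, DF = 50.
print(round(inverse_document_frequency(10_000, 50), 1))  # 5.3
```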

The TF-IDF weight of a term is the product of its TF weight and its IDF weight.
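Written out on one line, the product looks as follows. The symbols f_{t,d} (the raw count of t in d) and |d| (the length of d in words) are shorthand introduced here for illustration only.

```latex
\mathrm{TF\text{-}IDF}_{t,d}
  = \mathrm{TF}_{t,d} \times \mathrm{IDF}_{t}
  = \frac{f_{t,d}}{|d|} \times \log\frac{N}{\mathrm{DF}_{t}}
```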

TF-IDF for term t1;

TFt1 = (number of times t1 occurs in X)/(number of words in X) = 3/6 = 0.5, since X contains 3 + 2 + 1 = 6 words

IDFt1 = log(No. of docs in the collection/No. of docs in which t1 appears) = log(10000/50) ≈ 5.298

TF-IDF for t1 = 0.5 × 5.298 ≈ 2.65

 

TF-IDF for term t2;

TFt2 = 2/6 ≈ 0.33

IDFt2 = log(10000/1300) ≈ 2.040

TF-IDF for t2 = (2/6) × 2.040 ≈ 0.68

 

TF-IDF for term t3;

TFt3 = 1/6 ≈ 0.17

IDFt3 = log(10000/250) ≈ 3.689

TF-IDF for t3 = (1/6) × 3.689 ≈ 0.61
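The whole exercise can be reproduced with a short Python sketch. It assumes that the length of X is the total number of term occurrences (3 + 2 + 1 = 6), following the definition of TF given earlier, and that log is the natural logarithm. The function and variable names are illustrative only.

```python
import math

N_DOCS = 10_000
term_counts = {"t1": 3, "t2": 2, "t3": 1}      # frequencies in document X
doc_freqs = {"t1": 50, "t2": 1300, "t3": 250}  # document frequencies in the collection


def tf_idf_weights(term_counts, doc_freqs, n_docs):
    """Return {term: TF * IDF}, with TF normalized by the document length."""
    # Assumption: document length = total number of term occurrences in X.
    doc_length = sum(term_counts.values())
    weights = {}
    for term, count in term_counts.items():
        tf = count / doc_length
        idf = math.log(n_docs / doc_freqs[term])  # natural log
        weights[term] = tf * idf
    return weights


weights = tf_idf_weights(term_counts, doc_freqs, N_DOCS)
for term, weight in weights.items():
    print(f"{term}: {weight:.2f}")
# t1: 2.65
# t2: 0.68
# t3: 0.61
```

Dividing by the largest term frequency in X (3) instead of the document length is another common normalization; it changes the absolute weights but not the ranking of the terms.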

 

******************

 


Sunday, June 13, 2021

Find the average precision of a retrieval system - Information retrieval exercise

Question:

An information retrieval system returns the following ranked result (10 documents), where R marks a relevant document and N a non-relevant one (colored green and red, respectively, in the original figure). The numbers indicate the rank of the document. Find the average precision of the given system.

Rank:      1  2  3  4  5  6  7  8  9  10
Relevant:  R  N  N  R  R  N  N  R  N  R

 

Solution:

Precision = Fraction of retrieved documents that are relevant.

Precision = No. of relevant items retrieved/No. of retrieved items

Precision@k = Fraction of retrieved documents that are relevant in the top k documents.

Precision@k = No. of relevant docs in the top k results / k

Precision@5 = 3/5 = 0.6
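Precision@k is easy to check in Python. In this sketch the ranking is encoded as a list of 0/1 relevance flags in rank order; the flags are inferred from the Precision@k values tabulated in this solution (relevant documents at ranks 1, 4, 5, 8 and 10), and the function name is hypothetical.

```python
def precision_at_k(relevance, k):
    """Fraction of the top-k results that are relevant.

    relevance: list of 0/1 flags in rank order (1 = relevant).
    """
    if not 1 <= k <= len(relevance):
        raise ValueError("k must be between 1 and the number of results")
    return sum(relevance[:k]) / k


# Relevance flags for ranks 1..10, inferred from the precision values in this exercise.
ranking = [1, 0, 0, 1, 1, 0, 0, 1, 0, 1]
print(precision_at_k(ranking, 5))  # 0.6
```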

The Precision@k values for all ranks are calculated and shown in the table below.

Rank         1     2     3     4     5     6     7     8     9     10
Precision@k  1.0   0.5   0.33  0.5   0.6   0.5   0.43  0.5   0.44  0.5

 

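We need to compute Precision@k for each rank k in the ranked document list. Then we can calculate the average precision as follows;

Average precision = average of Precision@k over the ranks k at which a relevant document is retrieved

Average precision = (sum over k = 1 to n of Precision@k × rel(k)) / (number of relevant documents)

Here, n is the number of retrieved documents and rel(k) is 1 if the document at rank k is relevant, otherwise 0. Hence, we add up Precision@k at the ranks of the relevant documents (1, 4, 5, 8 and 10) and divide by the number of relevant documents (5);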

Average precision = (1.0 + 0.5 + 0.6 + 0.5 + 0.5)/5 = 0.62

Average precision for the retrieved documents = 0.62
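The same calculation can be sketched in Python. The relevance flags are inferred from the Precision@k values in this exercise, and the function name is illustrative. The sketch divides by the number of relevant documents that were retrieved, as the solution does; if the collection contained further relevant documents that were not retrieved, the denominator would normally be the total number of relevant documents.

```python
def average_precision(relevance):
    """Average of Precision@k over the ranks k that hold a relevant document.

    relevance: list of 0/1 flags in rank order (1 = relevant).
    """
    hits = 0
    precisions = []
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)  # Precision@rank at a relevant document
    if not precisions:
        return 0.0  # no relevant document retrieved
    return sum(precisions) / len(precisions)


# Relevant documents at ranks 1, 4, 5, 8 and 10.
ranking = [1, 0, 0, 1, 1, 0, 0, 1, 0, 1]
print(round(average_precision(ranking), 2))  # 0.62
```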

 

************

 
