Sunday, July 4, 2021

Multiple Choice Questions with Answers in Information Retrieval SET 3

Top 5 quiz questions in IR, information retrieval quiz, information retrieval MCQs with answers, information retrieval, inverted index, Zipf's law, fallout measure, term frequency, formal definition of information retrieval system

Information Retrieval MCQs - SET 03

1. A data structure that maps terms back to the parts of a document in which they appear is called

a) Lexicon

b) Dictionary

c) Inverted index

d) All of the above
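
To make the structure described in question 1 concrete, here is a minimal sketch of a positional inverted index. The helper name `build_positional_index`, the whitespace tokenizer, and the sample documents are illustrative assumptions, not part of the original quiz.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map each term to {doc_id: [token positions]} (hypothetical helper)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Assumption: lowercase + whitespace split stands in for a real tokenizer.
        for position, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(position)
    return dict(index)

docs = {1: "new home sales", 2: "home prices rise in new york"}
index = build_positional_index(docs)
print(index["home"])  # {1: [1], 2: [0]}
```

Looking up a term returns every document that contains it, together with the positions where it occurs, which is the "term back to document parts" mapping the question refers to.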

2. How can the information retrieval problem be defined formally?

a) a triple

b) a quadruple

c) a couple

d) None of the above
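
For reference, one widely cited formalization is the IR model definition in Baeza-Yates and Ribeiro-Neto's textbook Modern Information Retrieval. The symbols below follow that textbook convention and are an assumption here, since this page does not spell them out.

```latex
\[
  \text{IR model} \;=\; \bigl[\, D,\; Q,\; F,\; R(q_i, d_j) \,\bigr]
\]
```

Here $D$ is the set of document representations, $Q$ is the set of query representations, $F$ is a framework for modeling documents, queries, and their relationships, and $R(q_i, d_j)$ is a ranking function that assigns a real number to a query $q_i \in Q$ and a document $d_j \in D$.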

3. The count of occurrences of a word in a document is referred to as

a) document frequency

b) term frequency

c) collection frequency

d) change frequency
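
The three standard frequency notions among these options are easy to confuse, so here is a small sketch that computes each one for a toy collection. The function name `frequency_stats` and the sample documents are hypothetical, and a whitespace split is assumed as the tokenizer.

```python
from collections import Counter

def frequency_stats(docs, term):
    """Return (per-document term frequency, document frequency, collection frequency)."""
    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    counts = {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}
    tf = {doc_id: c[term] for doc_id, c in counts.items()}  # occurrences per document
    df = sum(1 for n in tf.values() if n > 0)               # documents containing the term
    cf = sum(tf.values())                                   # occurrences in the whole collection
    return tf, df, cf

docs = {
    "d1": "to be or not to be",
    "d2": "to do is to be",
    "d3": "do be do",
}
tf, df, cf = frequency_stats(docs, "to")
print(tf)  # {'d1': 2, 'd2': 2, 'd3': 0}
print(df)  # 2
print(cf)  # 4
```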

4. Suppose the frequency of the most frequent word in a corpus of Tamil documents is 10000. What would be the estimated frequency of the second most frequent word in the given corpus as per Zipf’s law?

a) 10000

b) 2500

c) 5000

d) Cannot be determined
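
Zipf's law says that a word's frequency is roughly inversely proportional to its rank, so the word at rank r occurs about 1/r times as often as the top-ranked word. The sketch below applies that estimate to the numbers in question 4. The function name `zipf_estimate` and the `exponent` parameter (fixed at 1 for the classic form of the law) are assumptions made for illustration.

```python
def zipf_estimate(top_frequency, rank, exponent=1.0):
    """Estimate the frequency of the word at `rank` from the rank-1 frequency."""
    if rank < 1:
        raise ValueError("rank must be >= 1")
    return top_frequency / rank ** exponent

print(zipf_estimate(10000, 2))  # 5000.0
print(zipf_estimate(10000, 4))  # 2500.0
```

Real corpora only follow this relationship approximately, so the result is an estimate rather than an exact count.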

5. The proportion of non-relevant items that have been retrieved in a given search is

a) Precision

b) Recall

c) Generality

d) Fallout
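
As a worked illustration of the measure described in question 5, the sketch below computes fallout as the number of non-relevant documents retrieved divided by the total number of non-relevant documents in the collection. The function name and the document ids are made up for the example.

```python
def fallout(retrieved, relevant, collection):
    """Fraction of non-relevant documents in the collection that were retrieved."""
    non_relevant = set(collection) - set(relevant)
    if not non_relevant:
        raise ValueError("collection has no non-relevant documents")
    return len(set(retrieved) & non_relevant) / len(non_relevant)

collection = range(1, 11)    # 10 documents, ids 1..10
relevant = {1, 2, 3, 4}      # so 6 documents are non-relevant
retrieved = {1, 2, 5, 6, 7}  # 3 of the retrieved documents are non-relevant
print(fallout(retrieved, relevant, collection))  # 0.5
```

A lower value is better: a fallout of 0 means the search returned no non-relevant documents at all.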

********************

 

Keywords

For what values of the fallout ratio would we say that an IR system is good?

Formal definition of an information retrieval system as a quadruple

How to find the frequency of the second most frequent word using Zipf's law?

What is an inverted index? How to construct an inverted index?

Saturday, July 3, 2021

Common preprocessing steps and their significance in information retrieval

common preprocessing steps used in information retrieval task, Significance of preprocessing in information retrieval, All you need to know about text preprocessing in information retrieval

Question:

What are the common preprocessing steps used in an information retrieval task?

 

Answer:

 

Each preprocessing technique is listed below with how it is done and what it gains.

Preprocessing technique: Extract root words
How: Stemming (rule-based, dictionary-based, or corpus-based) or lemmatization.
Benefits: (1) Improves recall. (2) Reduces index size.

Preprocessing technique: Stop word removal
How: A stop word list can be used.
Benefits: (1) Improves efficiency of retrieval. (2) Reduces index size.

Preprocessing technique: Tokenization (break sentences into tokens/keywords)
How: The typical solution is to split a sentence at non-letter characters, mostly white space.
Benefits: Tokens are indexed for further processing.

Preprocessing technique: Normalization
How: Case folding (convert all text to lower case), unifying spelling variations (use one common spelling), and removing diacritics/accent marks on letters (naïve to naive).
Benefits: Reduces random variation between different written forms of the same word.

Preprocessing technique: Detecting common phrases
How: By indexing meaningful phrases.
Benefits: More effective retrieval, because phrases are not broken up into a bag of words.

Preprocessing technique: Building the index
How: Add the preprocessed terms to an inverted index, which stores the list of documents in which each term appears.
Benefits: It acts as a lookup table for quickly finding all documents that contain a word.
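
The steps above can be chained into a small pipeline. The following is a minimal sketch using only the Python standard library. The stop word list, the suffix list, and the function names are all illustrative assumptions, and the crude suffix stripping merely stands in for a real stemmer such as Porter's.

```python
import re
import unicodedata

# Assumption: a tiny illustrative stop word list, not a standard one.
STOP_WORDS = {"a", "an", "the", "in", "of", "and", "is", "to"}
# Assumption: crude suffix stripping stands in for a real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")

def strip_diacritics(text):
    """Normalize accented letters, e.g. 'naïve' -> 'naive'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def naive_stem(token):
    """Remove one common suffix if at least three letters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    text = strip_diacritics(text.lower())                # normalization
    tokens = re.findall(r"[a-z]+", text)                 # tokenization at non-letters
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop word removal
    return [naive_stem(t) for t in tokens]               # root word extraction

print(preprocess("The naïve indexer is indexing the documents"))
# ['naive', 'indexer', 'index', 'document']
```

The resulting terms are what would then be added to the inverted index. Phrase detection is left out of this sketch to keep it short.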

 

 

 


Keywords

Significance of preprocessing in information retrieval

Document preprocessing steps in information retrieval

General approach for text preprocessing

Text preprocessing in NLP

All you need to know about text preprocessing in information retrieval

