Information Retrieval MCQs - SET 03
1. A data structure that maps terms back to the parts of a document in which they appear is called
a) Lexicon
b) Dictionary 
c) Inverted index
d) All of the above
Answer: (c) Inverted index

An inverted index (also referred to as a postings file or inverted file) is a database index that stores a mapping from content, such as words or numbers, to its locations in a table, a document, or a set of documents. It is named in contrast to a forward index, which maps from documents to content. The purpose of an inverted index is to allow fast full-text searches, at the cost of increased processing when a document is added to the database. [Source: Wikipedia]
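As a rough illustration of the term-to-document mapping described above, here is a minimal sketch in Python. The function name `build_inverted_index`, the sample documents, and the whitespace tokenizer are assumptions made for this example; real systems also store positions within each document and apply more careful tokenization.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it.

    `docs` is a dict of {doc_id: text}. Tokenization here is a plain
    lowercase whitespace split, which is an assumption for illustration.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    # Convert the sets to sorted lists so the postings are easy to read.
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical three-document collection.
docs = {
    1: "information retrieval with an inverted index",
    2: "a forward index maps documents to terms",
    3: "fast full-text search needs an index",
}

index = build_inverted_index(docs)
print(index["index"])     # [1, 2, 3]
print(index["inverted"])  # [1]
```

A lookup such as `index["inverted"]` answers "which documents contain this term?" without scanning every document, which is the speed-up the explanation refers to.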
   
   
 
2. How can the information retrieval problem be defined formally?
a) a triple
b) a quadruple
c) a couple
d) None of the above
Answer: (b) a quadruple (4-tuple)

An IR model can be defined as a 4-tuple [D, Q, F, R(q_i, d_j)], where D is the collection of documents, Q is the collection of queries, F is the framework for modeling documents and queries, and R(q_i, d_j) is the ranking function that assigns a rank to a document d_j with respect to a query q_i.
 
   
 
 
3. The count of occurrences of a word in a document is referred to as
a) document frequency
b) term frequency
c) collection frequency
d) change frequency
Answer: (b) term frequency

The number of times a term occurs in a document is called the term frequency (TF). It is the occurrence count of a term t in a document d. For example, in the two sentences above, the term frequency of “occurrence” is 1 and that of “document” is 2.
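Counting term frequency can be sketched in a few lines of Python. The helper name `term_frequencies`, the sample sentence, and the letters-only tokenizer are assumptions for this example; it counts exact word forms and does no stemming.

```python
import re
from collections import Counter


def term_frequencies(text):
    """Return a Counter of how often each term occurs in `text`.

    Terms are lowercase runs of letters; this simple tokenizer is an
    assumption for illustration, not a standard.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


# Hypothetical one-sentence "document".
tf = term_frequencies("To be or not to be, that is the question")
print(tf["to"])        # 2
print(tf["question"])  # 1
print(tf["answer"])    # 0 (terms that never occur have frequency 0)
```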
  
 
4. Suppose the frequency of the most frequent word in a corpus of Tamil documents is 10000. What would be the estimated frequency of the second most frequent word in the corpus as per Zipf’s law?
a) 10000
b) 2500
c) 5000
d) Cannot be determined
Answer: (c) 5000

Frequency of the second most frequent word = frequency of the most frequent word / 2 = 10000 / 2 = 5000

As per Zipf’s law, the frequency of a word is inversely proportional to its rank. In simple terms, a word of rank r occurs about 1/r times as often as the most frequent word. That is, the rank 2 word occurs 1/2 as often as the most frequent word, the rank 3 word occurs 1/3 as often, and so on.
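The arithmetic in this answer can be checked with a small Python sketch. The function name `zipf_estimate` is hypothetical, and the code assumes the simplest form of Zipf's law (frequency proportional to 1/rank); real corpora follow it only approximately.

```python
def zipf_estimate(top_frequency, rank):
    """Estimate the frequency of the word at `rank` under Zipf's law.

    Assumes the idealized form f(rank) = f(1) / rank.
    """
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    return top_frequency / rank


# Most frequent word occurs 10000 times, as in the question.
for rank in range(1, 5):
    print(rank, zipf_estimate(10000, rank))
# 1 10000.0
# 2 5000.0
# 3 3333.3333333333335
# 4 2500.0
```

Under this idealized form, option (b) 2500 would be the estimate for the rank 4 word, not the rank 2 word.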
  
 
5. The proportion of the non-relevant items in a collection that are retrieved in a given search is called
a) Precision
b) Recall
c) Generality
d) Fallout
Answer: (d) Fallout

The fallout ratio is the proportion of the non-relevant documents in the collection that are retrieved. It measures how well the IR system filters out non-relevant documents. For a good information retrieval system, the fallout ratio should be low. If N is the total number of documents in the collection, y is the number of non-relevant documents retrieved, and x is the number of relevant documents in the collection, then the fallout ratio F is calculated as follows: F = y / (N - x)
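The formula above can be tried out with a short Python sketch. The function name `fallout` and the sample numbers are assumptions for illustration; the parameters correspond to y, N, and x in the formula, with x taken as the number of relevant documents in the whole collection.

```python
def fallout(nonrelevant_retrieved, total_docs, relevant_docs):
    """Compute the fallout ratio F = y / (N - x).

    nonrelevant_retrieved: y, non-relevant documents that were retrieved
    total_docs:            N, documents in the collection
    relevant_docs:         x, relevant documents in the collection
    """
    nonrelevant_total = total_docs - relevant_docs
    if nonrelevant_total <= 0:
        raise ValueError("the collection has no non-relevant documents")
    return nonrelevant_retrieved / nonrelevant_total


# Hypothetical search: 1000 documents, 100 of them relevant,
# and 45 non-relevant documents show up in the results.
print(fallout(45, 1000, 100))  # 0.05
```

Here 45 of the 900 non-relevant documents were retrieved, so the system let through 5% of the material it should have filtered out.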
  
 