Information Retrieval MCQs - SET 02
1. Which of the following is a local method for improving the recall of an
information retrieval system?
a) Query expansion
b) Relevance feedback
c) Ontology based model
d) None of the above
Answer: (b) Relevance feedback
Local methods adjust a query relative to the documents that initially appear
to match the query.
Relevance feedback performs a local, on-demand analysis of the initial query
results to improve the recall of the IR system. The user marks documents in
the initial results as relevant or non-relevant, and the system uses those
judgments to refine the results.
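One common way to apply relevance feedback in a vector space model is the Rocchio update. The explanation above does not name it, so it appears here only as an illustration. This is a minimal sketch: the function name `rocchio_update`, the weights `alpha`, `beta` and `gamma`, and the toy term-weight vectors are all assumptions.

```python
def rocchio_update(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query vector toward relevant documents and away from non-relevant ones.

    All vectors are plain lists of term weights over the same vocabulary.
    The default weights are illustrative assumptions, not prescribed values.
    """
    def centroid(docs):
        # Average the document vectors; an empty set contributes a zero vector.
        if not docs:
            return [0.0] * len(query)
        return [sum(col) / len(docs) for col in zip(*docs)]

    rel_c = centroid(relevant)
    non_c = centroid(non_relevant)
    updated = [alpha * q + beta * r - gamma * n
               for q, r, n in zip(query, rel_c, non_c)]
    # Negative term weights are clipped to zero in this sketch.
    return [max(0.0, w) for w in updated]


# Toy vocabulary of three terms; the user judged two results relevant, one not.
original_query = [1.0, 0.0, 0.0]
new_query = rocchio_update(
    original_query,
    relevant=[[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]],
    non_relevant=[[0.0, 0.0, 1.0]],
)
```

The refined query gains weight on the second term, which occurs in the documents judged relevant. It can therefore match further documents that the original one-term query missed, which is how this kind of feedback can raise recall.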
2. The process by which an information retrieval system removes the most
common words (and, or, the, etc.) before indexing is known as
a) Lemmatization
b) Stop word removal
c) Inverted indexing
d) Normalization
Answer: (b) Stop word removal
Stop words are the most common words in a language (articles, prepositions,
conjunctions, etc.). These common words are of little value in helping select
documents matching a user need. Removing them reduces the amount of data to
be indexed and improves system performance.
Stop word removal is done using a predefined stop word list.
Stop word elimination used to be standard in older IR systems, but stop words
are needed for phrase queries, e.g. “Queen of England”.
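Filtering against a predefined list can be sketched in a few lines. The stop list below is deliberately tiny and hypothetical, and the function name `remove_stop_words` and the whitespace tokenization are assumptions made for illustration.

```python
# Hypothetical, deliberately small stop word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "or", "the", "to"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split on whitespace, and drop tokens in the stop list."""
    return [token for token in text.lower().split() if token not in stop_words]


indexed_terms = remove_stop_words("The history of the Queen of England")
phrase_terms = remove_stop_words("Queen of England")
```

With this list, the phrase query "Queen of England" is reduced to the two terms `queen` and `england`. The connecting word is lost, which is the phrase-query problem mentioned above.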
3. An inverted index arranges data in sorted order according to
a) the documents
b) the frequency of each document
c) the frequency of each term
d) the terms
Answer: (d) the terms
An inverted index is a sorted list (or index) of keywords (attributes), with
each keyword linked to the documents containing that keyword.
An inverted index is a word-oriented mechanism for indexing a text collection
to speed up searching. The structure is composed of two elements: the
vocabulary (terms) and the occurrences. The vocabulary is the set of all
distinct words in the text. For each word in the vocabulary, the index stores
the documents that contain that word.
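The vocabulary-plus-occurrences structure can be sketched as a dictionary from each term to a sorted list of document IDs. The function name `build_inverted_index`, the sample documents, and the simple lowercase-and-split tokenization are assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it.

    `docs` is a dict of {doc_id: text}; tokenization is a plain lowercase split.
    """
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    # Sort the vocabulary so that the index is ordered by term.
    return {term: sorted(postings[term]) for term in sorted(postings)}


docs = {1: "new home sales", 2: "home sales rise", 3: "new rise in sales"}
index = build_inverted_index(docs)
```

Looking up a term such as `index["sales"]` returns the documents that contain it without scanning the texts again, which is why the structure speeds up searching.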
4. The vocabulary size (unique words) of a text can be estimated using
a) Zipf’s law
b) Scientific law
c) Heaps’ law
d) Inverted index rule
Answer: (c) Heaps’ law
Heaps’ law approximates the number of unique words in a text of n words.
The law can be described as follows: as the number of words in a document
increases, the rate at which new distinct words appear slows down.
Formally, Heaps’ law (also called Herdan's law) says that the number of
unique words in a text of n words is approximated by
V(n) = K n^β
where K is a positive constant and β is between 0 and 1. K is often between
10 and 100, and β is often between 0.4 and 0.6.
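A short worked example makes the slowdown concrete. Assuming K = 50 and β = 0.5 (illustrative values only, not fitted to any corpus), V(10,000) = 50 × 100 = 5,000 and V(1,000,000) = 50 × 1,000 = 50,000, so a text 100 times longer has only about 10 times as many unique words. The function name `heaps_vocabulary_size` and its defaults are assumptions.

```python
def heaps_vocabulary_size(n, k=50.0, beta=0.5):
    """Estimate the number of unique words in a text of n words: V(n) = K * n**beta.

    The defaults k=50.0 and beta=0.5 are illustrative assumptions, not fitted values.
    """
    return k * n ** beta


small_text = heaps_vocabulary_size(10_000)
large_text = heaps_vocabulary_size(1_000_000)
growth_factor = large_text / small_text
```

In practice K and β are estimated from a sample of the collection, and the fitted curve is then used to predict vocabulary size for larger texts.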
5. A metric used to measure the importance of a term in a text document
collection is called
a) Inverse Document Frequency
b) Term Frequency
c) Inverse Term Frequency
d) Document Frequency
Answer: (a) Inverse Document Frequency
Inverse Document Frequency (IDF) is a metric used to measure the importance
of a term in a document collection. It is calculated as follows:
idf_t = log(N / df_t)
where N is the total number of documents in the collection and df_t is the
number of documents that contain term t.
The idf weight indicates the importance of a term based on how common the
word is in the collection. The idf weight is lower for the most common words
and higher for rare words.
IDF affects the ranking of documents only for queries with at least two
terms. For a one-term query it scales every document's score by the same
factor.
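The formula can be checked on a toy collection. The explanation above does not state the base of the logarithm, so base 10 is an assumption here, as are the function name `inverse_document_frequency` and the sample documents.

```python
import math


def inverse_document_frequency(docs):
    """Return {term: log10(N / df_t)} for a list of tokenized documents.

    N is the number of documents and df_t the number of documents containing t.
    Base 10 is an assumption; the choice of base only rescales the weights.
    """
    n_docs = len(docs)
    doc_freq = {}
    for tokens in docs:
        for term in set(tokens):  # count each term at most once per document
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log10(n_docs / df) for term, df in doc_freq.items()}


docs = [
    ["the", "cat"],
    ["the", "dog"],
    ["the", "cat", "sat"],
    ["the", "bird"],
]
idf = inverse_document_frequency(docs)
```

Here "the" occurs in all four documents, so its weight is log(4/4) = 0. "cat" occurs in two documents and "sat" in one, so "sat" gets the highest weight of the three.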