Showing posts with label Information Retrieval. Show all posts

Sunday, June 27, 2021

What are the possible problems with Boolean retrieval models

List down the problems with Boolean retrieval model, What are the notable drawbacks of Boolean retrieval model, Boolean model is one of the information retrieval model, why the result from Boolean model is not ranked

Question:

What are the possible problems with Boolean retrieval models?

 

Answer:

  • Very rigid: AND means all; OR means any.
    • Queries are Boolean expressions of keywords connected by AND, OR, and NOT, including the use of brackets.
  • Difficult to express complex user requests.
    • Complex information needs are hard to express because retrieval only tests whether the query keywords are present or absent in a document.
  • Difficult to control the number of documents retrieved. 
    • All matched documents will be returned.
  • Difficult to rank output. 
    • All matched documents logically satisfy the query.
  • Difficult to perform relevance feedback. 
    • The user can mark a document as relevant or irrelevant, but no further data are collected, and the model gives no clear way to turn that judgment into a modified Boolean query. This makes relevance feedback hard to perform.
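The points above can be seen in a minimal sketch of Boolean retrieval over an inverted index. The toy documents, the `postings` helper, and the variable names are assumptions made for illustration, not part of any particular system. AND, OR, and NOT map to set intersection, union, and complement, so each result is a plain set of document ids with no scores attached.

```python
# Toy collection: document id -> text (hypothetical data for illustration).
docs = {
    1: "boolean retrieval model uses keywords",
    2: "vector space model ranks documents",
    3: "boolean queries combine keywords with operators",
    4: "ranking documents by relevance",
}

# Build an inverted index: term -> set of ids of documents containing it.
index = {}
for doc_id, text in docs.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)

all_ids = set(docs)


def postings(term):
    """Return the set of document ids that contain the term."""
    return index.get(term, set())


# AND is set intersection, OR is set union, NOT is set complement.
and_result = postings("boolean") & postings("keywords")
or_result = postings("boolean") | postings("ranking")
not_result = postings("model") & (all_ids - postings("boolean"))

# Every result is an unordered set: a document either matches or it does not.
print(sorted(and_result), sorted(or_result), sorted(not_result))
```

Because a result is only a set, there is nothing to sort by, and the size of the result depends entirely on how the operators happen to combine.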

 



Keywords

Drawbacks of boolean retrieval model

Why the result of a boolean model is not ranked?

Why is it hard to represent complex queries in boolean model?

Can we use relevance feedback in boolean information retrieval model?

 

Saturday, June 26, 2021

What are the elements a benchmark dataset should have to measure the relevance of search results

List of elements a benchmark dataset should have in information retrieval task, what we need for a benchmark dataset, what do we need to measure the retrieval effectiveness of a search system, standard benchmark collection for information retrieval evaluation

Question:

What are the elements a benchmark dataset should have to measure the relevance of search results?

 

Answer:

The retrieval effectiveness of a system is evaluated on a set of documents, queries, and relevance judgments. A benchmark dataset should have the following elements:

  • A document collection
    • Documents must be representative of the documents we expect to see in reality
  • A set of queries
    • It refers to a collection of information needs. The set of queries must also be representative of the information that we need in reality.
  • An assessment by human judges on the relevancy of documents for different information needs.
    • We need to involve humans to judge whether a document is relevant or not for a query. It is usually a costly process.

Some standard benchmark collections include Cranfield, TREC (Text Retrieval Conference), and CLEF (Cross-Language Evaluation Forum).
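As a rough sketch of how the three elements fit together, the snippet below scores a system's output against human relevance judgments. The query ids, document ids, the `qrels` and `run` dictionaries, and the `precision_recall` helper are all made-up names for illustration; real benchmark collections ship their judgments in their own file formats.

```python
# Relevance judgments: query id -> set of document ids judged relevant
# by human assessors (hypothetical data).
qrels = {
    "q1": {"d1", "d3", "d4"},
    "q2": {"d2"},
}

# Output of a hypothetical retrieval system: query id -> retrieved doc ids.
run = {
    "q1": ["d1", "d2", "d3"],
    "q2": ["d2", "d5"],
}


def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query."""
    retrieved = set(retrieved)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


scores = {q: precision_recall(run[q], qrels[q]) for q in qrels}
for q, (p, r) in scores.items():
    print(q, round(p, 2), round(r, 2))
```

Without the judgments in `qrels`, neither measure can be computed, which is why the human assessment is a required part of the benchmark.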

 



Keywords

List few benchmark data collection for information retrieval evaluation.

Information retrieval evaluation methods

How to measure the retrieval effectiveness of a retrieval system

Monday, June 21, 2021

What are the problems with Jaccard similarity coefficient

List the issues with Jaccard similarity coefficient, What are the problems with Jaccard index, Define Jaccard index, Can we use Jaccard index to find similarity between two documents 

 

Question:

List down the issues/problems with Jaccard similarity.

 

Answer:

The Jaccard similarity coefficient is a statistic used for gauging the similarity and diversity of sample sets. It is calculated as follows:

Jaccard(A, B) = |A ∩ B| / |A ∪ B|

The Jaccard coefficient is the number of members common to both sets divided by the number of distinct members in the union of the two sets. Its value lies between 0 and 1, where 0 indicates no overlap and 1 indicates perfect overlap.
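A minimal sketch of the calculation on two short texts treated as sets of terms is given below. The `jaccard` function name, the example sentences, and the choice to return 0.0 for two empty sets are assumptions made for this illustration.

```python
def jaccard(a, b):
    """Jaccard coefficient of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here when both sets are empty
    return len(a & b) / len(union)


doc1 = "information retrieval with boolean models".split()
doc2 = "information retrieval with vector models".split()

# 4 shared terms out of 6 distinct terms overall.
print(jaccard(doc1, doc2))
```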

Problems with Jaccard

  • It doesn’t consider term frequency (how many times a term occurs in a document). It simply counts the number of terms that are common between two sets.
  • Rare terms in a collection are more informative than frequent terms. Jaccard doesn’t consider this information.
  • Its normalization for set size is crude. Dividing by |A ∪ B| means that a short query compared with a long document gets a low score even when every query term appears in the document.
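The first two problems can be illustrated with a small sketch. The example texts and variable names are hypothetical, and the `jaccard` helper is a plain set-based implementation written for this illustration.

```python
def jaccard(a, b):
    """Jaccard coefficient of two non-empty collections, treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


# Term frequency is ignored: repeating "jaguar" does not change the set.
short_doc = "jaguar car".split()
long_doc = "jaguar jaguar jaguar car".split()
tf_blind = jaccard(short_doc, long_doc)

# Rare and frequent terms count the same: matching only the very common
# word "the" scores exactly like matching only the rarer word "jaguar".
query = "the jaguar".split()
doc_common = "the car".split()
doc_rare = "jaguar car".split()
score_common = jaccard(query, doc_common)
score_rare = jaccard(query, doc_rare)

print(tf_blind, score_common == score_rare)
```

Weighting schemes that use term frequency and the rarity of a term in the collection are the usual way to address these two gaps.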

 

 


Keywords

List down the issues with Jaccard similarity coefficient

What are the disadvantages of Jaccard similarity index

What Jaccard index value gives perfect overlap?

Can we use Jaccard similarity to measure the closeness between two text documents?

 

Saturday, June 19, 2021

Multiple Choice Questions with Answers in Information Retrieval SET 1

Top 5 quiz questions in IR, Information retrieval quiz, information retrieval mcqs with answers, information retrieval, classic retrieval models, inverted index, precision, recall, query expansion, relevance feedback questions

Information Retrieval MCQs - SET 01

1. Which of the following is/are classical IR models?

a) Vector model

b) Boolean model

c) Interaction model

d) Cluster model



 

2. What is the disadvantage of Boolean retrieval model?

a) Easy to implement

b) Difficult to rank output

c) Difficult to process a query

d) It is one of the complex retrieval models



 

3. An inverted index is a database index that ____.

a) stores, for each term t, the list of all documents that contain term t

b) stores mapping from documents to words

c) orders the terms in a different order which is not a sequential order.

d) All of the above



 

4. For a query, a retrieval system retrieves 42 relevant documents and 34 irrelevant documents from a document collection that consists of 95 relevant documents. What is the precision of the retrieval system?

a) 0.40

b) 0.50

c) 0.62

d) 0.55



 

5. Which of the following techniques tries to improve recall of an information retrieval system by adding synonyms to the query?

a) Query expansion

b) Relevance feedback

c) Pseudo relevance feedback

d) None of the above



 

********************

 


Keywords

Information retrieval quiz questions with answers

How to improve recall of an IR system

What are the classical IR models

How to calculate precision of a retrieval system

Define the term inverted index

List the disadvantage of boolean retrieval model
