Showing posts with label Information Retrieval. Show all posts

Saturday, July 3, 2021

Common preprocessing steps and their significance in information retrieval

Common preprocessing steps used in information retrieval tasks, Significance of preprocessing in information retrieval, All you need to know about text preprocessing in information retrieval

Question:

What are the common preprocessing steps used in an information retrieval task?

 

Answer:

 

| Preprocessing technique | How? | Benefits |
|---|---|---|
| Extract root words | Stemming (rule-based, dictionary-based, corpus-based); lemmatization | 1. Improves recall. 2. Reduces index size. |
| Stop word removal | A stop word list can be used | 1. Improves retrieval efficiency. 2. Reduces index size. |
| Tokenization (break sentences into tokens/keywords) | A typical solution is to split a sentence at non-letter characters, mostly white space | Tokens are indexed for further processing |
| Normalization | Case folding (convert all text to lower case); spelling variations (map to a common spelling); diacritics/accent marks on letters (naïve to naive) | Reduces variation among surface forms, so equivalent terms match |
| Detecting common phrases | Index meaningful phrases | Effective retrieval, because phrases are not broken up into a bag of words |
| Building the index | Add the preprocessed terms to an inverted index, which stores the list of documents in which each term appears | Acts as a lookup table for quickly finding all documents that contain a word |
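The steps listed above can be sketched as a small pipeline. This is a minimal illustration, not a production implementation: the stop word list, the suffix rules (a crude stand-in for a real stemmer such as Porter's) and the function names are all assumptions made for the example, and phrase detection is left out to keep it short.

```python
import re
import unicodedata
from collections import defaultdict

# Tiny illustrative stop word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}

# Crude suffix rules standing in for a real rule-based stemmer (hypothetical).
SUFFIXES = ("ing", "ed", "es", "s")


def normalize(text):
    """Case-fold and strip diacritics (e.g. 'naïve' -> 'naive')."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text):
    """Split already-normalized text at non-letter characters."""
    return [tok for tok in re.split(r"[^a-z]+", text) if tok]


def stem(token):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Normalize, tokenize, drop stop words, then stem."""
    tokens = tokenize(normalize(text))
    return [stem(tok) for tok in tokens if tok not in STOP_WORDS]


def build_inverted_index(docs):
    """Map each preprocessed term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in preprocess(text):
            index[term].add(doc_id)
    return index


docs = {
    1: "The naïve user is searching the indexes.",
    2: "Indexing of searched documents",
}
index = build_inverted_index(docs)
print(sorted(index["search"]))  # documents matching any form of "search"
print(sorted(index["naive"]))   # matched despite the accent in the source
```

Because "searching" and "searched" are both reduced to "search", a query for either form finds both documents, which is the recall benefit the table mentions; dropping stop words and merging variants is also what keeps the index small.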

 

 

 


Keywords

Significance of preprocessing in information retrieval

Document preprocessing steps in information retrieval

General approach for text preprocessing

Text preprocessing in NLP

All you need to know about text preprocessing in information retrieval


Monday, June 28, 2021

What are the major problems with relevance feedback in information retrieval

List down the drawbacks of relevance feedback query expansion, Why relevance feedback is not widely applied?, basic problems with relevance feedback, When to use relevance feedback?

Question:

Why is relevance feedback not widely used? / What are the major problems with relevance feedback?

 

Answer:

  • Users are often reluctant to provide explicit feedback.
    • They may not be interested in interacting with the search engine and may prefer to try a different query instead.
  • Users may lack the knowledge needed to formulate a good initial query.
    • Users need enough knowledge to express their information need in the initial query. Cases such as misspellings and vocabulary mismatch cannot be handled by relevance feedback alone.
  • Relevance feedback is expensive. It produces long queries that require more computation to process, while search engines handle many queries and can spend little time on each one.
  • It becomes harder to understand why a particular document was retrieved once relevance feedback has been applied.
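
The cost point above can be illustrated with a Rocchio-style expansion, a standard relevance feedback formulation that the post itself does not name. The sketch below uses only positive feedback; the function name, the toy documents and the `alpha`/`beta` values are assumptions chosen for the example.

```python
from collections import Counter


def rocchio_expand(query_terms, relevant_docs, alpha=1.0, beta=0.75):
    """Return term weights for the expanded query (positive feedback only).

    alpha weights the original query terms; beta weights the average term
    frequency over the documents the user marked as relevant.
    """
    weights = Counter()
    for term in query_terms:
        weights[term] += alpha
    if relevant_docs:
        for doc in relevant_docs:
            for term, freq in Counter(doc).items():
                weights[term] += beta * freq / len(relevant_docs)
    return dict(weights)


query = ["jaguar", "speed"]
# Documents (already tokenized) that a hypothetical user marked as relevant.
relevant = [
    ["jaguar", "cat", "speed", "predator"],
    ["jaguar", "habitat", "rainforest", "cat"],
]
expanded = rocchio_expand(query, relevant)
print(len(query), "->", len(expanded), "terms")
```

Even with two short documents, the two-term query grows to six weighted terms. With real documents the expanded query can contain many more terms, each requiring its own postings-list lookup, which is why the expansion is costly at search-engine scale.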

 



Keywords

When to use relevance feedback query expansion technique?

When not to use relevance feedback?

List the basic problems with relevance feedback

Query expansion using relevance feedback

How to improve search results using relevance feedback

 
