Wednesday, May 20, 2020

Natural language processing question bank 04


Question:

You are building an ngram model of a corpus. Should you stem the words and do the counts or leave them in the surface form? Give pros and cons and include what characteristics of the corpus might influence your decision.


Answer:


  • Stemming the words means there will be fewer types, since inflected forms collapse to a shared base form. This captures some generalizations (he swam, he swims → he swim), although an irregular form such as swam is usually merged only by a lemmatizer, not by a simple suffix-stripping stemmer. However, some distinctions are lost: the model can no longer capture agreement patterns such as I swim vs. she swims.
  • Stemming is a good idea when there is only a small amount of data, so each surface ngram has few examples, or when the language is highly inflected and each word has many different forms. In both cases the counts over surface forms are sparse.
  • If a large amount of data is available, however, ngrams over the surface forms can be more powerful and precise, because the counts are reliable without merging forms.
