
# What is smoothing in NLP, and why is it important for language models?

### Smoothing in NLP

Smoothing is the process of flattening a probability distribution implied by a language model so that all reasonable word sequences can occur with some probability. This often involves broadening the distribution by redistributing weight from high probability regions to zero probability regions.

Smoothing not only prevents zero probabilities; it also tends to improve the accuracy of the model as a whole.

### Why do we need smoothing?

In a language model, we estimate parameters by maximum likelihood estimation (MLE) on training data. We cannot directly evaluate such MLE models on unseen test data, because the test data is likely to contain words or n-grams that the model assigns zero probability. Relative-frequency estimation assigns all probability mass to events seen in the training corpus, but we need to reserve some probability mass for events that do not occur in the training data (unseen events).

Example:

Training data: The cow is an animal.

Test data: The dog is an animal.

If we train a unigram model on this data:

P(the) = count(the) / (total number of words in the training set) = 1/5.

Likewise, P(cow) = P(is) = P(an) = P(animal) = 1/5

To evaluate the unigram model on the training sentence:

P(the cow is an animal) = P(the) × P(cow) × P(is) × P(an) × P(animal) = (1/5)^5 = 0.00032
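The relative-frequency calculation above can be sketched in a few lines of Python. This is a minimal illustration of the example, not code from the lecture; the tokenization (lowercasing, dropping punctuation) is an assumption.

```python
from collections import Counter

# Training sentence from the example above (lowercased, punctuation stripped).
train_tokens = "the cow is an animal".split()
counts = Counter(train_tokens)
total = len(train_tokens)

def mle_prob(word):
    """Relative-frequency (MLE) unigram probability: count(word) / total words."""
    return counts[word] / total

# Unigram probability of the training sentence: product of per-word probabilities.
p = 1.0
for w in train_tokens:
    p *= mle_prob(w)
print(p)  # ≈ 0.00032, i.e. (1/5) ** 5
```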

When we apply the same unigram model to the test sentence, its probability becomes zero because P(dog) = 0: the word ‘dog’ never occurred in the training data. Hence, we use smoothing.
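The source does not name a particular smoothing technique, so as one concrete illustration, add-one (Laplace) smoothing pretends every vocabulary word occurred one extra time, which reserves probability mass for unseen words like ‘dog’. The vocabulary here is assumed to be the training words plus ‘dog’:

```python
from collections import Counter

train_tokens = "the cow is an animal".split()
counts = Counter(train_tokens)
total = len(train_tokens)

# Assumed vocabulary: training words plus the unseen test word "dog".
vocab = set(train_tokens) | {"dog"}
V = len(vocab)  # vocabulary size = 6

def mle_prob(word):
    # Unsmoothed MLE: unseen words get probability 0.
    return counts[word] / total

def laplace_prob(word):
    # Add-one (Laplace) smoothing: (count + 1) / (total + V).
    return (counts[word] + 1) / (total + V)

print(mle_prob("dog"))      # 0.0 -> the whole test sentence gets probability 0
print(laplace_prob("dog"))  # 1/11 ≈ 0.09 -> nonzero probability for unseen word
```

Note that the smoothed probabilities still sum to 1 over the vocabulary; Laplace smoothing simply redistributes mass from seen words to unseen ones, exactly the "flattening" described above.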

****************