Friday, April 3, 2020

Context Free Grammar (CFG) for NLP

Context Free Grammar (CFG) for NLP, formal definition of context-free grammar, CFG examples in NLP

Context Free Grammar (CFG)

A context-free grammar consists of a set of rules or productions, each of which expresses the ways that symbols of the language can be grouped and ordered together, and a lexicon of words and symbols.
[Source - Speech and Language Processing: An introduction to natural language processing, computational linguistics, and speech recognition by Daniel Jurafsky and James H. Martin]
A context-free grammar (CFG) is a list of rules that define the set of all well-formed sentences in a language. Each rule has a left-hand side, which identifies a syntactic category, and a right-hand side, which defines its alternative component parts, reading from left to right.

Context Free Grammar (CFG) - Formal Definition

Context-free grammar G is a 4-tuple.
G = (V, T, S, P)
 These parameters are as follows;

  • VSet of variables (also called as Non-terminal symbols)

  • TSet of terminal symbols (lexicon)

    • The symbols that refer to words in a language are called terminal symbols.

    • Lexicon is a set of rules that introduce these symbols.

  • SDesignated start symbol (one of the non-terminals, S V)

  • PSet of productions (also called as rules).

    • Each rule in P is of the form A → s, where

      • A is a non-terminal (variable) symbol.

      • Each rule can have only one non-terminal symbol on the left hand side of the rule.

      • s is a sequence of terminals and non-terminals. It is from (T U V)*, infinite set of strings.

  • A grammar G generates a language L.

Example context-free grammar
G = (V, T, S, P)
   V = {S, NP, VP, PP, Det, Noun, Verb, Aux, Pre}
   T = {‘a’, ‘ate’, ‘cake’, ‘child’, ‘fork’, ‘the’, ‘with’}
   S = S
   P = { S → NP VP
   NP → Det Noun | NP PP
   PP → Pre NP
   VP → Verb NP
   Det → ‘a’ | ‘the’
   Noun → ‘cake’ | ‘child’ | ‘fork’
   Pre → ‘with’
   Verb → ‘ate’}
Some notes:

  • Note 1: In P, pipe symbol (|) is used to combine productions into single representation for productions that have same LHS. For example, Det → ‘a’ | ‘the’ derived from two rules Det → ‘a’ and Det → ‘the’. Yet it denotes two rules not one.

  • Note 2: The production highlighted in red are referred as grammar, and green are referred as lexicon.

  • Note 3: NP – Noun Phrase, VP – Verb Phrase, PP – Prepositional Phrase, Det – Determiner, Aux – Auxiliary verb

Sample derivation:
   → Det Noun VP
   → the Noun VP
   → the child VP
   → the child Verb NP
   → the child ate NP
   → the child ate Det Noun
   → the child ate a Noun
   → the child ate a cake

From this derivation, we can understand that the sentence ‘the child ate a cake’ is a valid and accepted sentence by the grammar G. in other words, the sentence is part of the language produced by G.


Define context free grammar

How to derive sentences from cfg?

what is lexicon in CFG

What are terminal and non-terminal symbols in CFG

CFG sample solved problem

1 comment:

Featured Content

Multiple choice questions in Natural Language Processing Home

MCQ in Natural Language Processing, Quiz questions with answers in NLP, Top interview questions in NLP with answers Multiple Choice Que...

All time most popular contents

data recovery