Advanced Database Management System - Tutorials and Notes: What is document triage

Sunday, 29 March 2020

What is document triage

What is document triage? Definition of document triage, its purpose in natural language processing, the steps it involves, and why it is important.



Document Triage

Document triage is the process of converting a set of digital files into well-defined text documents. It is the first of the two stages of text pre-processing; the second is text segmentation.
Depending on the origin of the files being processed, document triage may involve one or more of the following steps:
Character encoding identification – For a document to be machine readable, its characters must be represented in a character encoding, that is, a scheme that maps text to binary data. Many such schemes exist, for example ASCII, ISO-8859-1 (Latin-1), and the Unicode encodings UTF-8 and UTF-16. This step determines which character encoding a text file uses.
Language identification – A document may contain text in a single language or in several languages. This step identifies the language(s) used in the document.
Text sectioning – This step identifies the actual content within a file and discards undesirable elements such as images, tables, headers, links, and HTML formatting.
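
The three steps above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a production method: the function names (identify_encoding, identify_language, section_text, triage), the short list of candidate encodings, the tiny stop-word lists, and the set of skipped HTML tags are all assumptions made for this example. Real systems usually rely on statistical detectors trained on much larger data.

```python
import codecs
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stop-word lists (assumption for illustration).
STOPWORDS = {
    "english": {"the", "and", "is", "of", "to", "in"},
    "french": {"le", "la", "et", "est", "les", "des"},
    "german": {"der", "die", "und", "ist", "das", "nicht"},
}


def identify_encoding(raw: bytes) -> str:
    """Step 1: guess the encoding from a byte-order mark, then by trial decoding."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    for name in ("ascii", "utf-8"):
        try:
            raw.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte to a character, so it serves as a last resort.
    return "latin-1"


def identify_language(text: str) -> str:
    """Step 2: pick the language whose stop words occur most often in the text."""
    words = text.lower().split()
    scores = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


class _TextExtractor(HTMLParser):
    """Collects visible text and ignores everything inside the skipped tags."""

    SKIP = {"script", "style", "table", "header", "nav"}  # assumed "undesirable" tags

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def section_text(markup: str) -> str:
    """Step 3: strip HTML formatting and keep only the main textual content."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.parts)


def triage(raw: bytes) -> dict:
    """Run the three steps in order on the raw bytes of one file."""
    encoding = identify_encoding(raw)
    text = section_text(raw.decode(encoding))
    return {"encoding": encoding, "language": identify_language(text), "text": text}
```

Calling triage on the raw bytes of a file returns the guessed encoding, the guessed language, and the extracted text. Trial decoding can only rule encodings out, so the Latin-1 fallback is a guess rather than a detection, and the stop-word count is unreliable on very short or mixed-language input.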
