Natural Language Processing Pipeline

A natural language processing workflow can include more steps, or more general ones, than those covered in this article. The steps below, however, represent the most common framework for analyzing a document or a group of texts and applying NLP to it. The steps are:

1. Sentence Segmentation

2. Tokenization

3. Part of Speech Tagging

4. Lemmatization

5. Stop Word Elimination

6. Dependency Parsing

Dependency parsing is the task of figuring out how the words in a sentence relate to each other. The goal is to build a tree that has a single parent word for the whole sentence. That parent word is the root of the sentence tree and is usually the main verb of the sentence.
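
To make the tree idea concrete, here is a minimal sketch in plain Python. It does not run a real parser: the sentence, the head indices, and the relation labels are annotated by hand for illustration, and the helper names (find_root, children, render) are hypothetical. It only shows how a dependency tree can be stored as "each word points to its parent" and how the single root is found.

```python
# Hand-annotated parse of "The cat chased the mouse" (illustrative, not parser output).
# Each token stores the index of its head (parent word); the root has head None.
tokens = ["The", "cat", "chased", "the", "mouse"]
heads = [1, 2, None, 4, 2]
relations = ["det", "nsubj", "ROOT", "det", "obj"]


def find_root(heads):
    """Return the index of the single token that has no head."""
    roots = [i for i, head in enumerate(heads) if head is None]
    if len(roots) != 1:
        raise ValueError("a dependency tree must have exactly one root")
    return roots[0]


def children(heads, parent):
    """Return the indices of all tokens whose head is `parent`."""
    return [i for i, head in enumerate(heads) if head == parent]


def render(tokens, heads, relations, node, depth=0):
    """Return the subtree under `node` as a list of indented lines."""
    lines = ["  " * depth + f"{tokens[node]} ({relations[node]})"]
    for child in children(heads, node):
        lines.extend(render(tokens, heads, relations, child, depth + 1))
    return lines


root = find_root(heads)
tree_lines = render(tokens, heads, relations, root)
print("\n".join(tree_lines))
```

In this example the root is the verb "chased", with "cat" and "mouse" hanging below it as subject and object. A real parser would produce the head indices and labels automatically; the tree structure it returns can be walked in the same way.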

7. Finding Noun Phrases

8. Named Entity Recognition (NER)

NER is the process of recognizing entities such as persons, locations, geographical entities, dates, and so on. An NER model built from a preset dataset can be too plain and may fail to recognize, for example, the name of a person. In that case, we train the model on the person names so that they are included in NER; then, when new data arrives, the NER model can identify them accurately. NER is important for summarizing long texts, because it makes the process faster than analysis by a human.
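
The effect of adding person names can be sketched with a toy recognizer. Note that this is a simple dictionary (gazetteer) lookup plus a date pattern, not a trained statistical model, so "adding a name" here stands in for the training step described above. The entity lists, the example sentence, and the function name recognize are all assumptions made for illustration.

```python
import re

# Hypothetical starting gazetteer: entity label -> known surface forms.
# PERSON starts empty, mimicking a preset model that knows no person names.
gazetteer = {
    "LOCATION": {"Jakarta", "Paris"},
    "PERSON": set(),
}

# Assumed date format for this sketch: YYYY-MM-DD.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def recognize(text, gazetteer):
    """Return (surface, label) pairs in the order they appear in the text."""
    found = []
    for label, names in gazetteer.items():
        for name in names:
            pattern = r"\b" + re.escape(name) + r"\b"
            for match in re.finditer(pattern, text):
                found.append((match.start(), match.group(), label))
    for match in DATE_PATTERN.finditer(text):
        found.append((match.start(), match.group(), "DATE"))
    return [(surface, label) for _, surface, label in sorted(found)]


text = "Budi Santoso flew from Jakarta to Paris on 2021-03-14."

before = recognize(text, gazetteer)        # person name is missed
gazetteer["PERSON"].add("Budi Santoso")    # "teach" the recognizer the name
after = recognize(text, gazetteer)         # person name is now found

print(before)
print(after)
```

Before the name is added, only the two locations and the date are returned; afterwards the person entity appears as well. With a real NER library the same idea applies, except that the model is updated with annotated training examples instead of a lookup table.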