Daniel J. Dorado

Computational Linguist

NLP Notebook

09 Jan 2020

Let's Make a Tokenizer Part 4

Congratulations on making it to the last post in making a tokenizer. Let’s jump right in and bring in our...

06 Jan 2020

Let's Make a Tokenizer Part 3

Grammar: So as mentioned in the first post, the grammar is where rules are specified. These grammatical rules will check...

29 Dec 2019

Let's Make a Tokenizer Part 2

Exceptions: So let’s get started making the exceptions.py file. As you see, this is pretty basic. It’s just a Python...

28 Dec 2019

Let's Make a Tokenizer Part 1

Introduction: So in this post, we’ll start making a simple English tokenizer in pure Python. Simple in the sense that...

21 Dec 2019

Tokenization

Introduction: Tokenization is an important step in the NLP pipeline. It is often part of the text normalization process. Many...

25 Nov 2019

Internationalization & Localization in Python

In this post, we’ll look briefly into Internationalization and Localization in Python using the GNU...

13 Nov 2019

Normalizing Text with Regex Groups in Python

In this post we’re going to look at how regex groups can help...

26 Oct 2019

Creating Vowel Plots in R

Plotting Vowel Formants: In this tutorial, I’ll demonstrate how to create a basic vowel plot using the ggplot2 library in...