Daniel J. Dorado

Computational Linguist

Natural Language Processing


Articles by category: project


2020
09 Jan 2020

Let's Make a Tokenizer Part 4

Congratulations on making it to the last post in making a tokenizer. Let’s jump right in and bring in our...

06 Jan 2020

Let's Make a Tokenizer Part 3

Grammar So as mentioned in the first post, the grammar is where rules are specified. These grammatical rules will check...

2019
29 Dec 2019

Let's Make a Tokenizer Part 2

Exceptions So let’s get started making the exceptions.py file. As you see, this is pretty basic. It’s just a Python...

28 Dec 2019

Let's Make a Tokenizer Part 1

Introduction So in this post, we’ll start making a simple English tokenizer in pure Python. Simple in the sense that...