Natural Language Processing

Natural Language Processing, or NLP, is used to analyse, visualise, and make predictions from natural language, that is, languages that developed naturally among people, such as isiZulu, English or Spanish (as opposed to programming languages such as Java or C#).

Videos and Readings

Watch the following videos as an introduction to Natural Language Processing in Python:

  • Introduction to NLP
  • Data cleaning and text-preprocessing in Python
  • Exploratory data analysis and word clouds in Python
  • Sentiment analysis with TextBlob in Python

The code from the videos can be found here.

The videos are a great introduction to the basic NLP analysis pipeline. They show how to do NLP with the NLTK and TextBlob packages. In this topic, however, we will use spaCy, as it is widely used in industry. For documentation on spaCy commands, see spaCy’s website and RealPython.

Terms to know

Natural Language Processing has its own vocabulary, which you need in order to discuss it. By the end of this topic, you should know what the following terms mean:

  • Tokenization
  • Corpus
  • Document-Term Matrix
  • Stop words
  • Bag-of-words
  • Lemmatization
  • Bi-grams
  • Word cloud
  • Named Entity Recognition
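
To make a few of these terms concrete, here is a minimal sketch that uses only the Python standard library, not spaCy, NLTK or TextBlob. The two-sentence corpus, the tiny stop-word list and the helper names (tokenize, remove_stop_words, bigrams) are made up for illustration. Lemmatization and Named Entity Recognition are left out because they need a trained language model.

```python
import re
from collections import Counter

# Corpus: a collection of documents (here, two made-up sentences).
corpus = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
]

# Hypothetical, deliberately tiny stop-word list; NLP libraries ship
# much larger ones.
STOP_WORDS = {"the", "on", "a", "an", "and"}


def tokenize(text):
    """Tokenization: split a text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens):
    """Drop very common words that carry little meaning on their own."""
    return [t for t in tokens if t not in STOP_WORDS]


def bigrams(tokens):
    """Bi-grams: pairs of adjacent tokens."""
    return list(zip(tokens, tokens[1:]))


# Tokenize each document and remove stop words.
docs = [remove_stop_words(tokenize(d)) for d in corpus]

# Bag-of-words: word counts per document, ignoring word order.
bags = [Counter(d) for d in docs]

# Document-term matrix: one row per document, one column per term
# in the (sorted) vocabulary, each cell holding a word count.
vocabulary = sorted({t for d in docs for t in d})
dtm = [[bag[term] for term in vocabulary] for bag in bags]

print(vocabulary)
for row in dtm:
    print(row)
print(bigrams(docs[0]))
```

Each row of the matrix describes one document by how often each vocabulary term occurs in it. A real pipeline would normally use a library tokenizer and stop-word list rather than the simple regular expression and hand-written set used here.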
