Natural Language Processing (NLP) Tutorial

1. Introduction to NLP

Natural Language Processing (NLP) is a field of Artificial Intelligence (AI) that enables computers to understand, interpret, and generate human language. It combines computational linguistics with machine learning and deep learning.

Applications of NLP

Sentiment Analysis (e.g., detecting positive/negative reviews)
Machine Translation (Google Translate)
Chatbots & Virtual Assistants (Alexa, Siri)
Speech Recognition (Speech-to-text systems)
Text Summarization (Extractive & Abstractive summarization)
Named Entity Recognition (NER)
Spam Detection (Email filtering)

2. NLP Basics

2.1. Tokenization

Tokenization is the process of breaking text into smaller units called tokens, typically words (word tokenization) or sentences (sentence tokenization).

Example using NLTK (Python library)

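import nltk
from nltk.tokenize import word_tokenize, sent_tokenize

# One-time download of the Punkt tokenizer data ("punkt_tab" is used by newer NLTK releases)
nltk.download("punkt")
nltk.download("punkt_tab")

text = "NLP is fascinating! It helps machines understand human language."
print(word_tokenize(text))  # Word tokenization: ['NLP', 'is', 'fascinating', '!', 'It', ...]
print(sent_tokenize(text))  # Sentence tokenization: ['NLP is fascinating!', 'It helps ...']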
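To see roughly what a tokenizer does, here is a simplified, regex-based sketch in plain Python. It is not how NLTK works internally (`word_tokenize` applies Treebank-style rules and `sent_tokenize` uses a pre-trained Punkt model); the function names `simple_word_tokenize` and `simple_sent_tokenize` are hypothetical.

```python
import re

def simple_word_tokenize(text):
    # A word is a run of letters, digits or underscores; any other
    # non-space character (such as "!" or ".") becomes its own token.
    return re.findall(r"\w+|[^\w\s]", text)

def simple_sent_tokenize(text):
    # Split wherever ".", "!" or "?" is followed by whitespace.
    return re.split(r"(?<=[.!?])\s+", text.strip())

text = "NLP is fascinating! It helps machines understand human language."
print(simple_word_tokenize(text))
print(simple_sent_tokenize(text))
```

The naive sentence splitter breaks after every period followed by a space, so "Dr. Smith arrived." would be split after "Dr."; handling abbreviations like this is the kind of case a trained model such as Punkt is meant for.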

2.2. Stopword Removal

Stopwords are common words (like “is”, “the”, “and”) that carry little meaning on their own and are often removed before tasks such as text classification or keyword extraction.

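import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download("stopwords")  # One-time download of the stopword lists

stop_words = set(stopwords.words("english"))
words = word_tokenize(text)  # `text` from the tokenization example above
filtered_words = [word for word in words if word.lower() not in stop_words]

print(filtered_words)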

2.3. Stemming and Lemmatization

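These techniques reduce inflected words to a base form. Stemming chops off suffixes using heuristic rules, so the result may not be a real word (e.g., “fli”). Lemmatization uses a vocabulary (here, WordNet) and the word's part of speech to return its dictionary form, or lemma.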

Stemming (using PorterStemmer)

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
print(stemmer.stem("running"))  # Output: run
print(stemmer.stem("flies"))    # Output: fli
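The “fli” output comes from the rule-based nature of stemming. The sketch below implements only step 1a of the Porter algorithm (its plural-suffix rules); the full stemmer applies several more steps, each with conditions on the shape of the stem, so this is an illustration rather than a replacement for `PorterStemmer`. The function name `porter_step_1a` is hypothetical.

```python
def porter_step_1a(word):
    # Step 1a of the Porter algorithm: rewrite plural suffixes.
    # Rules are checked in order and only the first matching rule applies.
    rules = [("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")]
    for suffix, replacement in rules:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word

for w in ["caresses", "flies", "caress", "cats"]:
    print(w, "->", porter_step_1a(w))
```

Because "ies" is rewritten to "i", "flies" becomes "fli". Since the "ss" rule is checked before the "s" rule, "caress" keeps its final "ss" instead of losing an "s".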

Lemmatization (using WordNetLemmatizer)

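import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet")  # One-time download of the WordNet data
nltk.download("omw-1.4")  # Also required by some NLTK versions

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("running", pos="v"))  # Output: run
print(lemmatizer.lemmatize("flies", pos="n"))    # Output: fly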
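At its core, lemmatization is a dictionary lookup keyed on the word and its part of speech. The toy lemmatizer below makes that concrete; the `LEMMAS` table and the `lookup_lemmatize` function are hypothetical and stand in for a real lexical database such as WordNet.

```python
# Hypothetical mini-lexicon: (surface form, part of speech) -> lemma
LEMMAS = {
    ("running", "v"): "run",
    ("ran", "v"): "run",
    ("flies", "n"): "fly",
    ("flies", "v"): "fly",
    ("better", "a"): "good",
}

def lookup_lemmatize(word, pos="n"):
    # Return the stored lemma for this (word, pos) pair;
    # unknown pairs fall back to the word itself.
    return LEMMAS.get((word.lower(), pos), word)

print(lookup_lemmatize("running", pos="v"))
print(lookup_lemmatize("running", pos="n"))
print(lookup_lemmatize("ran", pos="v"))
```

The same word can map to different lemmas (or to itself) depending on the part of speech, which is why the `pos` argument matters. A lookup table also maps irregular forms such as "ran" to "run", which no suffix-stripping rule can produce.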

3. Text Representation Techniques

3.1. Bag of Words (BoW)

BoW represents each document as a vector of word counts over a shared vocabulary, ignoring grammar and word order.

from sklearn.feature_extraction.text import CountVectorizer

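corpus = ["NLP is fun", "I love learning NLP"]
vectorizer = CountVectorizer()  # Lowercases text; the default token pattern ignores 1-character tokens like "I"
X = vectorizer.fit_transform(corpus)  # Sparse document-term matrix

print(vectorizer.get_feature_names_out())  # Sorted vocabulary: fun, is, learning, love, nlp
print(X.toarray())  # Word count matrix, one row per document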
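To show what `CountVectorizer` computes, here is a from-scratch sketch in plain Python. It assumes the vectorizer's default behavior: lowercasing, keeping only tokens of two or more word characters (so "I" is dropped), and sorting the vocabulary alphabetically. `bag_of_words` is a hypothetical helper name.

```python
import re

def bag_of_words(corpus):
    # Lowercase and keep tokens of 2+ word characters, like CountVectorizer's default
    docs = [re.findall(r"\b\w\w+\b", doc.lower()) for doc in corpus]
    # Shared vocabulary, sorted alphabetically
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    matrix = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok in doc:
            row[index[tok]] += 1  # count each occurrence
        matrix.append(row)
    return vocab, matrix

corpus = ["NLP is fun", "I love learning NLP"]
vocab, matrix = bag_of_words(corpus)
print(vocab)   # ['fun', 'is', 'learning', 'love', 'nlp']
print(matrix)  # [[1, 1, 0, 0, 1], [0, 0, 1, 1, 1]]
```

Each row is a document and each column a vocabulary word; "nlp" has a count of 1 in both rows because it appears once in each sentence.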

3.2. Term Frequency-Inverse Document Frequency (TF-IDF)

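TF-IDF weights each word by how often it appears in a document (term frequency), scaled down by how many documents in the corpus contain it (inverse document frequency). Words that are frequent in one document but rare elsewhere get the highest weights.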

from sklearn.feature_extraction.text import TfidfVectorizer

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(corpus)  # Reuses `corpus` from the Bag of Words example

print(tfidf.get_feature_names_out())  # Features
print(X.toarray())  # TF-IDF values
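The weighting can be computed by hand. This sketch reproduces `TfidfVectorizer`'s default settings as documented in scikit-learn: raw counts for TF, a smoothed IDF of ln((1 + n) / (1 + df)) + 1, where n is the number of documents and df the number of documents containing the term, and L2 normalization of each row. `tfidf_from_scratch` is a hypothetical name.

```python
import math
import re

def tfidf_from_scratch(corpus):
    # Same tokenization as CountVectorizer's default
    docs = [re.findall(r"\b\w\w+\b", doc.lower()) for doc in corpus]
    vocab = sorted({tok for doc in docs for tok in doc})
    n = len(docs)
    # Document frequency: how many documents contain each term
    df = {t: sum(t in doc for doc in docs) for t in vocab}
    # Smoothed IDF (TfidfVectorizer's default smooth_idf=True)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in docs:
        # Raw term count times IDF
        row = [doc.count(t) * idf[t] for t in vocab]
        # L2-normalize the row (default norm="l2")
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows

vocab, rows = tfidf_from_scratch(["NLP is fun", "I love learning NLP"])
print(vocab)
for row in rows:
    print([round(v, 3) for v in row])
```

Because "nlp" appears in both documents, its IDF is 1 (the minimum), so in the first document it ends up with a lower weight (about 0.449) than "fun" and "is" (about 0.632).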

3.3. Word Embeddings (Word2Vec, GloVe, FastText)

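Word embeddings represent each word as a dense vector learned from the contexts it appears in, so words with similar meanings tend to get similar vectors.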

Word2Vec using Gensim

from gensim.models import Word2Vec

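sentences = [["NLP", "is", "fun"], ["I", "love", "learning", "NLP"]]  # Pre-tokenized sentences
# vector_size: embedding dimensions; window: context words on each side; min_count=1 keeps every word.
# A corpus this small only demonstrates the API; useful vectors need far more text.
model = Word2Vec(sentences, vector_size=10, window=2, min_count=1, workers=4)

print(model.wv["NLP"])  # 10-dimensional vector for "NLP" (values differ between runs)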
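Similarity between embeddings is usually measured with cosine similarity, the cosine of the angle between two vectors; gensim's `model.wv.similarity(w1, w2)` returns this value for two words in the vocabulary. The sketch below computes it in plain Python on hypothetical, hand-picked 3-dimensional vectors, since vectors trained on a two-sentence corpus would not be meaningful.

```python
import math

def cosine_similarity(u, v):
    # Dot product divided by the product of the vector lengths:
    # 1.0 means same direction, 0.0 means orthogonal (unrelated).
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical vectors, chosen by hand for illustration only
king = [0.8, 0.6, 0.1]
queen = [0.7, 0.7, 0.1]
banana = [0.1, 0.0, 0.9]

print(round(cosine_similarity(king, queen), 3))
print(round(cosine_similarity(king, banana), 3))
```

Related words should point in similar directions, giving a value near 1 (about 0.99 for "king" and "queen" here, versus about 0.19 for "king" and "banana").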

4. NLP Tasks

4.1. Named Entity Recognition (NER)

Identifying and classifying entities such as people, organizations, places, and dates in text.

import spacy

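nlp = spacy.load("en_core_web_sm")  # Install first: python -m spacy download en_core_web_sm
text = "Barack Obama was the 44th President of the United States."
doc = nlp(text)

for ent in doc.ents:
    print(ent.text, ent.label_)  # Entity text and its predicted label (e.g., PERSON, GPE)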
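Under the hood, NER models typically label each token and then group those labels into entity spans. A common scheme is BIO: B-X begins an entity of type X, I-X continues it, and O marks tokens outside any entity (spaCy exposes similar per-token codes through `token.ent_iob_` and `token.ent_type_`). The sketch below decodes BIO tags into entities; the tags are hand-written for illustration, not real model output, and `bio_to_spans` is a hypothetical helper.

```python
def bio_to_spans(tokens, tags):
    # Group BIO-tagged tokens into (entity text, label) pairs.
    spans, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        # I-X continues the open entity only if it has the same type X
        if tag.startswith("I-") and label == tag[2:]:
            current.append(token)
            continue
        # Any other tag closes the open entity, if there is one
        if current:
            spans.append((" ".join(current), label))
            current, label = [], None
        # B-X opens a new entity; O (or a stray I-) is skipped
        if tag.startswith("B-"):
            current, label = [token], tag[2:]
    if current:
        spans.append((" ".join(current), label))
    return spans

tokens = ["Barack", "Obama", "was", "the", "44th", "President",
          "of", "the", "United", "States", "."]
# Hand-written tags (hypothetical, not produced by a model)
tags = ["B-PERSON", "I-PERSON", "O", "O", "B-ORDINAL", "O",
        "O", "B-GPE", "I-GPE", "I-GPE", "O"]
print(bio_to_spans(tokens, tags))
```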

4.2. Sentiment Analysis

Detecting whether a text expresses positive, negative, or neutral sentiment.

from textblob import TextBlob

text = "I love NLP, it's amazing!"
sentiment = TextBlob(text).sentiment
print(sentiment.polarity)  # Ranges from -1.0 to 1.0: >0 positive, <0 negative, 0 neutral
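TextBlob's default analyzer is lexicon-based: words carry polarity scores that are combined into a score for the whole text. The sketch below shows the basic idea with a hypothetical five-word lexicon and a plain average; it is not TextBlob's actual scoring logic, which handles more cases, such as intensifiers and negation.

```python
import re

# Hypothetical mini-lexicon: word -> polarity score in [-1, 1]
LEXICON = {"love": 0.5, "amazing": 0.6, "good": 0.7, "bad": -0.7, "boring": -1.0}

def lexicon_polarity(text):
    # Average the scores of the words found in the lexicon;
    # return 0.0 (neutral) when no lexicon word appears.
    words = re.findall(r"[a-z']+", text.lower())
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

print(lexicon_polarity("I love NLP, it's amazing!"))
print(lexicon_polarity("The lecture was boring."))
```

Because it only averages known words, this sketch would score "not amazing" as positive; handling negation requires extra rules.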

4.3. Text Summarization

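Condensing a long text into a shorter one that keeps its key information. The LSA summarizer below is extractive: it selects the most important existing sentences rather than writing new ones.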

from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer

text = "NLP is a field of AI that deals with human language. It includes tasks like tokenization, NER, and sentiment analysis."
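parser = PlaintextParser.from_string(text, Tokenizer("english"))  # Tokenizer relies on NLTK's Punkt data
summarizer = LsaSummarizer()
summary = summarizer(parser.document, 2)  # Keep the 2 highest-ranked sentences (here, the whole text)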

for sentence in summary:
    print(sentence)
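LSA is only one extractive method. A simpler classic approach, swapped in here for illustration rather than taken from sumy, scores each sentence by how frequent its content words are across the whole text and keeps the top-scoring ones. The stopword list, sample text, and function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list
STOPWORDS = {"a", "an", "and", "the", "is", "of", "it", "that", "with", "to", "like"}

def content_words(text):
    # Lowercased word tokens with stopwords removed
    return [w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS]

def summarize(text, num_sentences=1):
    # Naive sentence split after ".", "!" or "?"
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # How often each content word occurs in the whole text
    freq = Counter(content_words(text))
    # Score a sentence by the summed frequencies of its content words
    scores = {s: sum(freq[w] for w in content_words(s)) for s in sentences}
    top = set(sorted(sentences, key=scores.get, reverse=True)[:num_sentences])
    # Return the selected sentences in their original order
    return [s for s in sentences if s in top]

sample = "NLP studies human language. NLP models process language data. The weather was nice."
print(summarize(sample, num_sentences=1))
```

Summing word frequencies favors longer sentences; dividing each score by the sentence's length is a common adjustment.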

4.4. Chatbot Development (Simple Example)

Creating a basic chatbot using ChatterBot.

from chatterbot import ChatBot
from chatterbot.trainers import ChatterBotCorpusTrainer

chatbot = ChatBot("NLP_Bot")
# The English training conversations come from the separate chatterbot-corpus package
trainer = ChatterBotCorpusTrainer(chatbot)
trainer.train("chatterbot.corpus.english")

response = chatbot.get_response("Hello")
print(response)
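By default, ChatterBot selects responses rather than generating them: it looks for the known statement closest to the input and returns a response associated with it. The sketch below imitates that idea in plain Python with a hypothetical set of conversation pairs and Jaccard token overlap as the similarity measure; ChatterBot's own comparison functions and storage may differ, so treat this as an illustration only.

```python
import re

# Hypothetical (statement, response) pairs the bot has "learned"
CONVERSATIONS = [
    ("hello", "Hi there!"),
    ("how are you", "I am doing well, thanks for asking."),
    ("what is nlp", "NLP is the field of AI that works with human language."),
]

def tokens(text):
    # Set of lowercased word tokens
    return set(re.findall(r"\w+", text.lower()))

def jaccard(a, b):
    # Size of the intersection divided by size of the union
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def get_response(message, fallback="Sorry, I don't understand."):
    # Return the response paired with the most similar known statement
    best_score, best_response = 0.0, fallback
    for statement, response in CONVERSATIONS:
        score = jaccard(tokens(message), tokens(statement))
        if score > best_score:
            best_score, best_response = score, response
    return best_response

print(get_response("Hello!"))
print(get_response("What is NLP?"))
```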

5. Advanced NLP with Deep Learning
