Natural Language Processing (NLP) Tutorial

1. Introduction to NLP

Natural Language Processing (NLP) is a field of Artificial Intelligence (AI) that enables computers to understand, interpret, and generate human language. It combines computational linguistics with machine learning and deep learning.

Applications of NLP

  • Sentiment Analysis (e.g., detecting positive/negative reviews)
  • Machine Translation (e.g., Google Translate)
  • Chatbots & Virtual Assistants (e.g., Alexa, Siri)
  • Speech Recognition (Speech-to-text systems)
  • Text Summarization (Extractive & Abstractive summarization)
  • Named Entity Recognition (NER)
  • Spam Detection (Email filtering)

2. NLP Basics

2.1. Tokenization

Tokenization is the process of breaking a text into individual words or sentences.

Example using NLTK (Python library)

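import nltk
from nltk.tokenize import word_tokenize, sent_tokenize

# One-time download of the tokenizer models (newer NLTK releases look for "punkt_tab")
nltk.download("punkt")
nltk.download("punkt_tab")

text = "NLP is fascinating! It helps machines understand human language."
print(word_tokenize(text))  # Word tokenization: punctuation becomes separate tokens
print(sent_tokenize(text))  # Sentence tokenization: a list of two sentences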
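
To make the idea concrete without any library, the sketch below approximates both kinds of tokenization with regular expressions. The function names `simple_word_tokenize` and `simple_sent_tokenize` are hypothetical, and the rules are far simpler than NLTK's (for example, they would wrongly split after abbreviations such as "Dr.").

```python
import re

def simple_word_tokenize(text):
    # Match runs of word characters, or any single non-space punctuation character
    return re.findall(r"\w+|[^\w\s]", text)

def simple_sent_tokenize(text):
    # Split on whitespace that follows ".", "!" or "?"
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

text = "NLP is fascinating! It helps machines understand human language."
print(simple_word_tokenize(text))
print(simple_sent_tokenize(text))
```

For this input the regex version yields two sentences and treats "!" and "." as separate tokens, which is the behavior a library tokenizer is expected to show on such simple text.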

2.2. Stopword Removal

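Stopwords are very common words (like “is”, “the”, “and”) that carry little topical information, so they are often removed before tasks such as classification or search.

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download("stopwords")  # One-time download of the stopword lists

stop_words = set(stopwords.words("english"))
words = word_tokenize(text)  # `text` is the string defined in section 2.1
filtered_words = [word for word in words if word.lower() not in stop_words]

print(filtered_words)  # Punctuation tokens such as "!" are not stopwords and remain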

2.3. Stemming and Lemmatization

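Both techniques reduce inflected words to a common base form. Stemming strips suffixes with heuristic rules and may produce strings that are not real words (such as "fli"), while lemmatization uses a vocabulary and the word's part of speech to return a dictionary form (such as "fly").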

Stemming (using PorterStemmer)

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
print(stemmer.stem("running"))  # Output: run
print(stemmer.stem("flies"))    # Output: fli




Lemmatization (using WordNetLemmatizer)

import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet")  # One-time download of the WordNet data

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("running", pos="v"))  # Output: run
print(lemmatizer.lemmatize("flies", pos="n"))    # Output: fly
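
The difference between the two approaches can be sketched in a few lines. This is a deliberately naive illustration, not the Porter algorithm or WordNet: `naive_stem` and `naive_lemmatize` are hypothetical names, and `LEMMA_TABLE` is a tiny hand-written stand-in for a real lexicon.

```python
def naive_stem(word):
    # Strip the first matching suffix, keeping at least three characters.
    # No dictionary is consulted, so the result may not be a real word.
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical lookup table standing in for a lexicon such as WordNet
LEMMA_TABLE = {("running", "v"): "run", ("flies", "n"): "fly"}

def naive_lemmatize(word, pos):
    # Look up the (word, part-of-speech) pair; fall back to the word itself
    return LEMMA_TABLE.get((word, pos), word)

print(naive_stem("running"))            # runn
print(naive_stem("flies"))              # fli
print(naive_lemmatize("running", "v"))  # run
print(naive_lemmatize("flies", "n"))    # fly
```

The stemmer here returns "runn" because it has no rule for doubled consonants; the Porter stemmer adds such rules, which is why it returns "run".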





3. Text Representation Techniques


3.1. Bag of Words (BoW)

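BoW represents each document as a vector of word counts over a shared vocabulary, ignoring grammar and word order.

from sklearn.feature_extraction.text import CountVectorizer

corpus = ["NLP is fun", "I love learning NLP"]
vectorizer = CountVectorizer()  # Lowercases text; by default drops one-letter tokens like "I"
X = vectorizer.fit_transform(corpus)

print(vectorizer.get_feature_names_out())  # Vocabulary: fun, is, learning, love, nlp
print(X.toarray())  # One row per document, one column per vocabulary word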
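
As a rough sketch of what happens under the hood, the same representation can be built with the standard library alone. The variable names are illustrative, and tokenization is simplified to lowercasing and splitting on whitespace, so unlike `CountVectorizer` this version keeps the one-letter token "i".

```python
from collections import Counter

corpus = ["NLP is fun", "I love learning NLP"]

# Lowercase and split on whitespace (a simplification of real tokenization)
tokenized = [doc.lower().split() for doc in corpus]

# Vocabulary: sorted unique tokens across the whole corpus
vocab = sorted({tok for doc in tokenized for tok in doc})

# One count vector per document, aligned with the vocabulary order
counts = [Counter(doc) for doc in tokenized]
vectors = [[c[tok] for tok in vocab] for c in counts]

print(vocab)    # ['fun', 'i', 'is', 'learning', 'love', 'nlp']
print(vectors)  # [[1, 0, 1, 0, 0, 1], [0, 1, 0, 1, 1, 1]]
```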




3.2. Term Frequency-Inverse Document Frequency (TF-IDF)

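TF-IDF weights each word by how often it appears in a document (term frequency), scaled down by how many documents in the corpus contain it (inverse document frequency), so words that are frequent everywhere receive low weights.

from sklearn.feature_extraction.text import TfidfVectorizer

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(corpus)  # `corpus` is the list defined in section 3.1

print(tfidf.get_feature_names_out())  # Features
print(X.toarray())  # TF-IDF values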
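
The weighting can be worked through by hand with the textbook formula tf(t, d) × log(N / df(t)). This is a minimal sketch under that assumption; scikit-learn's `TfidfVectorizer` uses a smoothed IDF and L2-normalizes each row by default, so its numbers will differ from the ones computed here.

```python
import math
from collections import Counter

corpus = ["NLP is fun", "I love learning NLP"]
docs = [doc.lower().split() for doc in corpus]
n_docs = len(docs)

# Document frequency: number of documents that contain each term
df = Counter(tok for doc in docs for tok in set(doc))

def tfidf(doc):
    # Term frequency (count / document length) times log(N / document frequency)
    counts = Counter(doc)
    return {
        tok: (count / len(doc)) * math.log(n_docs / df[tok])
        for tok, count in counts.items()
    }

for doc in docs:
    print(tfidf(doc))
```

Because "nlp" occurs in both documents, its IDF is log(2/2) = 0 and its weight vanishes, while words unique to one document keep a positive weight.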

3.3. Word Embeddings (Word2Vec, GloVe, FastText)

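Word embeddings represent each word as a dense vector of real numbers, learned so that words used in similar contexts end up with similar vectors.

Word2Vec using Gensim

from gensim.models import Word2Vec

# Toy corpus: far too small to learn meaningful vectors, but enough to show the API
sentences = [["NLP", "is", "fun"], ["I", "love", "learning", "NLP"]]
model = Word2Vec(sentences, vector_size=10, window=2, min_count=1, workers=4)

print(model.wv["NLP"])  # 10-dimensional vector representation of "NLP"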
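
Similarity between embeddings is usually measured with cosine similarity. The sketch below uses hand-picked, hypothetical 3-dimensional vectors purely for illustration; real embeddings have tens to hundreds of dimensions and are learned from large corpora. With a trained Gensim model, `model.wv.similarity(word_a, word_b)` returns the same kind of score.

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical vectors, chosen by hand so that "king" and "queen" point the same way
vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.82, 0.15],
    "banana": [0.1, 0.05, 0.95],
}

print(cosine_similarity(vectors["king"], vectors["queen"]))   # close to 1
print(cosine_similarity(vectors["king"], vectors["banana"]))  # much lower
```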


4. NLP Tasks

4.1. Named Entity Recognition (NER)

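NER identifies spans of text that refer to entities such as people, places, organizations, and dates, and assigns each span a type label.

import spacy

# Requires the small English model: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
text = "Barack Obama was the 44th President of the United States."
doc = nlp(text)

for ent in doc.ents:
    print(ent.text, ent.label_)  # Entity text and its type label (e.g., PERSON, GPE)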




4.2. Sentiment Analysis

Sentiment analysis determines whether a text expresses a positive, negative, or neutral opinion.

from textblob import TextBlob

text = "I love NLP, it's amazing!"
sentiment = TextBlob(text).sentiment
# Polarity ranges from -1.0 (most negative) to 1.0 (most positive)
print(sentiment.polarity)  # Positive if >0, Negative if <0
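
One simple way to arrive at such a polarity score is a lexicon lookup: each known word carries a score, and the text's polarity is their average. The sketch below assumes a tiny hypothetical `LEXICON` with made-up scores; it ignores negation and intensifiers, which practical tools handle.

```python
import re

# Hypothetical mini-lexicon with made-up scores; real lexicons hold thousands of entries
LEXICON = {"love": 0.5, "amazing": 0.6, "good": 0.7, "hate": -0.8, "boring": -1.0}

def polarity(text):
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = [LEXICON[t] for t in tokens if t in LEXICON]
    # Average the scores of known words; 0.0 when no word is in the lexicon
    return sum(scores) / len(scores) if scores else 0.0

def label(score):
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for sample in ["I love NLP, it's amazing!", "This lecture is boring.", "The cat sat."]:
    print(sample, "->", label(polarity(sample)))
```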

4.3. Text Summarization

Summarization condenses a long text into its key points. The example below is extractive: it selects existing sentences using Latent Semantic Analysis (LSA) rather than writing new ones.

from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer

text = "NLP is a field of AI that deals with human language. It includes tasks like tokenization, NER, and sentiment analysis."
parser = PlaintextParser.from_string(text, Tokenizer("english"))
summarizer = LsaSummarizer()
# Ask for 1 sentence: the input has only two, so asking for 2 would return it unchanged
summary = summarizer(parser.document, 1)

for sentence in summary:
    print(sentence)
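
To show the extractive idea in its simplest form, the sketch below scores sentences by average word frequency and keeps the top ones. This is a swapped-in frequency heuristic, not LSA, and the function name `summarize` is hypothetical.

```python
import re
from collections import Counter

def summarize(text, n_sentences=1):
    # Split into sentences after ".", "!" or "?"
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Word frequencies over the whole text
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence):
        # Average corpus frequency of the words in this sentence
        tokens = re.findall(r"\w+", sentence.lower())
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Pick the top-scoring sentences, then restore their original order
    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    return [s for s in sentences if s in top]

sample = "NLP is fun. NLP models process NLP text. Cats sleep."
print(summarize(sample, 1))  # ['NLP models process NLP text.']
```

Averaging rather than summing the frequencies keeps long sentences from winning on length alone.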

4.4. Chatbot Development (Simple Example)

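The example below creates a basic chatbot with the ChatterBot library and trains it on an English conversation corpus.

from chatterbot import ChatBot
from chatterbot.trainers import ChatterBotCorpusTrainer

chatbot = ChatBot("NLP_Bot")
trainer = ChatterBotCorpusTrainer(chatbot)
# Corpus training needs the separate chatterbot-corpus package to be installed
trainer.train("chatterbot.corpus.english")

response = chatbot.get_response("Hello")
print(response)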


5. Advanced NLP with Deep Learning


5.1. Recurrent Neural Networks (RNN) & LSTMs

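RNNs model sequences by reading one token at a time and carrying a hidden state from step to step, so each output can depend on earlier inputs. LSTMs (Long Short-Term Memory networks) add gates that control what is kept or forgotten, which helps with longer-range dependencies.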
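
The recurrence at the heart of a simple RNN is h_t = tanh(W_xh · x_t + W_hh · h_(t-1) + b_h). The sketch below applies it with plain Python lists; the weights are hypothetical fixed values chosen for illustration, whereas a real model would learn them and would normally use a deep learning framework.

```python
import math

def rnn_step(x, h_prev, W_xh, W_hh, b_h):
    # Compute the new hidden state from the current input and the previous state
    h = []
    for i in range(len(b_h)):
        total = b_h[i]
        total += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        total += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h.append(math.tanh(total))
    return h

# Hypothetical weights for a 2-dimensional input and a 2-dimensional hidden state
W_xh = [[0.5, -0.3], [0.1, 0.8]]
W_hh = [[0.2, 0.0], [-0.4, 0.3]]
b_h = [0.0, 0.1]

sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three input vectors
h = [0.0, 0.0]  # initial hidden state
for x in sequence:
    h = rnn_step(x, h, W_xh, W_hh, b_h)
    print(h)
```

The same weights are reused at every step; only the hidden state changes, which is how the network carries information along the sequence.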

5.2. Transformer Models (BERT, GPT)

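  • BERT (Bidirectional Encoder Representations from Transformers)
    • An encoder model that reads context on both sides of each word; used for sentence understanding and question answering.
  • GPT (Generative Pre-trained Transformer)
    • A decoder model that predicts the next token from the preceding ones; used for text generation (e.g., ChatGPT).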
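
Both model families are built on the attention mechanism, in which each position computes a weighted average over all positions. As a minimal sketch, scaled dot-product attention can be written with plain lists as below; the function names and inputs are illustrative, and real models add learned projections, multiple heads, and masking.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Hypothetical toy inputs: one query, two keys, two values
print(attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]]))
```

The query here matches the first key more closely, so the output lies nearer to the first value vector than to the second.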

Using Hugging Face Transformers

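from transformers import pipeline

# With no model specified, a default pre-trained model is downloaded on first use
classifier = pipeline("sentiment-analysis")
result = classifier("I love NLP!")
print(result)  # A list with one dict per input, holding "label" and "score"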





6. NLP Tools and Libraries

  • NLTK – Basic NLP tasks (Tokenization, POS tagging, etc.)
  • spaCy – Fast NLP processing (NER, Dependency Parsing)
  • Gensim – Topic modeling, Word2Vec
  • Hugging Face Transformers – Pre-trained models like BERT, GPT
  • TextBlob – Simple NLP operations (Sentiment Analysis, POS tagging)

7. Conclusion & Next Steps

  • Practice NLP by working on real-world datasets (e.g., movie reviews, Twitter sentiment analysis).
  • Experiment with pre-trained models (e.g., BERT, GPT-3).
  • Build custom NLP applications (chatbots, search engines).
  • Explore NLP competitions on Kaggle.
