How to Remove Punctuation in NLTK

Natural Language Processing (NLP) is the processing and analysis of human language by machines. Removing punctuation is a common text preprocessing step: it cleans the text so that marks such as commas and periods are not carried into later stages as separate tokens, which can improve the results of NLP models and text analysis tasks.

Implementation

Step 1: Install NLTK

To install NLTK, run the following command in your command prompt:

pip install nltk
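To confirm that the installation succeeded, the installed version can be looked up from Python. The following is a minimal sketch using the standard library's importlib.metadata; the helper name installed_version is hypothetical and not part of NLTK.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="nltk"):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if NLTK is installed, otherwise None
print(installed_version("nltk"))
```

If this prints None, the pip command above most likely ran against a different Python environment than the one executing the script.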

Step 2: Import Required Libraries

Imports NLTK and the tokenizer used for splitting text into words.

Python

import nltk
from nltk.tokenize import word_tokenize

Step 3: Download Tokenizer Resources

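Downloads the Punkt tokenizer resources that word_tokenize() depends on. Newer NLTK releases load 'punkt_tab', while older ones load 'punkt'; downloading both covers either case.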

Python

nltk.download('punkt')
nltk.download('punkt_tab')
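Re-running the download on every execution is unnecessary once the resources are on disk. One possible refinement, sketched here under the assumption that the resources are stored under the tokenizers/ folder of the NLTK data directory, is to check with nltk.data.find() first and download only what is missing. The helper name ensure_tokenizer_resources is hypothetical and not part of NLTK.

```python
import nltk


def ensure_tokenizer_resources(resources=("punkt", "punkt_tab")):
    """Download each tokenizer resource only if it is not already available.

    Returns the list of resource names that were missing.
    """
    missing = []
    for name in resources:
        try:
            # Raises LookupError when the resource is not found locally
            nltk.data.find(f"tokenizers/{name}")
        except LookupError:
            missing.append(name)
            # quiet=True suppresses the progress messages
            nltk.download(name, quiet=True)
    return missing
```

Calling ensure_tokenizer_resources() once at the start of a script would then replace the two unconditional nltk.download() calls.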

Step 4: Define Input Text

Creates a sample sentence containing punctuation marks.

Python

text = "Hello! Welcome to NLP, using NLTK."

Step 5: Tokenize and Remove Punctuation

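Tokenizes the text with word_tokenize(), which splits punctuation marks into separate tokens, then keeps only the tokens for which isalnum() returns True, that is, tokens made up entirely of letters and digits. Note that this filter also drops tokens that mix letters with punctuation, such as the "n't" produced from "don't" or a hyphenated word like "state-of-the-art".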

Python

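# Split the text into word and punctuation tokens
tokens = word_tokenize(text)

# Keep only tokens made up entirely of letters and digits
clean_text = [
    word for word in tokens
    if word.isalnum()
]

# Join the remaining tokens back into a single string
print(' '.join(clean_text))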

Output:

Hello Welcome to NLP using NLTK
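Because isalnum() rejects any token that contains a non-alphanumeric character, it also discards tokens such as "n't" or "state-of-the-art". If those should be kept, one alternative is to drop only the tokens made up entirely of punctuation characters. The sketch below is an illustration under stated assumptions, not part of the steps above: the helper name remove_punctuation_tokens is hypothetical, and the token list is written by hand to resemble typical word_tokenize() output so that the example runs without NLTK.

```python
import string

# ASCII punctuation characters such as ! , . : ' -
PUNCTUATION = set(string.punctuation)


def remove_punctuation_tokens(tokens):
    """Drop tokens that consist only of punctuation characters."""
    return [tok for tok in tokens if not all(ch in PUNCTUATION for ch in tok)]


# Hand-written tokens (assumed to resemble word_tokenize() output)
tokens = ["Do", "n't", "stop", ":", "it", "'s", "state-of-the-art", "!"]

# Punctuation-only filter: keeps contractions and hyphenated words
kept = remove_punctuation_tokens(tokens)
print(' '.join(kept))

# For comparison, the isalnum() filter keeps fewer tokens
strict = [tok for tok in tokens if tok.isalnum()]
print(' '.join(strict))
```

string.punctuation covers ASCII punctuation only, so marks such as curly quotes would pass through this filter unchanged and would need to be added to the set separately.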
