TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Summarize a Text with Python — Continued

How to efficiently summarize a text with Python and NLTK, with automatic detection of the text's language as a bonus

5 min read · Nov 16, 2022

Photo by Mel Poole on Unsplash

In last month's article, ‘Summarize a text with Python’, I showed how to create a summary of a given text. Since then, I have been using this code frequently and found some flaws in how it is used. The summarize method has been replaced with a class that performs this function, which makes it easier to, for example, reuse the same language and summary length across calls. The previous article was very popular, so I would love to share the updates with you!

Improvements made are:

  • Introduced a Summarizer class, storing general data in attributes
  • Use the built-in stop word lists from the NLTK corpus, keeping the possibility to use your own list
  • Auto-detect the language of a text to load the stop word list for that language
  • Call the summary function with a string or a list of strings
  • Optional sentence weighting on length
  • Incorporated the summary method for text files

The result can be found on my GitHub. Feel free to use it or adapt it to your own needs.
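To illustrate how auto-detecting a language from stop word lists could work, here is a minimal sketch. It is an assumption about the approach, not the code from the repository: it counts how many words of the text occur in each language's stop word list and picks the language with the most hits. The names `SAMPLE_STOP_WORDS` and `detect_language` are hypothetical, and the tiny hand-written lists only keep the example runnable without downloading corpora. With NLTK installed and the corpus downloaded, `nltk.corpus.stopwords.words("english")` would supply a full list.

```python
import re

# Hypothetical, heavily truncated stop word lists, for illustration only.
# nltk.corpus.stopwords.words(<language>) would provide the full lists.
SAMPLE_STOP_WORDS = {
    "english": {"the", "and", "is", "of", "to", "in", "that", "it", "with", "a"},
    "dutch": {"de", "het", "een", "en", "van", "is", "dat", "op", "met", "niet"},
    "german": {"der", "die", "das", "und", "ist", "nicht", "mit", "ein", "zu", "von"},
}


def detect_language(text, stop_words=SAMPLE_STOP_WORDS):
    """Guess the language whose stop word list overlaps most with the text."""
    # Lowercase the text and keep runs of letters only (no digits or punctuation).
    words = re.findall(r"[^\W\d_]+", text.lower())
    # Count, per language, how many words in the text are stop words.
    scores = {
        language: sum(1 for word in words if word in vocabulary)
        for language, vocabulary in stop_words.items()
    }
    return max(scores, key=scores.get)


print(detect_language("De trein van het station is niet op tijd."))
```

Short texts or texts with few stop words can easily be misclassified by such a count, so the option to pass your own stop word list remains useful.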

The basics of the Summarizer class
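As a rough idea of what such a class can look like, here is a minimal sketch under stated assumptions. It is not the implementation from the repository. The attribute names, the `summarize` signature, the regex-based sentence splitting, the word-frequency scoring, and the interpretation of length weighting (dividing a sentence's score by its number of counted words) are all hypothetical choices made for illustration. `DEFAULT_STOP_WORDS` is a tiny stand-in for a full list such as `nltk.corpus.stopwords.words("english")`.

```python
import re
from collections import Counter

# Hypothetical mini stop word list, standing in for a full NLTK list.
DEFAULT_STOP_WORDS = {"the", "a", "an", "and", "is", "are", "of", "to", "in", "it", "on"}


class Summarizer:
    """Illustrative frequency-based summarizer (a sketch, not the author's code)."""

    def __init__(self, stop_words=None, summary_length=2, weight_on_length=False):
        # General settings live in attributes so every call reuses them.
        self.stop_words = set(DEFAULT_STOP_WORDS if stop_words is None else stop_words)
        self.summary_length = summary_length
        self.weight_on_length = weight_on_length

    def _words(self, sentence):
        # Lowercased words of a sentence, without stop words.
        tokens = re.findall(r"[^\W\d_]+", sentence.lower())
        return [word for word in tokens if word not in self.stop_words]

    def summarize(self, text):
        # Accept a single string or a list of strings.
        if not isinstance(text, str):
            text = " ".join(text)
        # Naive sentence split on ., ! or ? followed by whitespace.
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
        frequencies = Counter(w for s in sentences for w in self._words(s))

        def score(sentence):
            words = self._words(sentence)
            total = sum(frequencies[word] for word in words)
            # Optional weighting: average score per word instead of the sum.
            if self.weight_on_length and words:
                return total / len(words)
            return total

        # Rank sentences by score, then restore the original order of the best ones.
        ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
        keep = sorted(ranked[: self.summary_length])
        return " ".join(sentences[i] for i in keep)


summarizer = Summarizer(summary_length=1)
print(summarizer.summarize("Trains are fast. Trains and buses carry people. The weather is nice."))
```

Without length weighting, longer sentences tend to win simply because they contain more words. Setting `weight_on_length=True` in this sketch compensates by comparing average word scores.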


Published in TDS Archive

Written by Leo van der Meulen

Dutch open data and public transportation enthusiast. Has worked in public transport for over 15 years. LinkedIn: https://www.linkedin.com/in/leovandermeulen/