TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Handling Unstructured Data

Using executable BPMN diagrams and a workflow engine heading towards Intelligent Automation.

12 min read · Feb 1, 2022

Photo by Campaign Creators on Unsplash

At this point, I believe it is fair to say that we have a good handle on processing structured data. As an industry, we have plenty of tools to process, store, and analyze rows and columns of data. We have even more tools to paint visuals, create dashboards, and produce reports. Much is written every day about exploratory data analysis (EDA), machine learning, statistics, and SQL, and more courses and books are released each month. But what about documents that do not come in neat rows and columns, such as PDFs or web pages? These are also structured, but they do not have rows and columns as a consistent structure throughout. The term "unstructured" is often used loosely in these contexts. I believe that there is really no such thing as unstructured data. Everything is organized in patterns; otherwise, we as humans would not understand anything.

Every day, a lot of information is produced in documents that are published, posted, and shared. I get a tonne of emails every day. So I got to thinking about creating a context where all those ‘unstructured’ documents could be read, parsed, processed, and organized into ‘rows and columns’ of information in a seamless way. The more I thought about it, the…


Published in TDS Archive


Written by David Moore

xPwC, xMotorola Finance, xIBM Finance, xDeloitte Tax Management Consulting: Digital Transformation, ESG, Analytics