Data Democratisation: 5 'Data For All' Strategies Embraced by Large Companies

In 2006, the Harvard Business Review published an article titled "Competing on Analytics".

This influential piece by academics Thomas Davenport and Jeanne Harris sparked widespread discussion on the idea of leveraging analytics as a competitive business advantage.

Companies began investing in BI software, big data platforms, data science teams, and cutting-edge tools for AI and machine learning in the hopes of becoming data-driven firms.

The results were underwhelming.

A Deloitte survey of American executives fourteen years later found that only 1 in 10 companies competed on analytical insights. Most firms could only lay claim to isolated silos of analytics excellence. And the most popular tool for analytics was, drumroll…

…Microsoft Excel.

The truth is, transforming into a data-driven organisation is way harder than it looks.


Being able to harness data-driven insights at scale and integrate them into everyday decision-making requires a high level of enterprise data maturity across multiple realms:

Data: If you don’t have good data, AI is over.
Skills: Is your workforce as a whole data literate?
Tools: Is your infrastructure set up for analytics at scale?
Culture: This is the biggest impediment. Does your firm have a legacy culture resistant to data-driven insights? It’s a show-stopper.

My company, a ‘Big Four’ bank where I’ve worked as an engineer and data scientist for the past five years, is sitting at 2.5 out of 5 on the data maturity scale. We’re working hard to get to a data-driven 4, putting us on the cusp of the industry-leading ‘digital native’ companies. (Go team!)

The average firm globally sits at around 2.2, according to the International Institute for Analytics.

That means only a small minority of employees have analytics skills beyond a spreadsheet.

The solution appears clear.

To lay the stepping stones towards becoming data-driven, enterprises need to drive data maturity.

And how to drive data maturity?

Say hello to data democratisation, an approach being adopted by firms worldwide.

Data democratisation is a ‘Data For All’, all-hands-on-deck ethos that aims to elevate data maturity across every corner of the company.

For instance, rather than propping up specialised tools that can only be leveraged by a small, privileged and hyper-specialised team of data scientists, my bank is investing to uplift the data, skills, tools and culture to empower all 40,000 of my colleagues to…

self-serve trustworthy data at will;
automate away the mundane in their 9–5;
embrace data-driven insights and decision-making over ‘hunches‘.

If everyone just saved an hour a week by, say, automating a simple process, my bank would save a combined 2 million hours a year, translating to roughly $150 million that could be spent elsewhere.
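For anyone who wants to check the back-of-the-envelope maths, here is a minimal sketch. The 52 working weeks and the $75 cost per employee hour are assumptions chosen only because they land close to the rough figures quoted above; they are not official numbers.

```python
# Back-of-the-envelope estimate of time saved through small automations.
# ASSUMPTIONS (illustrative only): 52 working weeks a year and a cost of
# $75 per employee hour, picked to land near the rough figures in the text.
EMPLOYEES = 40_000
HOURS_SAVED_PER_WEEK = 1
WEEKS_PER_YEAR = 52       # assumption
COST_PER_HOUR_USD = 75    # assumption


def annual_savings(employees, hours_per_week, weeks_per_year, cost_per_hour):
    """Return (hours saved per year, dollar value of those hours)."""
    hours = employees * hours_per_week * weeks_per_year
    return hours, hours * cost_per_hour


hours, dollars = annual_savings(
    EMPLOYEES, HOURS_SAVED_PER_WEEK, WEEKS_PER_YEAR, COST_PER_HOUR_USD
)
print(f"Hours saved per year: {hours:,}")     # 2,080,000
print(f"Approximate value:    ${dollars:,}")  # $156,000,000
```

Change the hourly cost or the hours saved and the total scales linearly, which is exactly why small wins across a large workforce add up so quickly.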

Therein lies an important lesson.

Data democratisation recognises the importance of the mundane alongside the moonshot analytics projects.

Cutting-edge AI & machine learning projects by our best and brightest data scientists (1% of the company) are to be celebrated.

Equally deserving of celebration are the quick wins by our everyday knowledge workers and citizen analysts.

By properly creating a solid foundation for enterprise analytics and data science at scale, data democratisation promises to strike at the heart of why most companies have so far failed to become data-driven.

So, what does data democratisation actually look like in practice?

I’ll touch on five strategies, and draw on my own experiences for each.

The first three represent advancements in tooling, while the final two look at the strides being made in the realms of data, skills and culture.

1. All-in-One Analytics & Auto-ML Platforms

The 2020s have seen the rise of low-code, one-stop-shop analytics & machine learning platforms that integrate the three main approaches to enterprise problem-solving under a single roof:

Visualisation;
Analytics;
Automation.

The most popular platforms include:

Alteryx, used by half of the world’s largest 2,000 companies;
Dataiku, a French unicorn backed by Google’s investment arm;
Databricks, which unifies big data compute and ML under a single roof;
Snowflake, a zippy data warehouse disruptor hosted on the cloud.


My bank currently uses Alteryx and Dataiku.

These enable our finance team to set up an audit pipeline in minutes. Simply drag-and-drop some data sources onto the screen, join them together with a few button clicks, then set up an automation to email you when a red flag shows up.

No spreadsheets, SQL or Python needed.

No jumping between different programs.

No laborious testing of automation pipelines to see if they behave properly.

Everything just works.

There’s additional synergy in having all that capability on an all-in-one platform.

Suppose my manager asked me to dive into customer or employee attrition – a classic business problem. In any of these modern platforms, I could…

Start off with some visualisations to try and spot interesting trends. Good. What if I don’t have much luck?
No worries, time for some analytics. I can prototype predictive models to expose the underlying drivers of attrition from the data. Turns out those who work more than 48 hours a week show a drastically heightened chance of quitting their jobs. Great insight, now we have a business decision – hire more people to spread the work or identify who’s at risk of quitting and stop it in time.
Very quickly I can set up an automation to email me (or HR) whenever the data reveals someone working too hard.
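For readers who think in code, the analytics and automation steps above might look something like this toy sketch. The records, field names and alert wording are all made up for illustration, and this is plain Python rather than what any of these platforms generate under the hood.

```python
# A toy version of the attrition workflow: find a driver, then automate a flag.
# All records and field names are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class Employee:
    name: str
    weekly_hours: float
    has_left: bool


OVERWORK_THRESHOLD = 48  # hours per week, the insight mentioned in the text

staff = [
    Employee("Ana", 52, True),
    Employee("Ben", 38, False),
    Employee("Chen", 55, True),
    Employee("Dee", 40, False),
    Employee("Eli", 50, False),
    Employee("Fay", 37, True),
]


def attrition_rate(employees):
    """Share of a group that has left (0.0 for an empty group)."""
    employees = list(employees)
    if not employees:
        return 0.0
    return sum(e.has_left for e in employees) / len(employees)


# Analytics step: compare attrition above and below the threshold.
overworked = [e for e in staff if e.weekly_hours > OVERWORK_THRESHOLD]
others = [e for e in staff if e.weekly_hours <= OVERWORK_THRESHOLD]
print(f"Attrition when overworked: {attrition_rate(overworked):.0%}")
print(f"Attrition otherwise:       {attrition_rate(others):.0%}")

# Automation step: list current employees who are over the threshold.
# A real pipeline would email HR here; this sketch only builds the messages.
alerts = [
    f"{e.name} is averaging {e.weekly_hours:.0f} hours a week"
    for e in overworked
    if not e.has_left
]
print(alerts)
```

In a low-code platform each of these steps would be a box on the canvas rather than a block of code, but the logic is the same: split, compare, then alert.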

In enterprise problem-solving, you never know which approach – visualisations, analytics or automation – will work the best.

Here, automation is actually the most impactful, because it flags at-risk employees in time for immediate action, saving tons of money down the line. HR will tell you it costs a lot of money to replace a lost employee.

But indeed, getting to the destination took some follow-my-nose playing around first. Analytics is often not linear. Being able to do everything in one place and push data and results around – often without the need to code – makes life so much easier.

An incredibly empowering proposition for citizen analysts and data scientists alike.

2. Combined Productivity & Analytics Ecosystems

This strategy looks at a federated set of applications and platforms that are tightly and synergistically integrated under a single ecosystem.

The best example of this is Microsoft.

For enterprise productivity, the industry standard has been Windows and Office 365 for a generation:

Outlook for calendar and emails;
Teams and Yammer for chats and communities;
OneDrive & SharePoint for file storage;
Word and Excel for everyday work.

These staples are now part of the broader Microsoft 365 ecosystem, reflecting Microsoft’s strategy to provide a unified productivity and analytics platform for its customers.

The analytics part of the formula is Microsoft Power Platform, a family of low-code apps, comprising…

Power BI – create data models and stunning visualisations with no code.
Power Apps – craft desktop and mobile apps with little or no code.
Power Automate – automate workflows and tasks with little or no code.
Power Virtual Agents – build your own chatbot with little or no code.


The ‘democratisation’ comes from two things.

First, these tools are easy to use, and getting easier by the month. Easy means adoption, as any business person, entrepreneur and UX designer will tell you.

Second, the integration of these Power Platform apps with each other and with Microsoft’s broader bread-and-butter productivity apps like Outlook and Teams is crucial to empowering every employee to create value for themselves and the organisation.

Take, for instance, the notorious (and mundane) hassle of reviewing and approving requests, usually dealt with via endless back and forth emails and manual processes. This is the number one ticket driver in most organisations.

Enough! Tired of wasting hours each week reviewing, say, data requests?

Citizen analysts who own data can build a Power BI report that provides information on what data assets they own and whether they can be shared with colleagues.

They can embed these data-driven insights straight into an app built in Power Apps that serves as a front-end for colleagues applying for access. (Moreover, this app is immediately in production because of Microsoft’s mature integration with large enterprise infrastructure – a sigh of relief for data governance officers!)

Now, when colleagues submit their data requests through the app, the data owner will be emailed an automated request in Outlook, or even directly pinged on Teams via Microsoft’s Approvals app, because these automations were set up in Power Automate, which, again, required little or no coding.
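Stripped of the tooling, the flow is simple: a request comes in, gets routed to the right owner, and the owner approves with one action. Here is a rough sketch of that logic in plain Python. To be clear, this is not the Power Automate or Approvals API; every name below is hypothetical and exists only to show the shape of the workflow.

```python
# Toy model of the request -> route -> notify -> approve flow described above.
# This is NOT the Power Automate API; every name here is hypothetical.
from dataclasses import dataclass, field

# Hypothetical catalogue: data asset -> (owner, whether it may be shared)
ASSETS = {
    "customer_churn": ("priya", True),
    "payroll": ("marcus", False),
}


@dataclass
class AccessRequest:
    requester: str
    asset: str
    status: str = "pending"
    notifications: list = field(default_factory=list)


def submit(request):
    """Route a request: decline unshareable assets, otherwise notify the owner."""
    owner, shareable = ASSETS[request.asset]
    if not shareable:
        request.status = "declined"
        request.notifications.append(
            f"To {request.requester}: {request.asset} cannot be shared"
        )
    else:
        request.status = "awaiting_owner"
        request.notifications.append(
            f"To {owner}: {request.requester} requests access to {request.asset}"
        )
    return request


def approve(request):
    """Owner approves a routed request with a single action."""
    if request.status != "awaiting_owner":
        raise ValueError(f"cannot approve a request that is {request.status}")
    request.status = "approved"
    return request


req = approve(submit(AccessRequest("sam", "customer_churn")))
print(req.status, req.notifications)
```

The point is that the rules for who can share what live in one place, so nobody has to re-decide them over email each time.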

Our humble citizen analyst just transformed a mundane task that hogged 5 hours of their time each week into an automated pipeline that now takes minutes, freeing up valuable time to do more productive work.

(Or maybe get off work early…!)

3. Mature Big Data Tools

The embrace of big data and data lakes by enterprises worldwide over the past five years has largely failed to live up to the hype.

Plagued by crude tools, data quality issues, inflated promises and scalability challenges, data lakes left too large a gap between what was needed to unlock the latent value of all that data and the realities faced by employees.

At my bank, our first Apache Zeppelin notebooks were difficult to use. Metadata on Apache Atlas was difficult to navigate for non-data engineers. Data was often ingested on a project basis and was frequently incomplete, requiring integration with our data warehouse assets – a less than ideal situation for data scientists who prefer to wrangle and model with unprocessed data.

The lack of usage of many data assets meant data quality issues piled up.

The water in the data lake grew stale.

We later hooked up better tools like Power BI and managed machine-learning platforms like Dataiku into our data lake infrastructure, but quirks and inefficiencies persisted, akin to the challenges encountered by PC manufacturers who must piece together hardware and software from a slew of different vendors.

If only there were a big data solution that offered an "Apple-like" experience: hardware and software in perfect sync, a marriage made in data heaven.

That time is now.

Our data lake is now hosted in the cloud on Microsoft Azure, empowering us with elastic compute and hyperscale capability. Further, in line with numerous organisations that embarked on their big data journey between 2015 and 2020, we’re transitioning away from our Apache Hadoop stack in favour of a wholly native Azure solution.

Azure everything.

This includes Azure Synapse for seamless data integration and analytics, uniting data warehousing and big data under one roof. Additionally, we employ Azure Purview to ensure a comprehensive and streamlined data governance experience.

Of course, Microsoft is just one of the major cloud vendors. Others include:

Amazon Web Services (AWS);
Google Cloud Platform (GCP);
Alibaba Cloud.

All of these powerhouses pour billions into improving their cloud infrastructure, products and services each year.

In short, unifying the data, the analysis tools to wrangle it, the integration tools to combine it and the governance tools to manage it, all under the same roof, has created a big data experience as seamless as it can be in 2023.

4. Culture and Education… Culture and Education…

We’ve just seen three strategies focused on advancements in tooling technology.

I’ve got some news that might come off as a bit surprising:

The biggest challenge to becoming data-driven isn’t technology.

It’s the people, processes and culture.

You can have the best tech stack in the world, but a legacy culture resistant to data-driven insights will halt your data maturity journey. It’s crippling.

The reality of life is people are resistant to change.

That’s why fostering the right culture by providing ample opportunities to upskill and build data-driven communities is crucial.

My bank just launched an enterprise-wide Data & Digital Enablement program that offers all 40,000 employees the opportunity to learn about various topics, from data fluency to data leadership, via 8-week-long pathways that comprise lectures, masterclasses, podcasts and videos delivered by internal subject-matter experts and industry figures.

We also hold an annual TechX conference that showcases emerging technologies and a Data Week event, filled with a week’s worth of exciting data-related talks.

And guess what? This year’s Data Week theme was Data for All.

Every single talk incorporated elements of data democratisation. And we spared no expense inviting executives from data leaders like Amazon, Alteryx, Dataiku, Microsoft, Google, etcetera, to speak to our employees.

Heck, our own Head of Data Architecture dressed up as an incredible ‘data hero’ to thump home the point that data is for everyone.

A data-driven organisation requires continuous upskilling and enablement of its people, supported by strong culture, strong communities and strong advocacy by top leadership.

There is no way around it.

5. Data Marketplaces

Finally, let’s talk about the data itself that powers analytics.

The irony inherent in the pursuit of data-driven insights is that the data’s in a terrible state in most companies, characterised by three major problems:

Lack of visibility. Where in the heck is all that data?
Lack of trust. Unreliable data means unreliable insights.
Lack of timeliness. Need some data ASAP? Too bad.

There’s a vast sea of data flowing around, but it’s like a mystery – no one can reliably pin down the specifics: what it holds, where it’s going, where it originated, or who owns the stuff. And don’t even get started on the quality and how it connects to other data. These pain points become serious showstoppers from the get-go when attempting to execute a data strategy to become a data-driven firm.

As I wrote in this piece on enterprise data architecture:

"Decades of data warehouses left organisations drowning in a sea of data systems connected by a mess of data pipelines. The magic solution was meant to be centralising data into a central repository. Unfortunately, the data lake dream devolved into data lake swamps across many organisations."

Firms have long approached data with a project-oriented mentality, leading to fragmented business teams creating isolated data pipelines whenever they need to solve some problem. The endeavour to consolidate data pipelines under a centralised data team within a big data lake swiftly encountered bottlenecks, resulting in an overwhelming sea of data that was hard to access, hard to comprehend and riddled with persistent data issues.

Frustrating stuff for Chief Data and Analytics Officers (CDAOs) worldwide.

In the 2020s, notable strides are being taken to overcome these challenges.

Companies are investing in a decentralised data mesh architecture that empowers individual business units to proudly craft reusable and trustworthy data products that can be seamlessly shared across the mesh to the entire enterprise.

These data products – like any proud product off a shelf – can be shopped and self-served by knowledge workers, promoting the supreme visibility of all strategic enterprise data assets.

At my bank, a Netflix or YouTube style data marketplace is being built, enabling employees to self-serve curated data products and other data assets… and publish their own assets.

With a polished UI that integrates social elements while providing swift access to key information like metadata, lineage and ownership, we’re witnessing the rise of a centralised one-stop data shop that enables everyone in the organisation to come together to easily consume and publish data.
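To make the idea a little more concrete, here is a minimal in-memory sketch of what a marketplace catalogue tracks: who owns a data product, what it contains and where it came from. The class names, fields and sample products are illustrative assumptions, not a description of any real marketplace product.

```python
# Minimal in-memory sketch of a data marketplace catalogue.
# Field names and sample products are illustrative assumptions only.
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    name: str
    owner: str  # accountable business unit or person
    description: str
    upstream: list = field(default_factory=list)  # lineage: source products
    tags: set = field(default_factory=set)


class Marketplace:
    def __init__(self):
        self._products = {}

    def publish(self, product):
        """Add or update a product so others can discover it."""
        self._products[product.name] = product

    def search(self, keyword):
        """Return names of products whose name, description or tags match."""
        kw = keyword.lower()
        return sorted(
            p.name
            for p in self._products.values()
            if kw in p.name.lower()
            or kw in p.description.lower()
            or kw in {t.lower() for t in p.tags}
        )

    def lineage(self, name):
        """Walk upstream dependencies to show where a product's data originated."""
        seen, stack = [], list(self._products[name].upstream)
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.append(current)
                if current in self._products:
                    stack.extend(self._products[current].upstream)
        return seen


market = Marketplace()
market.publish(DataProduct("raw_transactions", "payments", "Card transactions as ingested"))
market.publish(DataProduct("customer_360", "retail", "Unified customer view",
                           upstream=["raw_transactions"], tags={"customer"}))
market.publish(DataProduct("churn_scores", "analytics", "Monthly churn predictions",
                           upstream=["customer_360"], tags={"customer", "ml"}))

print(market.search("customer"))       # products a colleague could self-serve
print(market.lineage("churn_scores"))  # where the churn scores come from
```

A real marketplace layers access controls, quality scores and a friendly UI on top, but visibility, ownership and lineage are the core of it.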

Like the 16th century Grand Bazaar of Tabriz, considered the Amazon marketplace of its day for attracting traders from around the world, enterprise data marketplaces democratise data for everyone.

Suddenly, data assets become discoverable, quick and easy to access, and trustworthy. This dramatically lowers the cost to drive data insights, smart decisioning and advanced analytics capabilities like ML and AI.

Great stuff.

What’s Ahead? Some Final Words

The most impactful analytics has historically come from domain experts with a bit of analytics skill.

In the 1850s, John Snow employed geospatial maps to crunch the numbers and convince scientists and policymakers that the spread of cholera was connected to sources of sewage.

Snow wasn’t an ‘analyst’ or ‘data scientist’. He was a doctor.

During the same period, Florence Nightingale leveraged techniques resembling modern A/B tests and utilised visualisations like pie charts to tease out the underlying factors influencing death rates among soldiers at war.

Recognised today as a pioneer in statistics, her ground-breaking work laid the foundation for numerous hygiene practices we adhere to today, such as washing hands and wearing masks.

Nightingale wasn’t a statistician. She was a nurse.

Over 170 years later, modern enterprises are being reminded of the value of democratising data and analytics to the masses as a prerequisite for achieving data-driven success.

This winning mentality indirectly solves a myriad of problems.

For instance, the best way to identify and address poor data quality at scale is through the exercising of that data. By democratising data access beyond the privileged few, firms are concocting a terrific recipe for enhancing their own data quality.

Put another way, the more citizen data analysts are empowered to swim in their company’s data lake, the more rubbish they’ll clean up, which incentivises more colleagues to jump in.

Data democratisation kicks off a virtuous cycle for success.

As time goes on, the synchronised uplift across multiple facets of data maturity – data, skills, tools and culture – will finally bring firms closer to their dream of becoming Level 4 or 5 analytics-driven powerhouses.

What’s ahead?

The technology landscape is being shaken hard by generative AI.

I expect Large Language Models to add an abstraction layer on top of current tools that’ll further simplify the user experience of leveraging analytics and data-driven insights for people from all walks of life.

Historically, technological progress has been all about abstraction layers.

Back in the day, computer pioneers were knee-deep in assembly code. You’d have to write this just to calculate 1 + 1:

section .data
    result db 0  ; Variable to store the result

section .text
    global _start

_start:
    mov al, 1    ; Move 1 into AL register
    add al, 1    ; Add 1 to AL

    mov [result], al  ; Store the result in the 'result' variable

    ; Exit the program
    mov eax, 1   ; System call number for exit
    xor ebx, ebx ; Exit status 0
    int 0x80     ; Invoke the kernel

Higher-level programming languages like C++, Java and Python swooped in after the 1980s, making our lives a whole lot easier. By 2012, statisticians could prototype bread-and-butter machine learning models in ‘notebooks’ without needing much programming experience:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

iris = load_iris()
X = iris.data
y = iris.target

# Train a logistic regression model
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression()
model.fit(X_train, y_train)

# Make some predictions and evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

In 2023, enterprise tools like Power BI and Tableau, and managed machine learning platforms like Alteryx, Databricks and Dataiku can produce powerful analytics with a few clicks of a button.

Microsoft has unveiled Fabric, a SaaS cloud analytics platform that unifies warehousing, big data, data engineering and management under a single roof.

In a couple of decades, dedicated analytics platforms might become relics.

Get what you need by simply prompting what you want and watch the magic happen:

"Find correlations in my dataset."

"Run a stress test to simulate an interest rate hike of 5 basis points."

"Forecast current sales projections for the next 5 years."

"Analyse data quality and show me a report."

What can I say – it’s a really exciting time to be in analytics and AI.

