This content, written by Frank Bien, was initially posted in Looker Blog on Feb 3, 2017. The content is subject to limited support.
People, especially in tech and data, love buzzwords. “Data democracy”, “data-driven”, “Big Data.” There's been a lot of hype, for sure, but I think we in the analytics industry have yet to deliver on the promise.
Talk to even the most data-oriented organizations and they know there’s more value in their data than they’re currently getting.
But I think we're on the verge of something big. Trends are converging right now that could signal a substantial step forward--a whole new phase of BI: the Third Wave.
A little history
The First Wave of BI consisted of monolithic stacks like Cognos, MicroStrategy and Business Objects. It was the early 1990s, so these systems evolved in a world of databases that were built for transactions--for inserting, updating and deleting records--not for analytics. And they were very expensive, so you needed to carefully manage utilization.
In order to analyze your data, you took little slices out of these expensive databases and put them into specialized BI cubes and caches that were part of the monolithic stacks. These slices only let you answer a constrained, specific set of questions. So any change required help from IT, who would go back and restart the whole process of slicing and cubing from the beginning.
As a result, most resources went toward answering mission-critical questions--what are the financials at the close of the quarter? How is the sales pipeline looking? Is inventory sufficient? The only people who could get ad-hoc answers were the “rich,” with Cs in their titles. Everyone else had to wait in a line that just kept growing. In essence, you got a breadline, but for data.
The data team has too many mouths to feed. Overtaxed analysts prioritize work for company executives, and everyone else must be served later. As a result, employees have to wait for critical answers, slowing the company’s progress. Companies try to cope by hiring more analysts, but employees’ hunger for data is insatiable. Eventually, employees get tired of waiting in line and make decisions without data.
The First Wave wasn’t without advantages. Because data was so locked down, the answers you got were 100% accurate--data was well governed. But every change or modification took months. The systems were designed to maintain consistency, but that made them inflexible and meant they couldn’t keep up with business needs that evolved faster and faster.
Around 2000, some people saw this pain and thought “Huge Market Opportunity!” That was the advent of the Second Wave of BI: Self-Service. If business users could take care of their own data needs, you wouldn’t need to rely on data people or those slow, expensive databases at all.
Sounds amazing, right? Unfortunately, the reality was a little different.
When a business person was working with data they understood intimately, these systems worked fine. But when the data was complex or unfamiliar, bad things happened. The Second Wave tools were basically Excel on steroids. They lacked the governance of the First Wave tools, so any data that you didn’t personally understand was out of reach for reliable analysis.
“Self-service” set out to make data professionals obsolete but, ironically, instead made them even more critical. All the data curation and cleanup that used to live in the monolithic stack now had to be done manually each time a business user wanted to load data into their “self-service” tool. The supposed solution for the breadline only made it worse.
On top of that, as soon as you cut the cord from the central warehouse and pulled data onto your local self-service tool, you lost the logic of the business. So now, you had to define your key metrics (“who is a customer?”, “what components go into net revenue?”) from scratch for each project. The metadata lived in the monolithic stacks and couldn’t be extracted or reused, throwing the whole logic of the organization up in the air.
The Second Wave of BI is characterized by this lack of standard definitions and metrics. Even the data itself became unreliable. The result? Data Brawls.
Data--and understandings of what the data means--fragment, leaving each analyst to invent their own metrics and analyses. When teams come together to make a plan, they end up shouting, yelling, and laboring over figures that just don’t seem to align. Each team is looking at different slices of the data, and coming to their own interpretation of what has happened, so they each have radically different ideas of what the company should do next.
The third wave
That brings us to the present. Most people are stuck in either a breadline or a brawl (or maybe even a brawl in a breadline).
But the key constraint that defined those first two waves--slow, expensive databases--is gone. The big data revolution brought a lot of hype, but one thing of value it delivered was fast, cheap analytic databases (many of which even migrated to the cloud). These MPP data warehouses and SQL-on-Hadoop systems are so fast and so cheap that you no longer have to extract your data to analyze it. You can do your analytics right in the database.
This technological shift is the cornerstone of the Third Wave: the Data Platform.
Today’s databases are so cheap, companies can put all their data--from every SaaS application, every custom tracking code, and every transactional database--in one place and query it at will.
What’s more, these databases are so fast, you can transform data AFTER it’s in the database, eliminating much of the heavy lifting of traditional ETLs. This radically shortens the time it takes to get from a new question to an answer. All the data is already in one place, it’s always fresh, and just one tool integrates data from different sources and governs the meaning of that data.
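To make the "transform after loading" idea concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for a fast analytic database, and the table, view, and column names (raw_orders, customer_revenue, amount_cents) are made up for illustration. The raw rows are loaded untouched, and the cleanup happens afterwards as SQL inside the database.

```python
import sqlite3

# Hypothetical raw rows, loaded exactly as they arrive (no cleanup beforehand).
RAW_ORDERS = [
    ("o1", "acme", "2017-01-03", 12000, "complete"),
    ("o2", "acme", "2017-01-19", 3000, "refunded"),
    ("o3", "globex", "2017-01-21", 4550, "complete"),
]


def load_raw(conn, rows):
    """Load step: copy the source rows into the database unchanged."""
    conn.execute(
        "CREATE TABLE raw_orders ("
        "order_id TEXT, customer TEXT, ordered_on TEXT, "
        "amount_cents INTEGER, status TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?, ?)", rows)


def transform_in_database(conn):
    """Transform step: runs after loading, as SQL inside the database."""
    conn.execute(
        """
        CREATE VIEW customer_revenue AS
        SELECT customer, SUM(amount_cents) / 100.0 AS revenue
        FROM raw_orders
        WHERE status = 'complete'
        GROUP BY customer
        """
    )


def revenue_by_customer(conn):
    """Query the transformed view; the raw table is still there underneath."""
    return conn.execute(
        "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_raw(conn, RAW_ORDERS)
transform_in_database(conn)
print(revenue_by_customer(conn))
```

Because the transformation is a view over the raw table, changing the definition of revenue means editing one SQL statement rather than re-extracting the data.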
That means data professionals can stop answering one-off requests and leverage the data platform to curate a model. A model isn’t static like a report; it’s a dynamic environment that lets non-technical users ask whatever question they want, knowing that the platform will translate their question into the correct analytic query and deliver the right answer.
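As a rough illustration of what "curating a model" could look like, the sketch below defines each metric once and generates SQL from a user's choice of fields. The Model class, its to_sql method, and the orders table are hypothetical names invented for this example; this is not how any particular product implements its modeling layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """A tiny, hypothetical semantic model: one table, named fields."""

    table: str
    dimensions: dict  # field name -> SQL expression to group by
    measures: dict    # field name -> SQL aggregate expression

    def to_sql(self, dimensions, measures):
        """Translate a list of requested fields into a single SQL query."""
        unknown = [d for d in dimensions if d not in self.dimensions]
        unknown += [m for m in measures if m not in self.measures]
        if unknown:
            # Users can only ask for fields the data team has defined.
            raise KeyError(f"fields not defined in the model: {unknown}")

        select = [f"{self.dimensions[d]} AS {d}" for d in dimensions]
        select += [f"{self.measures[m]} AS {m}" for m in measures]
        sql = f"SELECT {', '.join(select)} FROM {self.table}"
        if dimensions:
            group_by = ", ".join(self.dimensions[d] for d in dimensions)
            sql += f" GROUP BY {group_by}"
        return sql


# "What goes into net revenue?" is answered once, here, for everyone.
ORDERS_MODEL = Model(
    table="orders",
    dimensions={"customer": "customer"},
    measures={
        "net_revenue": (
            "SUM(CASE WHEN status = 'complete' "
            "THEN amount_cents ELSE 0 END) / 100.0"
        ),
    },
)

print(ORDERS_MODEL.to_sql(["customer"], ["net_revenue"]))
```

The point of the design is that a business user picks field names, never SQL, so two teams asking for net_revenue get the same definition.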
Data platforms allow business users to explore in a truly self-service way, with data governance baked right in. What’s more, modern software practices mean that in a data platform, business processes dictate the tools, not the other way around.
Account managers who already spend all day in Salesforce can integrate reporting into that workflow, and determine which customers are in trouble and which renewals need attention based on support tickets and usage data. No breadlines. No brawls.
I really think this third wave is the holy grail--the thing that’ll let companies unlock all the value they know is in their data. The Third Wave of BI will finally be the thing that lets us deliver on the promise of being data-driven; of driving data culture. I know because, every day, I’m seeing companies put the right data in front of the right users at the right moment, and seeing the better decisions that result.