In today's world, data warehouses are the foundation of business intelligence.
If you're running a business and don't have a data warehouse, it can feel like you're missing out. But, implementing a data warehouse to your infrastructure can sound like a scary project.
Adding all of your data in one place and figuring out how to effectively query it and then draw inferences from the information you collect? There are about a dozen different ways that can go wrong before you even get into the cost of a project like that.
Below I'll go over some steps you can take to ensure the success of a data warehousing project.
When building a data warehouse, data isn't actually what you're after. You're really looking for information and to glean useful insights from the data you already collect.
To do that, you need to colocate that data (in a system, not necessarily physically). Then you'll need to query pieces of data that previously weren't connected to one another to determine how those data are connected.
One of the most common causes of failure in data warehousing projects is not understanding what you're looking for. So, before you start a project this big, you need to understand the insights you're trying to find.
When building a data warehouse, data isn't actually what you're after. You're really looking for information and to glean useful insights from the data you already collect.
That doesn't mean that you need to know every question to ask before you start. One of the benefits of querying your aggregated data is that you'll often discover new questions as you explore.
You'll be surprised at what you learn about the performance of your business, and as you learn new things, you'll find new questions to ask.
All of that is terrific, but before you start, you need to have some objectives for what your business will get out of the data warehouse.
You want to be more specific than just "more profit" because if that's your only goal, you'll have a tough time determining if the project is successful.
Once you know what you're trying to find out, you need to know where decision-makers currently source their data. That doesn't just mean databases!
Often, even in large companies, you'll find data from unlikely sources. Sometimes it's a key spreadsheet maintained on a single person's computer or even an email inbox where sales leads originate.
The goal of a data warehouse is to get all of your data into one place, but a lot of warehousing projects only think about the "big" data.
The tech teams know that those are the places with the largest problems to solve. It can be tempting to fixate on the big problems, but integrating "smaller" sources of data is just as critical.
When identifying your team's data integrations, it's important to not just focus on the big data sources. You need to make sure you're looking for integrations in places like:
Those tools (and many others) contain data that's critical to understanding and growing the performance of your business.
During this discovery phase, you want to be as thorough as possible. Ask decision-makers to share everything they use to track their numbers. Then ask them again.
Once you know the questions you're trying to answer and where your data comes from, it's time to start getting data into your warehouse. If you've been planning a data warehouse project, this is probably the headache you've been dreading.
For legacy data projects, getting your data in the data warehouse meant a complicated and challenging ETL process for each individual source of data. Those processes were fragile, challenging to build, and very difficult to QA. Given the criticality of the data they transported, it meant that businesses spent a lot of time and money to get them right.
If you've done a data warehouse integration using legacy ETL processes before, the phrase "data warehouse" probably gives you cold sweats.
The good news is that integrating data sources is a lot easier today with tools like Panoply.
Instead of building a costly and fragile custom ETL process for every data source, Panoply has dozens of pre-built integrations for common data sources. It's likely that the tools you use are already integrated into the Panoply toolbox. Plus, Panoply integrates with an array of partner ETL tools, so you should be able to find the connector you need.
Once you have data in your warehouse, it's time to start modeling it.
When you pull data into a data warehouse, those data still exist as isolated chunks. You'll need to start thinking about the data holistically, combining attributes and details from multiple sources into one concept.
It's likely that you already do this!
More than likely, you don't think of "Salesforce data" and "customer support data," instead you think of "data about customer interactions."
But your data warehouse doesn't see it that way.
This step is all about transitioning from using your ad hoc, disconnected data sources and teaching your warehouse to associate them so that they fit your mental models.
The key to building your data model is to remember that you're representing the way you already think about things.
You don't need to get too complicated here. It's likely that you already know how to represent the data. You just need to teach the warehouse how pieces of information come together.
Once you have your data and you've taught the system how to think about it, you're ready to really start diving in.
This is where the rubber of a data warehouse project meets the road. It's also, not surprisingly, the longest portion of the project.
It's likely that your data warehouse will reap some immediate insights, but the most significant insights usually aren't floating right at the surface. Instead, they require extensive data analysis, and often some experimental data science work to truly bear fruit.
This is where it's important to remember those original key business objectives. Your goal here isn't just gathering data, It's improving the business.
This is an iterative part of the project and you'll likely be in this step for years as you learn about predictive data you never suspected could help your business. Then, you'll start to ask new questions, and you'll query the data to find out what you didn't know you already knew.
Over time, your teams will get better at this, and you'll get better at asking good questions. Effectively using your data warehouse is a skill, and it's one you'll get better at with time.
Many teams think of data warehouse projects like they're insurmountable technical mountains. But if you look back at our steps here, very few of them are technical in nature.
The hardest technical problem—integrating disparate sources of data—can be solved by integrating a tool like Panoply to bring your tools together.
The key to building a good data warehousing project is understanding what you're trying to understand and working with your teams to learn about how they understand things today.
As with so many software projects, the tech is secondary to good communication, solid planning, and wise application of the correct tools.
If you're considering a data warehouse project, getting started is easier than you think. Panoply is free to get started and contains dozens of integrations for data sources you already use.