So you're at the stage where you're thinking about your company's data management strategy. Good thinking: A McKinsey report estimates that businesses that leverage customer analytics are 23x more likely to acquire new customers faster than their competitors! And they're 19x more likely to have above-average customer profitability.
Say you already have a Microsoft (MS) SQL Server database. Couldn't you use your production database for your analytics and get that data-driven boost?
In this post, we'll talk about why a database is not a data warehouse and whether Microsoft SQL Server could work as a data warehouse.
Microsoft SQL Server
SQL Server is one of the more popular, albeit more expensive, relational database management systems. It's owned by Microsoft and features great database security; on-premise and cloud database support and integration for data extraction, transformation, and loading (ETL); server management; and analytics tools.
One thing that makes MS SQL Server very popular is that it plays very well with the .NET framework, so it's great for storing live data from web apps. Applications interacting with SQL Server use Transact-SQL, Microsoft, and Sybase's proprietary extension of SQL.
Database vs. data warehouse
TL;DR: Don't do analytics on your transaction database. A database and a data warehouse serve very different functions.
A relational database typically stores the most recent transactions in an atomized, consistent, isolated, and durable (ACID) manner. A data warehouse, on the other hand, generally aggregates and stores information from several databases across the company.
To optimize writing to tables, a database is often normalized. In other words, records don't contain duplicate information. The data is kept safe at all costs. You don't want to lose the record of your sales, for example! In short, databases optimize for Online Transaction Processing (OLTP) and are designed for data creation, insertion, updating, and deleting.
Unfortunately, the focus on transaction integrity means read operations come at a cost:
- Databases aren't optimized for performance.
- It's hard to aggregate data.
- Read operations will add load to your operational database.
You don't want to add load to your database because you don't want your customer transactions to fail just because someone ran a slow query. And as the analyst potentially running that query, you don't want the possibility of that hanging over your head.
Data warehouse uses
Whereas databases optimize for OLTP, data warehouses optimize for Online Analytical Processing (OLAP). Data in a data warehouse is typically not normalized, unlike in a database. The data is merged from multiple sources, and information is replicated across rows to make complicated queries easier.
As an analyst, you want access to historical records to be able to identify trends or make recommendations. However, a normalized database structure isn't ideal for making complex queries. For instance, you may have to join many tables to link the fields you need. This is a complicated task for analysts who are more business-minded.
And even if you are a savvy SQL writer, it's cumbersome to join five tables just to get a customer's address. You could do these multi-table queries on a database, but it's slower and more difficult to do. (It could also adversely affect your users' experience.)
Plus, it's good to separate daily transactions from data for analysis. You don't want to accidentally delete or drop anything from your production database! And your database performance (and budget) shouldn't have to suffer because you need to run computationally intensive queries.
Queries that affect uptime or latency are a big problem on the business database. Running those same queries on a data warehouse, you don't affect the goings-on of the business. At worst, you can write a query that muddles your historical records.
Hence the need for a data warehouse. You can transfer historical data to a data warehouse so that the production databases stay reasonably small.
With a data warehouse, you can easily do in-depth analysis:
- Get customer ratings for all products to make personalized recommendations.
- Get product sales over time by location, along with each location's details, to drill down into sales trends.
- Discover factors associated with customer churn.
The pros & cons of MS SQL Server as a data warehouse
Now that we've established why you need a separate data warehouse to complement your database, how do you get started? And more to the point, can you just use MS SQL Server as a data warehouse?
You could just replicate your database schema to create a data warehouse in MS SQL Server. MS SQL Server has some useful built-in functionality to do analysis, automated reports, and machine learning with Python and R. SQL Server also offers good data protection and security.
A few years ago, SQL Server introduced the Managed Data Warehouse, though its functionality is limited to database performance monitoring.
The biggest downside to using MS SQL Server as a data warehouse is expense. The subscription for the Enterprise edition of SQL Server 2019 starts at roughly $5,400 per year for two cores, and you'll have to pay extra for additional features and cores. A minimum of four cores are required per processor.
The licensed Enterprise version, which includes all features, starts at roughly $14,000. The licensing process can also be headache-inducingly complex. If your start-up grows rapidly, you could find yourself locked into runaway costs as you need more and more cores to maintain performance.
Secondly, you may not want to tie yourself to the Microsoft stack. MS SQL Server plays best with .NET (C#). Although SQL Server has expanded its operating system compatibility to Linux and added language support for Python, Java, and R, this support is still relatively new.
It also takes considerable effort to get data from a NoSQL (i.e., non-relational) database like MongoDB or Apache Cassandra into a relational database like MS SQL Server. If you store, or anticipate storing, some of your data in a NoSQL database, you will have to write extensive ETL pipelines to merge the data into a data warehouse. If this is the case, you may want to consider managed plug-and-play services that allow you to connect SQL and NoSQL databases with your favorite business analytics tools.
Lastly, SQL Server has complex performance tuning features. You need to understand these features well to use them effectively. If you're a smaller company looking to grow quickly, a plug-and-play solution like Panoply is faster and often cheaper than building a data warehouse yourself, especially if you can't dedicate engineers to creating or maintaining the data warehouse.
In sum: MS SQL Server isn't a data warehouse
Hopefully, we've convinced you that you shouldn't go without a data warehouse. For one thing, you can make data analytics and complex queries easier by merging your databases into a data warehouse. By separating your data warehouse from your database, you also minimize the risk of anything happening to your real-time business data.
You can use SQL Server as your data warehouse but be careful of runaway costs. It can also be difficult to tune your data warehouse performance and merge data from different database types in SQL Server.
If you want to grow quickly or you're an analyst who doesn't want to wade into the depths of data ETL, why not try an out-of-the-box solution? Panoply offers a free 14-day trial and no-code integration of your databases and business intelligence tools.