MySQL is one of the more popular flavors of SQL-based databases, especially when it comes to web applications. Owned by Oracle, MySQL is free and open source, so it’s a great place to start if you’re looking for something to handle transaction processing and the other bits that underpin modern web apps. With MySQL, it’s easy to create a lightweight, fast database that is optimized for accessing and processing single source data. So if you’re, say, Wordpress, MySQL makes for a great way to store, access and process the data that is relevant to each of your users’ publications.
If you’ve found your way here, though, you’re probably interested in doing trying to incorporate some serious data analysis into your company’s workflow. Today, we’re going to look at how MySQL performs on analytics tasks, and whether it’s the best choice for a data warehousing project. We’re not going to waste your time beating around the bush, though: we don’t think MySQL databases make for very good data warehouses, and we’ll give you a few good reasons why we feel that way right up front.
Why dedicated data warehouses are better than MySQL databases when it comes to analysis
While MySQL is great for making snappy transactional databases, it’s not great when it comes to doing serious analytical work, especially with multiple data sources and large datasets. Once you find yourself in a position of trying to do deep analytical tasks that integrate multiple sources of data, you’re probably going to want to look at setting up a data warehouse. There are multiple reasons you might want to develop a data warehouse as part of your data practice, but here are some of the biggest advantages of using a dedicated analytical data warehouse instead of a standard database:
- You won’t slow your production database: Your production database(s) won’t have to be doing double duty, processing transactions and analytical queries at the same time
- You’re less likely to screw things up: Using your production database as an analytical database is theoretically fine, from a data integrity standpoint, as long as everyone involved knows what they’re doing and doesn’t tend to screw things up. But we all know that accidents happen, and it can cause serious problems if an analyst borks your production database trying to figure out how many widgets were sold last quarter.
- Data warehouses are designed for analytics: With a data warehouse, it’s a whole lot easier to integrate all your data in one place. But that’s not even the optimization part. Data warehouses are set up differently from normal databases: they use online analytical processing (OLAP) frameworks, which means that they’re optimized for quickly processing complex queries that combine data from multiple large, historical data sets. Of course, you can still do this with a standard MySQL database, but it will probably be harder and slower.
MySQL data warehousing and analytics: specific drawbacks and disadvantages
We covered some of the general points to take into consideration when deciding whether to use a dedicated data warehouse or go the YOLO route and just do analysis on your existing database(s), but now we’re going to take a closer look at the specific drawbacks of trying to use a MySQL database as an analytical database.
- Architecture: MySQL is built on the InnoDB engine, which is very well optimized for on-line transaction processing (OLTP). That means that it’s very good at maximizing the number of transactions per second in your database, which is great for supporting apps. On the other hand, this means that it is very specifically not good at running low-volume but computationally intense analytical queries that need to aggregate data across multiple tables, time periods, and data types.
- SQL dialect: This is less of an issue than it used to be, as MySQL’s particular flavor of SQL is pretty similar to most other leading dialects, but MySQL still lacks some specific functionalities that make analysis easier in other versions of SQL (e.g. Postgres).
Easy MySQL analytics: the Panoply way
If we’ve managed to steer you away from the idea of trying to do data analysis on a MySQL database, but left you wondering wondering, “what now?”, we have a solution for you. Panoply makes it easy to pull your MySQL data into a custom, automated data warehouse with just a few clicks. We’ve got a lot of documentation that can show you exactly how easy it is, so we won’t repeat ourselves here as far as the basics go. What we will do, though, is walk through the data structures that Wordpress creates in its MySQL backend, look at how to integrate that data with Google Analytics, and explore how to write SQL queries to answer questions about how your posts are performing and who your visitors are. Check out our next post for a quick tutorial on how to do all that using Panoply and a few basic queries in SQL.