What’s an analytics database? Is it the analytical, meticulous Sherlock Holmes version of a database? In a way, it is. Just as the great detective observed and analyzed clues without touching a crime scene, an analytics database runs separately from your production database to process facts and provide insight.
Beyond First-Gen Analytics Databases
The first breakthrough generation of online analytics processing (OLAP) databases - Vertica, Teradata, etc. - provided unprecedented data access and the foundation for today’s data warehouses. What’s the biggest strength of an OLAP database? It’s great at answering BI-type questions that support tactical decisions.
Now that so many companies are onboard with OLAP, though, the bar has been raised, Higher expectations and more complex business environments have brought the limitations of First-Gen analytics databases to light.
Static report queries no longer cut it if you want to stay competitive. You’ve going to want more complex analysis and versatile (tactical and operational) query power. And you want more frequent data updates. And you want it all faster.
Say hello to a leaner, more flexible data warehouse architecture – complemented by falling hardware prices. It’s optimized to make queries fast while providing low maintenance and bountiful scalability. The popularity of this new-generation data warehouse is on the rise, most notably in best-of-class businesses.
Four Simple Questions
How do you follow the leaders? Four simple questions can get you on track to select an analytics database.
1. Spreadsheet or Doc?
The first question to answer is the nature of your data. Do you picture it happily plunked into a spreadsheet, or more comfortable in a document form?
If you see data with clear, logical connections – the equivalent of rows and columns in a spreadsheet – you’re thinking like a relational database. Relational databases are great when you know how each piece of data fits (relates) with all your other data. MySQL, Amazon Redshift, BigQuery and PostgreSQL are all good relational database choices.
If you see data with less logic and more flow, like a document, you’re thinking like a non-relational database. Do you need analytics on material such as email, podcasts, social media, GIS info, and reports? That kind of analytics is best handled by a non-relational database that offers lots (maybe millions) of data points to mine. Apache Hadoop and MongoDB are two solid non-relational database choices.
2. How Much Data?
After you consider your type of data, you need to figure out how much data you’ll be analyzing. As a rule, non-relational databases work best with larger amounts of data. Non-relational databases aren’t structured in the rigid column-row design of relational databases, so they can read in and write large amounts of data faster. Here are more guidelines, but they are just guidelines, not rules.
Database performance depends upon many factors, including the skill sets of the DBAs you have working under the hood.
If you want to optimize price-to-performance for less than 1 terabyte (TB) of data, PostgreSQL is good. When your data size reaches 6 TB, you’ll probably feel the need for improvement, perhaps to MySQL. If you’re using PostgreSQL or MySQL and the performance is lackluster, Aurora (Amazon’s cost-effective relational database) efficiently handles up to 64 TB. Beyond that, say for a petabyte (PB) data level, look at Amazon Redshift. Redshift is solid up to around 2 PB. If you need even more processing power, you can ramp up to Apache Hadoop.
3. How Much Engineering Support Do You Expect?
How much time can your engineering team give you? If your shop is small and your team is 110% occupied with supporting your business product, that should influence your database selection. Consider both short term and long term support. Perhaps you can gain support for set-up, but those resources won’t be left in place for long-term maintenance.
If you expect ongoing engineering support, then let the data type (like discussed above) guide your choice of a relational/non-relational database purchase.
If you expect in-house help with set-up, but not maintenance, you can explore hosted services such as the Panoply Smart Cloud Data Warehouse. If you don’t go the hosted route, just remember that relational databases are lower-maintenance than non-relational. As a significant added bonus, the SQL language makes them accessible to a wide range of staff. Basic SQL commands are easily learned and widely known by programmers and analysts. In contrast, running queries on less structured (non-relational) databases requires more of a programming or data science background.
4. What is Your Need for Speed?
Even though we’d all love the thrill of having real-time data, most use cases don’t call for it - or its price tag. If you’re analyzing large amounts of data to discover trends or causal relationships, a small time lag won’t make much difference. A significant relationship doesn’t change much over an hour or a day. So, take your pick of relational databases that are designed to do just what you need, quickly analyze after-the-fact data. Redshift and BigQuery are both optimized to analyze large amounts of data.
If your field is fraud detection or another area that requires real-time analysis, a non-relational database offers the advantage mentioned above: It reads data in quickly, keeping query answers up-to-date.
Even Dr. Watson could deduce that relational analytical databases serve most users best most of the time. If you want a fast and affordable system for a small (less than 1 TB) amount of data, turn to PostgreSQL - for a larger store of data, Redshift. Both PostgreSQL and Redshift provide the easy query power of the SQL language. BigQuery is a powerhouse for larger data stores but uses its own unique comma syntax that’s frustrating to folks who don’t appreciate learning two flavors of SQL.
If these and other complications (and there are more!) seem like too much of a drain on the energy you need to actually run your business, there’s help. A data warehouse, such as Panoply, can do the heavy lifting for you. Data warehouses help you upload, maintain, and analyze data without putting a load on your IT resources.