What’s an analytics database? Is it the analytical, meticulous Sherlock Holmes version of a database? In a way, it is. Just as the great detective observed and analyzed clues without touching a crime scene, an analytics database runs separately from your production database to process facts and provide insight.
Beyond first-gen analytics databases
The first breakthrough generation of online analytical processing (OLAP) databases—Vertica, Teradata, etc.—provided unprecedented data access and the foundation for today’s data warehouses. What’s the biggest strength of an OLAP database? It’s great at answering BI-type questions that support tactical decisions.
Now that so many companies are on board with OLAP, though, the bar has been raised. Higher expectations and more complex business environments have exposed the limitations of first-gen analytics databases.
Static report queries no longer cut it if you want to stay competitive. You’ll want more complex analysis, versatile (tactical and operational) query power, and more frequent data updates, and you’ll want it all faster.
How to choose an analytics database
How do you follow the leaders? Four simple questions can get you on track to select an analytics database.
1. Spreadsheet or doc?
The first question concerns the nature of your data. Do you picture it happily plunked into a spreadsheet, or more comfortable in document form?
If you see data with clear, logical connections—the equivalent of rows and columns in a spreadsheet—you’re thinking like a relational database. Relational databases shine when you know how each piece of data fits (relates) to all your other data. MySQL, Amazon Redshift, BigQuery, and PostgreSQL are all good relational choices.
If you see data with less rigid logic and more flow, like a document, you’re thinking like a non-relational database. Do you need analytics on material such as email, podcasts, social media, GIS info, and reports? That kind of analysis is best handled by a non-relational database, which gives you lots (maybe millions) of loosely structured data points to mine. Apache Hadoop and MongoDB are two solid non-relational choices.
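To make the spreadsheet-vs-document contrast concrete, here’s a minimal sketch in Python. It stores the same customer once as a relational row (fixed schema, via the standard-library sqlite3 module) and once as a free-form document (a nested dict, the shape a store like MongoDB would hold as JSON/BSON). The table, field, and customer names are purely illustrative.

```python
import sqlite3

# Relational: a fixed schema of rows and columns, like a spreadsheet.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Ada', 'London')")
row = conn.execute("SELECT name, city FROM customers WHERE id = 1").fetchone()

# Non-relational: a nested document with no fixed schema --
# fields can vary freely from one document to the next.
doc = {
    "customer": "Ada",
    "city": "London",
    "emails": [{"subject": "Hello", "body": "..."}],
}
print(row, doc["customer"])
```

The relational row can’t hold the nested `emails` list without extra tables and joins, while the document absorbs it naturally; that difference is the heart of the choice.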
2. How much data do you have?
After you consider your type of data, you need to figure out how much data you’ll be analyzing. As a rule, non-relational databases work best with larger amounts of data. Because they aren’t bound to the rigid column-row design of relational databases, they can ingest and write large amounts of data faster.
3. How much engineering support will you have?
How much time can your engineering team dedicate to data tasks? If your shop is small and your team is 110% occupied with supporting your business product, that should influence your database selection. Consider both short-term and long-term support. Perhaps you can get help with setup, but those resources won’t stay in place for long-term maintenance.
If you expect in-house help with setup but not maintenance, you can explore hosted services such as Panoply. If you don’t go the hosted route, just remember that relational databases are lower-maintenance than non-relational ones.
As a significant added bonus, the SQL language makes data accessible to a wider range of staff. Basic SQL commands are easily learned and widely known by programmers and analysts. In contrast, running queries on less structured (non-relational) databases requires more of a programming or data science background.
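To illustrate how approachable basic SQL is, here’s a typical BI-style question (“revenue per region”) answered with one short query. The sketch runs against an in-memory SQLite database via Python’s standard-library sqlite3 module; the table and column names are made up for the example.

```python
import sqlite3

# Build a tiny in-memory table of orders (illustrative data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 100.0), ("east", 50.0), ("west", 75.0)],
)

# One readable line of SQL answers "revenue per region".
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('east', 150.0), ('west', 75.0)]
```

An analyst who has never written application code can read and modify that `SELECT` statement; the equivalent aggregation over a document store typically means writing pipeline or map-reduce code instead.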
4. How much speed do you need?
Even though we’d all love the thrill of having real-time data, most use cases don’t call for it—or its price tag. If you’re analyzing large amounts of data to discover trends or causal relationships, a small time lag won’t make much difference. A significant relationship doesn’t change much over an hour or a day. So take your pick of relational databases designed to do just what you need: quickly analyze after-the-fact data. Redshift and BigQuery are both optimized to analyze large amounts of data.
If your field is fraud detection or another area that requires real-time analysis, a non-relational database offers the advantage mentioned above: it ingests data quickly, keeping query answers up to date.
Our deduction?
Even Dr. Watson could deduce that relational analytical databases serve most users best most of the time. If you want a fast and affordable system for a small amount of data, turn to PostgreSQL; for a larger store of data, Redshift. Both PostgreSQL and Redshift offer the easy query power of the SQL language. BigQuery is a powerhouse for larger data stores, but its original legacy SQL dialect—in which a comma between tables means UNION ALL rather than a standard join—frustrates folks who don’t appreciate learning two flavors of SQL.
If these and other complications (and there are more!) seem like too much of a drain on the energy you need to actually run your business, let Panoply do the heavy lifting for you. Book a personalized demo today.