Data Warehouse Selection Criteria: How To Choose the Right Storage

Written by Anders Schneiderman | Feb 22, 2021 1:20:00 PM

You've been thinking for a while that it's time to level up your data solution. You could keep muddling by. But you're spending too much time grinding through individual requests. It's 2021; shouldn't your users be able to take care of more of their needs by now?

From what you've been reading, you know you should develop a data warehouse. But you're not sure how to decide which data warehouse platform would best fit your business' needs.

In this article, I'll help you figure out what criteria you should use to choose your new data platform and how to ensure it will help your business thrive.

Data warehouse evaluation criteria

While the specifics are, well, specific to every company, there are six key criteria to keep in mind when choosing a data warehouse:

Cloud vs. on-prem
Tool ecosystem
Implementation cost and time
Ongoing costs and maintenance
Ease of scalability
Support and community

In many cases, these criteria are really trade-offs. For example, a data warehouse that's quick to implement may be a pain to scale. But you'll be better prepared to make the right decision if you understand what you're getting into before you buy.
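One way to make those trade-offs explicit is a simple weighted scorecard. The sketch below is only an illustration: the weights, the 1-to-5 scores, and the names "Platform A" and "Platform B" are all made up, and you would replace them with your own priorities and assessments.

```python
from typing import Dict, List, Tuple

# Hypothetical weights for the six criteria above; they should sum to 1.0.
WEIGHTS: Dict[str, float] = {
    "cloud_vs_on_prem": 0.10,
    "tool_ecosystem": 0.15,
    "implementation_cost_and_time": 0.25,
    "ongoing_costs_and_maintenance": 0.20,
    "ease_of_scalability": 0.20,
    "support_and_community": 0.10,
}

# Made-up scores from 1 (poor fit) to 5 (great fit) for two imaginary platforms.
CANDIDATES: Dict[str, Dict[str, float]] = {
    "Platform A": {  # quick to implement, harder to scale
        "cloud_vs_on_prem": 5, "tool_ecosystem": 3,
        "implementation_cost_and_time": 4, "ongoing_costs_and_maintenance": 3,
        "ease_of_scalability": 2, "support_and_community": 4,
    },
    "Platform B": {  # slower to implement, scales easily
        "cloud_vs_on_prem": 5, "tool_ecosystem": 4,
        "implementation_cost_and_time": 2, "ongoing_costs_and_maintenance": 4,
        "ease_of_scalability": 5, "support_and_community": 3,
    },
}


def weighted_score(scores: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Return the weighted sum of the scores; fail loudly if a criterion is missing."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


def rank_platforms(candidates: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Return (platform, score) pairs sorted from best to worst."""
    scored = [(name, round(weighted_score(s), 2)) for name, s in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_platforms(CANDIDATES):
        print(f"{name}: {score}")
```

With these invented numbers, Platform B comes out ahead because scalability and ongoing costs carry more weight than implementation speed. Shift the weights toward implementation and the ranking can flip, which is exactly the kind of trade-off worth seeing before you buy.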

Cloud vs. on-premises storage

Even as recently as a few years ago, you might have struggled with whether to go with a cloud-based or on-premises approach. Today, the battle is over.

There are a few circumstances where it still makes sense to consider an on-prem approach. For example, if most of your critical databases are on-premises and they are old enough that they don't work well with cloud-based data warehouses, an on-premises approach might be the way to go. The same is true if your company is subject to byzantine regulatory requirements that make on-prem your only choice.

And what if your CEO's brother-in-law's nephew claims he's an on-prem genius, and the CEO has to quickly find him a job because otherwise the CEO's marriage won't survive? Or what if your CEO is being blackmailed by an out-of-work on-prem consultant who's got photos that must remain hidden at all costs? Then on-prem might be a good fit. Or at least that's what you should tell people while you search for a new job.

Otherwise, why would you want to take on all the infrastructure headaches of on-prem work, especially when it's clear that all of the major vendors—yes, even stalwarts like Oracle—are trying hard to leave on-prem in their rearview mirror?

Data tool ecosystem

If you work at a company that’s already heavily invested in a data tool ecosystem and doesn't have a lot of data sources residing outside of it, you're probably going to pick that ecosystem's tool.

For example, if you're at a Microsoft shop and most of the systems requiring a custom integration have a SQL Server backend, odds are you'll decide to develop a Microsoft data warehouse in Azure because it's just plain convenient. And it's hard to argue with that.

But my suspicion is that if you're reading an article about how to choose a data warehouse, that's probably not your story.

Data warehouse implementation

They say the devil is in the details, and that’s doubly true when it comes to data warehouse implementation. Here are some of the finer points you should consider:

Cost

When deciding between data warehouse tools, money is often a major driver.

Unfortunately, figuring out the difference in price between several data warehouse platforms can be painful. Vendors use radically different approaches to how they calculate how much a specific configuration of computing power, storage, etc. will cost. (If you want some help figuring things out, check out our article on data warehouse pricing structures.)

So while you should certainly read the pricing info that each vendor posts on their website, focus on finding out from people in your network how much they paid for setups similar to the one you will need.

The other cost you'll need to factor in is how much you will have to spend on people. For example, can you handle all the implementation with your current headcount, or will you need to hire someone or bring in a consultant to manage the project? The cost may not come from the same line item as your warehouse, but it's definitely related.

Time

Cost matters, but time often matters more—especially for startups that are trying to move as quickly as they can.

If one data warehouse costs moderately less than another but it takes five more months to implement, that's five months of not getting the insights your company needs to outwit the competition.
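To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. Every figure in it is an assumption chosen for illustration, including the idea that you can estimate a monthly dollar value for the insights you are waiting on; the function names are hypothetical too.

```python
def total_cost_with_delay(platform_cost: float,
                          months_to_implement: float,
                          monthly_value_of_insights: float) -> float:
    """Platform cost plus the value you forgo while the warehouse isn't live yet."""
    cost_of_delay = months_to_implement * monthly_value_of_insights
    return platform_cost + cost_of_delay


def break_even_monthly_value(cheap_cost: float, cheap_months: float,
                             fast_cost: float, fast_months: float) -> float:
    """Monthly value of insights at which both options cost the same overall.

    Above this value, the faster (pricier) option is the better deal.
    """
    if cheap_months == fast_months:
        raise ValueError("both options take the same time; just compare prices")
    return (fast_cost - cheap_cost) / (cheap_months - fast_months)


if __name__ == "__main__":
    # Invented example: the cheaper option takes five more months to implement.
    cheaper = total_cost_with_delay(40_000, months_to_implement=7,
                                    monthly_value_of_insights=10_000)
    faster = total_cost_with_delay(55_000, months_to_implement=2,
                                   monthly_value_of_insights=10_000)
    print(cheaper, faster)  # 110000 75000
    print(break_even_monthly_value(40_000, 7, 55_000, 2))  # 3000.0
```

In this made-up case, the "cheaper" warehouse ends up costing more once the five extra months of waiting are counted, and the faster option wins whenever the insights are worth more than about $3,000 a month to you.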

And when you’re weighing implementation time, don’t forget to factor in opportunity cost...more on that in a minute.

Ongoing storage costs and maintenance

In addition to the costs of getting started, you'll also need to take into account ongoing costs—which sometimes can be substantially higher than the resources you allocate at the beginning.

There are several ongoing costs you'll need to consider:

Storage and compute: As your data and usage grow, so will your monthly storage and compute bill. Having a good sense of how your costs will rise over time is key.
People: You’ll need staff time to keep the system running smoothly (e.g., doing chores like vacuuming or performance tuning) and to add new data sources, build out the data model as your business needs evolve, etc.
Opportunity costs: Tackling your own data warehouse maintenance can be a real time suck. The question to ask when you dedicate resources to data warehouse maintenance is, what else aren't you building? And the longer it takes to make enhancements to the data warehouse, what insights are you missing out on?
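To get a feel for how the first two of these costs add up over time, you can project them month by month. The sketch below is a simplification under stated assumptions: it folds storage and compute into a single hypothetical price per terabyte, assumes data grows at a steady monthly rate, and treats staff time as a flat number of hours. None of these figures come from a real vendor.

```python
from typing import List, Tuple


def project_monthly_costs(start_tb: float,
                          monthly_growth_rate: float,
                          price_per_tb: float,
                          staff_hours_per_month: float,
                          hourly_rate: float,
                          months: int) -> List[Tuple[int, float]]:
    """Return (month, total_cost) pairs for a steadily growing warehouse.

    total_cost = data volume * price per TB + staff hours * hourly rate
    """
    projection = []
    tb = start_tb
    for month in range(1, months + 1):
        platform_cost = tb * price_per_tb
        people_cost = staff_hours_per_month * hourly_rate
        projection.append((month, round(platform_cost + people_cost, 2)))
        tb *= 1 + monthly_growth_rate  # data volume compounds each month
    return projection


if __name__ == "__main__":
    # Invented inputs: 2 TB today, growing 10% a month, $25 per TB per month,
    # plus 20 staff hours a month at $75 an hour.
    for month, cost in project_monthly_costs(2, 0.10, 25, 20, 75, months=24):
        print(f"month {month:2d}: ${cost:,.2f}")
```

Running the projection for a couple of years instead of a couple of months makes it easier to see when growth in data volume starts to dominate the bill, and how large the people cost is in comparison.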

The reality is that most people invest in a data warehouse for the long haul, not just a couple of quarters. Giving some careful thought not just to where you are now but to where you’ll be a year or two from now can help you make a choice that makes sense both now and in the future.

Ease of scalability

If you’re part of a fast-growing business, one thing you'll want to find out is what's involved in scaling up your data warehouse.

To figure that out, first you need to get a rough sense of what your current business needs are, including how much data you currently have, how quickly your needs are likely to grow, and how much confidence you have in your assessment of your needs for scale.

Then start asking vendors questions about how much it costs to expand and where the breakpoints are. The more incremental the cost to expand is, the less you're likely to end up spending money on capacity you don't need.
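A small calculation shows why the breakpoints matter. In this sketch, capacity is sold in fixed-size tiers and you pay for whole tiers; the tier sizes and prices are invented for illustration and don't describe any particular vendor's pricing.

```python
import math


def tiers_needed(needed_tb: float, tier_size_tb: float) -> int:
    """Number of whole tiers you must buy to cover the capacity you need."""
    return math.ceil(needed_tb / tier_size_tb)


def stepped_cost(needed_tb: float, tier_size_tb: float, price_per_tier: float) -> float:
    """Monthly cost when capacity is only sold in whole tiers."""
    return tiers_needed(needed_tb, tier_size_tb) * price_per_tier


def unused_capacity_tb(needed_tb: float, tier_size_tb: float) -> float:
    """Capacity you pay for but don't use."""
    return tiers_needed(needed_tb, tier_size_tb) * tier_size_tb - needed_tb


if __name__ == "__main__":
    needed = 11  # TB, just past a 10 TB breakpoint

    # Hypothetical vendor with coarse steps: 10 TB tiers at $1,000 each.
    print(stepped_cost(needed, 10, 1_000), unused_capacity_tb(needed, 10))  # 2000 9

    # Hypothetical vendor with fine steps: 1 TB tiers at $100 each.
    print(stepped_cost(needed, 1, 100), unused_capacity_tb(needed, 1))  # 1100 0
```

Both imaginary vendors charge the same $100 per terabyte, but crossing the 10 TB breakpoint by a single terabyte nearly doubles the bill with coarse tiers and leaves 9 TB sitting idle. That is the kind of detail to ask vendors about.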

The other factor you need to take into account is staffing costs. For example, some data warehouse systems require a lot of monitoring to ensure their capacity expands to meet your needs (and heaven forbid that doesn’t happen in time). Others, such as Panoply, automatically spin up new clusters/nodes as you need them without any intervention on your part.

One last point about scalability. Don't go overboard in your concern about scaling up. It's easy to spend so much time trying to build in scalability you don't currently need that you aren’t able to move fast enough to grow to the point where that scalability is needed (aka, you experience what we call the scalability paradox).

Support and community

When you run into trouble with your data warehouse, how likely are you to get the help you need when you need it? While no one chooses a data warehouse tool based solely on the support they can get, if two data warehouse systems are pretty equal, it could be the deciding factor.

When evaluating a tool, make sure to take time to check out the online support community to see what kind of help you can expect. And find out if there is live support you can contact...and whether it’s included in your pricing tier. You might be ok with the idea of using documentation to handle most of your support issues, but having a real person on the other end of the line can be a real lifesaver when you need it.

The easy choice: Save time & headaches with Panoply

And now for a shameless plug: If you're thinking about developing a data warehouse, you should take a serious look at Panoply. Panoply just works—and getting started is easy. Because Panoply has a ton of built-in integrations with many popular online systems, you can have the first version of your data warehouse up and running in just minutes.

Plus, Panoply automatically scales as your data needs grow, so you never have to worry about losing critical data points. Panoply also offers first-rate support that our customers praise as “unbelievably responsive.” Schedule a personalized demo today to find out more and to get any of your lingering questions answered.
