Build or Buy? Panoply vs. Self-Managed Solution (By the Numbers)

Written by Adam Murphy | Oct 11, 2021 10:56:42 PM

The goal of this article is to present an unbiased view and play-by-play breakdown of how much it would cost to build and manage your own ETL pipeline and data warehouse vs. using Panoply.

This post is pretty long and detailed —and technical. But this is what’s needed to make an apples-to-apples comparison between tools.

We discovered when writing this article that it's challenging to compare data warehouse and ETL solution providers; they all have different pricing structures and a bazillion options. So, it takes a lot of time and effort to compare them.

Moreover, your costs today won't be your costs tomorrow.

You may choose a cheap Redshift node costing $0.33/hour to start and quickly realize they are too slow. So, you upgrade to a more powerful node for $1.20/hour; that's an instant 4x increase in your costs!

To account for this, we've given you a range of prices for each service to show how they may evolve over time.

Since every provider has a different pricing structure, we needed one to refer back to for comparison. Unsurprisingly, we chose Panoply - maybe we’re a little biased ;).

We've linked to all the pages we've used to get our data, so please feel free to check them to ensure our numbers are correct.

Lastly, we've stuck with the most prominent players in the ETL and DW space: Stitch and Fivetran for the ETL and Redshift and BigQuery for the data warehouse. We know there are other providers, but this article is long enough already.

We've also chosen monthly (rather than annual) pricing for all providers to make comparisons easier.

Who is this comparison for?

The information in this article will be most relevant if you are part of an early-stage start-up or SMB (20-50 people) and looking to set up your first data warehouse.

Larger businesses probably already have data warehouses, and then the question is one of migration.

Pros and cons of building vs. buying

The idea of building a custom ETL pipeline and data warehouse vs. buying it off-the-shelf has changed in recent years.

Before 2015, building meant you would hire a team of data engineers to write ETL code that pipes data into your (perhaps on-premises) data warehouse.

Nowadays, no one should be writing ETL pipelines. They will break, and you will waste hundreds of hours of engineering time.

Many companies have spent millions of dollars creating production-grade pipelines for you, and you can use them for much less than the cost of your engineers’ time.

Today, the question is: which paid providers are you going to choose? In this article, we will compare the top ETL and DW tools on the market to Panoply.

Outline of a modern data stack (ETL + DW + BI) and why each piece is necessary

The modern data stack consists of three layers.

The first layer extracts all your data from disparate sources and loads it into one place, i.e., your ETL or data ingestion tool.
The second layer stores the data in one location as a single source of truth, i.e., your data warehouse (DW).
The final layer analyzes the data and enables you to extract insights, i.e., your BI tool.

There are many tools on the market for these three layers. You may even have more than one tool for each layer - you could use two BI tools, one for product analytics (e.g., Amplitude) and another for ad-hoc analysis (e.g., Mode).

We'll assume you need one tool per layer for simplicity, and for brevity, we will not compare BI tools.

We will only look at the difference between buying an ETL tool and a data warehouse separately or combining them with Panoply.

First, we'll look at the price; then, we'll compare their setup and maintenance costs.

Let's start with Panoply.

Pricing comparison

Cost of Panoply

Panoply combines an ELT tool with a data warehouse.

You get storage, and compute power, and unlimited Panoply data connectors.
You can connect dozens of popular data sources to your data warehouse using one of our native Snap Connectors, or other data sources using our Flex Connector, then store it instantly and start querying and analyzing right away.
You can set up everything within a matter of minutes and a few clicks.

Panoply offers four pricing tiers. The main differentiator between tiers is the amount of query bytes you can use each month, and varying levels of support. Each tier comes with:

A managed data warehouse
Unlimited Panoply data connectors
A large amount of storage
Unlimited users

Here is the pricing breakdown we will refer to throughout the article, and remember, this is on a month-to month contract:

Panoply pricing tiers

Pricing Tier	Rows per month	Storage	Price per month
Starter	2 million	1 TiB	$259
Lite	20 million	2 TiB	$649
Standard	100 million	3 TiB	$1039
Premium	300 million	5 TiB	$1559

Now let's look at the cost of self-managed ETL tools.

Cost of self-managed ETL tools

The two primary self-managed ETL tools are Stitch and Fivetran.

Both of these tools:

say you can get set up in minutes,
have 130+ integrations with all the most common data sources (and are adding more each day),
move data into your data warehouse, guarantee uptime, ensure your pipelines work and automatically update them after API and schema updates.

However, their pricing models are very different.

Stitch charges you based on the number of rows you ingest each month
Fivetran charges you based on the number of active rows you update each month.

With Stitch, updating one row five times counts as five rows ingested. With Fivetran, it counts as one updated row. Thus, if you update tables tens or hundreds of times each month, using Fivetran can be more cost-effective. Using Stitch for tables like that would quickly drain your monthly limit.

However, Fivetran is an order of magnitude more expensive than Panoply or Stitch for small volumes of data.

If you have 1 million active row updates each month, Fivetran costs $500/month, $750/month, or $1,000/month, or $1600/month depending on their price plan.

Note: We're comparing the 'standard' pricing tier for each company, which changes price depending on your level of rows. We're also using Fivetran's tool to estimate active monthly rows, based on total rows.

Stitch and Fivetran pricing comparison (Standard Tier)

Total Rows Month	Stitch	Fivetran
5 million	$100	$1842.05
10 million	$180	$3,157.80
100 million	$750	$9,868.13

Unsurprisingly, Stitch is cheaper than Fivetran at every pricing tier.

Conclusion: Cost of Panoply vs. cost of self-managed ETL tools

If you use Fivetran for anything, you will pay way more than Panoply. So Panoply wins here, hands down.
If you have a small amount of data (~10 million rows), Stitch is the winner as it is $209/month cheaper than Panoply.

However, once your data starts to scale, the difference between the two options (Stitch and Panoply) decreases dramatically.

Panoply is $286 more expensive for the standard option (for 100 million rows - Stitch standard pricing depends on rows ingested) but costs $941 less on the most Premium tier (although the Stitch Premium tier offers 1 billion rows/month). Knowing this, imagine how expensive it gets buying Stitch PLUS a separate data warehouse!

One thing to note is that Stitch gives you slightly more control over how many rows you can ingest each month. For example, you can pay $100-$2500/month to consume 5, 10, 25, 50, 100 million, or 1 billion rows rows with Stitch, but Panoply offers just slightly fewer row options of 2, 20, 100, or 300 million rows at $259 - $1559/month gives you the option of 10 million or 100 million. In some of these cases, Stitch can work out to be cheaper.

In an apples-to-apples ETL comparison, Panoply is much cheaper than Fivetran and comparable to Stitch, depending on your selection, and achieves the same results as both.

Now let's see how much data warehouses cost.

Since most of you will want to ingest more than 10 million rows per month, the data warehouses need to cost less than $200/month to make financial sense compared to Panoply.

Cost of self-managed data warehouses

Redshift

Amazon says Redshift costs less to operate than any other data warehouse. This claim is, in part, because you can tinker with every aspect of your setup. However, we will only look at a simple one.

To use Redshift, you need to rent nodes, i.e., buckets of storage and compute power.

If you have less than 1 TiB of data, Amazon recommends you use DC2 nodes, for which you pay by the hour.
If you have more than 1 TiB of data, Amazon recommends you use RA3 nodes. For these, you pay separately for storage and compute. You pay for storage by the TiB and compute by the hour.

If you have less than 160 GB of data, you can survive with one DC2 node and choose the cheapest one, which costs $0.25/hour and give you 2 x CPUs and 15 GB of memory (not much, we know).

Running this setup for a full month costs 730 hours x $0.25 = $187.50/month.

Using this with 100 million rows on Stitch costs $750 + $187.50 = $937.5/month and is $101.5/month less expensive than Panoply’s standard tier at $1039/month.

As you scale, this setup becomes more expensive.

If you have 300 GB of data, you will need 2 DC2 nodes. Now your costs have doubled. You pay 730 hours x $0.25 x 2 nodes = $365/month. This number is only going to keep increasing as your data increases. Now Redshift and Stitch costs $750 + 365 = $1115 and are $76/month more expensive than Panoply.

When you put it this way, you can see that Panoply is less expensive than Redshift's DC2 nodes.

If you use RA3 nodes, you pay $27.10 per TiB per month for storage. Compute nodes cost at least $3.26/hour; for that, you get 12 CPUs and 96 GB of memory (much better!).

Now, let's calculate how many hours of compute you can use if you want to keep your data warehouse costs below $200/month.

First, we subtract storage costs: $200 - $27.10 = $172.90. So you have $172.90 / $3.26 = 53.04 compute hours per month. That is 14 compute hours per week or 3 hours per workday.

If your company can get away with just 3 hours per day, every day, every month, then RA3 nodes are cheaper than Panoply. If not (looking at you overnight batch jobs!), then Panoply also makes more sense financially than Redshift RA3 nodes.

Note that we are not including additional services Amazon offers, such as Spectrum ($200/month for 40 TiB of queries) or automated backups.

Panoply is the clear winner here.

For even more info on this cost analysis, check out our in-depth breakdown of Redshift pricing.

Now, let's look at one of the other prominent players in the Data Warehouse as a Service (DWaaS) industry: Google's BigQuery.

BigQuery

BigQuery pricing is split into 2 parts: storage and compute.

Storage costs $20 per TiB.
Compute costs $6.25 per TiB of queries.

This pricing model makes it easy to compare with Panoply. We will also add in Stitch costs for a complete comparison.

Storage and Compute

BigQuery + Stitch

Panoply

Winner

1 TiB storage

40 TiB compute

10 million rows

$20 + (40 x $6.25) +

$180

= $450

$499

BigQuery + Stitch

$49 cheaper

1 TiB storage

60 TiB compute

100 million rows

$20 + (60 x $6.25) +

$750

= $1145

$799

Panoply

$346 cheaper

1 TiB storage

80 TiB compute

200 million rows

$20 + (80 x $6.25) +

$1000

= $1520

$1199

Panoply

$321 cheaper

You can see that for small volumes of data, BigQuery and Stitch work out cheaper than Panoply by only $49/month.

But once you scale and start to ingest more rows, Panoply is the clear winner.

Winner of pricing comparison: Self-managed ETL + DW vs. Panoply

It is possible to create setups with Redshift and BigQuery that are cheaper than Panoply.

For Redshift, you need a small amount of data (less than 160 GB), and you have to use the cheapest and least powerful nodes. Once your data goes above 160 GBs, Redshift doubles in price and becomes more expensive than Panoply.

If you combine BigQuery and Stitch and ingest a small amount of data (less than 10 million rows) per month, it's slightly cheaper than Panoply by $99/month.

We recommend choosing BigQuery over Redshift in this case because BigQuery doesn't limit your processing power, and you get a full-scale warehouse behind you on day one.

In all other cases, Panoply works out cheaper.

Once you start ingesting more rows per month or performing more TiBs of queries (as you inevitably will), the costs of Stitch and BigQuery rack up and quickly surpass those of Panoply.

Now that we've looked at the monthly cost of each tool, let's see how easy they are to use.

Setup and maintenance comparison

You need to consider several aspects of setup and maintenance for your ETL tools and data warehouses.

Initial setup
Ongoing maintenance
Staff
Adding custom connectors

First, let's look at how Panoply performs in these aspects.

Setup and maintenance with Panoply

Initial setup

The initial setup involves connecting data sources to your data warehouse, creating logins, and checking that everything is running smoothly.

With Panoply, you can do this in a matter of minutes.

It's just clicking and entering information on a few screens, and you're done. Very simple and very fast.

Ongoing maintenance

Maintenance includes the time and energy spent ensuring your connectors are piping the correct data into your warehouse. Among other things, it also includes making sure your warehouse is optimized and running queries efficiently.

We specifically designed Panoply to provide automated maintenance on Redshift and BigQuery.

Therefore, there isn't much you need to do. Your connectors will keep working as you set them up (even after API changes). Your warehouse will keep your data stored efficiently, and you'll easily be able to access it whenever you want; Panoply just works!

Staff

You don't need to hire anyone to make Panoply work.

Your data analysts can set up everything themselves, and you don't need a data engineer to manage your warehouse or any part of your ETL pipeline.

Panoply handles it all for you.

As you scale, you may want to hire a Data Engineer to optimize queries and get more bang for your buck each month. But you'll want to do this regardless of the data warehouse you choose and probably at a more mature stage in your company's development.

Adding custom connectors

No Snap Connector? No problem! Panoply offers dozens of native Snap Connectors to popular data sources, and for any API data source without a Snap Connector, we have the Flex Connector. This one requires a bit more set up than the point-and-click- Snap Connectors, but both connector types are included in your subscription and therefore have no effect on monthly cost.

Now let's look at how self-managed solutions perform in the aspects outlined above.

Set up & maintenance of a self-managed solution

Initial setup

Stitch's marketing says you can get set up in minutes. But when you look at their documentation, this seems like an unlikely timeframe.

They say the setup for some destinations requires technical expertise, such as familiarity with running SQL commands or using the command line. So, it will probably take you a few hours. It shouldn't be a problem for your Data Analysts, but it certainly won't take a few minutes.

It's a similar story for setting up your data warehouse in Redshift or BigQuery.

It involves many steps, some that are technical and some less so. Again, a Data Analyst should be able to get set up within a few hours.

Ongoing maintenance

Stitch and Fivetran were designed with minimal maintenance in mind. Just like with Panoply, your data should keep coming into your data warehouse as intended.

BigQuery was also designed to be simple. You pipe your data in and can get analyzing right away.

However, your costs can spiral out of control if you don't optimize your queries (we'll talk about this more in the next section).

Maintaining a Redshift data warehouse can be a full-time job. It is the F1 car of the data warehouse world. Yes, you can go extremely fast and perfectly optimize it for you, but this requires time, effort, and extensive expertise.

If you use Redshift, you will have a never-ending list of things to monitor and maintain; you have been warned.

Staff

As mentioned above, Stitch and Fivetran require minimal maintenance. So, whether or not you need to hire anyone depends on your data warehouse.

BigQuery is simple to use, and your Data Analysts should be able to maintain it in the short-medium term. As you run more and more queries in the future, you will probably need to hire someone to optimize them.

BigQuery charges you per TiB, so reducing the number of MBs each query uses leads to savings.

For Redshift, you need to hire a Data Engineer, perhaps even from day one, to tweak all the options and ensure your resources are being used optimally. On the other hand, you could run DC2 nodes all month (which have one cost for storage and compute) and survive for the short term knowing that your monthly costs are 730 hours x hourly rate (or cheaper if you use bulk pricing).

The above is not an optimal setup and will be more expensive than Panoply, but it will make your maintenance more manageable.

There is one thing to note when comparing Redshift and other warehouses: To get the lowest costs with Redshift, you need to hire a Data Engineer (or 2 or 3), which costs $92,519/year on average in the US or $7,710/month!

And if you don't hire a Data Engineer, you will need to devote some number of hours each week to maintain your warehouse.

You don't have to do this with Panoply or BigQuery.

Adding custom connectors

You can set up custom connectors to any data warehouse using Stitch or Fivetran.

If you're already using these tools, you can argue there will be less of a learning curve to get them set up. But since they boast such an easy setup in their marketing, this doesn't seem like a significant advantage.

You'll need to learn a tool such as Singer regardless.

As discussed in the Panoply section, it can be cheaper to use a combination of Panoply and Stitch rather than Stitch alone since the jumps between Stitch brackets are not even.

On the other hand, if you are ingesting 75 million rows/month with Stitch, you can consume up to 25 million rows/month on your current plan with your custom connector, and it won't cost you anything. But to ingest 25 million rows with Stitch if you don't already have an account will cost you $350/month.

The winner here depends on your situation.

Who wins the "setup and maintenance" battle?

Initial setup

The initial setup is faster with Panoply.

The difference is probably a few minutes with Panoply and several hours (perhaps a day) without. A bit of an inconvenience but nothing crazy long term.

Ongoing maintenance

Panoply is the clear winner when compared to Redshift and has an edge when compared to Stitch and BigQuery.

You only have to manage one system, and Panoply automates many of your daily admin tasks. This automation means less maintenance work overall and gives you more time to spend on revenue-generating activities.

Staff

A Data Analyst should be able to get themselves set up and keep the system maintained for a long time with Panoply or with Stitch + BigQuery.

If you use Redshift, you will need to hire someone to focus on the maintenance and ensure all the dials get set correctly.

We think Panoply clearly beats Redshift here and is tied for first place with Stitch and BigQuery.

Conclusion

So there you have it—a breakdown of Panoply vs. self-managed solutions. Phew - that was a lot of detail comparing ETL and DW providers.

We've seen how Panoply is the cheaper tool in almost all situations (apart from tiny initial setups). It also remains the most affordable solution by at least $100+/month as you scale your data and increase the number of query bytes you perform each month.

Likewise, we've demonstrated that the ease of setup and maintenance when using Panoply makes it the clear winner in those categories as well.

There’s just one area where Panoply does lag slightly behind. If you are a large organization that already has a data warehouse and a team of data engineers ready to optimize and tinker with queries, you would probably benefit most from using Redshift.

However, suppose you want all the benefits of Redshift without the headaches of optimization. In that case, you can use the Panoply Flex Connector and pipe the data into your Panoply data warehouse to get all the setup and maintenance benefits we spoke about above.

To know which is best, you need to try them out and, luckily, all the companies we mentioned have a free trial.

Even with this breakdown, we know pricing comparisons can be complicated. Panoply makes it simple - schedule a demo to chat with one of our experts to see how.

We hope you can see why we love Panoply, and why we’re certain it will reduce your costs and make your daily work life much easier. Our goal is to automate away all the unnecessary stuff so that you can focus on what makes your business grow: analyzing your data!

Now you should have everything you need to decide if Panoply is for you. To experience it first hand, try Panoply for free for 21 days.

View full post