Build or Buy? Panoply vs. Self-Managed Solution (By the Numbers)

The goal of this article is to present an unbiased view and play-by-play breakdown of how much it would cost to build your own ETL pipeline and data warehouse vs. using Panoply.  

This post is pretty long and detailed —and technical. But this is what is needed to make an apples-to-apples comparison between tools.

We discovered when writing this article that it's challenging to compare data warehouse and ETL solution providers. They all have different pricing structures and a bazillion options. So, it takes a lot of time and effort to compare them.

Moreover, your costs today won't be your costs tomorrow.

You may choose a cheap Redshift node costing $0.33/hour to start and quickly realize they are too slow. So, you upgrade to a more powerful node for $1.20/hour; that's an instant 4x increase in your costs!

To account for this, we've given you a range of prices for each service to show how they may evolve over time.

Since all the providers have different pricing structures, we needed one to refer back to for comparison. Unsurprisingly, we chose Panoply for this - maybe we are a little bit biased ;).

We've linked to all the pages we've used to get our data, so please feel free to check them to ensure our numbers are correct.

Lastly, we've stuck with the most prominent players in the ETL and DW space: Stitch and Fivetran for the ETL and Redshift and BigQuery for the DW. We know there are other providers, but this article is long enough already.

We've also chosen monthly (rather than annual) pricing for all providers to make comparisons easier.

Who is this comparison for?

The information in this article will be most relevant if you are part of an early-stage start-up (20-50 people) and looking to set up your first data warehouse.

Larger businesses probably already have data warehouses, and the question is then one of migration.

Pros and cons of building vs. buying

The idea of building a custom ETL pipeline and data warehouse vs. buying it off-the-shelf has changed in recent years.

Before (pre-2015), building meant you would hire a team of Data Engineers to write ETL code that pipes data into your (perhaps on-premises) data warehouse.

Nowadays, no one should be writing ETL pipelines. They will break, and you will waste hundreds of hours of engineering time.

Companies like Stitch and Fivetran have spent millions of dollars creating production-grade pipelines for you, and you can use them for a few hundred dollars per month.

Your team of 1-3 Data Engineers cannot possibly compete.

Today, the question is: which paid providers are you going to choose? In this article, we will compare the top ETL and DW tools on the market to Panoply.

Outline of a modern data stack (ETL + DW + BI) and why each piece is necessary

The modern data stack consists of 3 layers.

  • The first layer extracts all your data from disparate sources and loads it into one place, i.e., your ETL or data ingestion tool.
  • The second layer stores the data in one location as a single source of truth, i.e., your data warehouse (DW).
  • The final layer analyzes the data and enables you to extract insights, i.e., your BI tool.

There are many tools on the market for these 3 layers. You may even have more than one tool for each layer, e.g., you could use 2 BI tools, one for product analytics (e.g., Amplitude) and another for ad hoc analysis (e.g., Mode).

We'll assume you need one tool per layer for simplicity, and for brevity, we will not compare BI tools.

We will only look at the difference between buying an ETL tool and a data warehouse or combining them with Panoply.

First, we'll look at the price; then, we'll compare their setup and maintenance costs.

Let's start with Panoply.

Pricing comparison

Cost of Panoply

Panoply combines an ETL tool with a data warehouse.

  • You get unlimited data connections, storage, and compute power.
  • You can connect all your data sources to your data warehouse using one of the 300+ data connectors, store it instantly and start querying and analyzing right away.
  • You can set up everything within a matter of minutes and a few mouse clicks.

Panoply offers 3 pricing tiers, and each tier gives you 2 things:

  1. The number of rows your data warehouse can ingest each month (called rows scanned).
  2. The number of TBs of queries you can run each month (called query bytes).

All plans come with 1 TB of storage and unlimited automated scaling as your data needs grow.

Here is the pricing breakdown we will refer to throughout the article:

Panoply pricing tiers

Pricing Tier

Rows Scanned

Query Bytes

Storage

Price per month

Starter

10 million

40 TB

1 TB

$499

Pro

100 million

60 TB

1 TB

$799

Expert

200 million

80 TB

1 TB

$1,199

 

We will see shortly that, for the Pro and Expert levels, Panoply is priced similarly (or cheaper!) than the ETL tools without even taking into account data warehouse costs.

Now let's look at the cost of self-managed ETL tools.

Cost of self-managed ETL tools

The 2 primary self-managed ETL tools are Stitch and Fivetran.

Both of these tools:

  • say you can get set up in minutes,
  • have 130+ integrations with all the most common data sources (and are adding more each day),
  • move data into your data warehouse, guarantee uptime, ensure your pipelines work and automatically update them after API and schema updates.

However, their pricing models are very different.

  • Stitch charges you based on the number of rows you ingest each month (Panoply works in the same way).
  • Fivetran charges you based on the number of unique rows you update each month.  

With Stitch and Panoply, updating one row 5 times counts as 5 rows ingested. With Fivetran, it counts as one updated row. Thus, if you update tables tens or hundreds of times each month, using Fivetran can be more cost-effective. Using Stitch or Panoply for tables like that would quickly drain your monthly limit.

However, Fivetran is an order of magnitude more expensive than Panoply or Stitch for small volumes of data.

If you have 1 million unique row updates each month, Fivetran costs $1,000/month, $1,500/month, or $2,000/month, depending on their price plan, which is more than Panoply's most expensive plan ($1199) that gives you 200 million rows/month!

For this reason, we will only compare Stitch and Panoply from now on.

Note: please see Fivetran's service consumption table to see how we calculated the monthly costs.

Let's compare Stitch's monthly pricing with Panoply's.

Stitch and Panoply pricing comparison

Rows Ingested per Month

Stitch

Panoply

Winner

10 million

$180

$499

Stitch

$319 cheaper

100 million

$750

$799

Stitch

$49 cheaper

200 million

$1000

$1199

Stitch

$199 cheaper

 

Unsurprisingly, Stitch is cheaper than Panoply at every pricing tier as it only does half of what Panoply does.

But the price difference is not much. For 100 million rows per month, Panoply is just $49/month more expensive. Once you factor in the data warehouse costs, you'll see that Panoply comes out much cheaper.

Indeed, you're going to have a hard time finding a data warehouse that costs less than $49/month!

Conclusion: Cost of Panoply vs. cost of self-managed ETL tools

  • If you use Fivetran for anything, you will pay way more than Panoply. So Panoply wins here, hands down.
  • If you have a small amount of data (< 10 million rows), Stitch is the clear winner as it is $319/month cheaper than Panoply.

However, once your data starts to scale, the difference between the 2 options decreases dramatically.

Panoply is only slightly ($49) more expensive for the middle option and costs $199 more on the most expensive tier. But if you are ingesting 200 million rows per month, you will definitely pay more than $199/month for your data warehouse, as we will soon see, meaning that Stitch + a DW would be more expensive than Panoply.

One thing to note is that Stitch gives you more control over how many rows you can ingest each month. For example, you can pay $350/month to consume 25 million rows with Stitch, but Panoply only gives you the option of 10 million or 100 million. In such cases, Stitch can work out to be cheaper.

In an apples-to-apples comparison, Panoply is much cheaper than Fivetran and slightly more expensive than Stitch and achieves the same results as both.

Now let's see how much data warehouses cost.

Since most of you will want to ingest more than 10 million rows per month, the data warehouses need to cost less than $200/month to make financial sense compared to Panoply.

Cost of self-managed data warehouses

Redshift

Amazon says Redshift costs less to operate than any other data warehouse. This claim is, in part, because you can tinker with every aspect of your setup. However, we will only look at a simple one.

To use Redshift, you need to rent nodes, i.e., buckets of storage and compute power.

  • If you have less than 1 TB of data, Amazon recommends you use DC2 nodes, for which you pay by the hour.
  • If you have more than 1 TB of data, Amazon recommends you use RA3 nodes. For these, you pay separately for storage and compute. You pay for storage by the TB and compute by the hour.

If you have less than 160 GB of data, you can survive with one DC2 node and choose the cheapest one, which costs $0.33/hour and give you 2 x CPUs and 15 GB of memory (not much, we know).

Running this setup for a full month costs 730 hours x $0.33 = $240.90/month.

Using this with Stitch’s middle tier costs $750 + $240.90 = $990.90/month and is $191.90/month more expensive than Panoply’s middling tier at $799/month.  

As you scale, this setup becomes more expensive.

If you have 300 GB of data, you will need 2 DC2 nodes. Now your costs have doubled. You pay 730 hours x $0.33 x 2 nodes = $481.80/month. This number is only going to keep increasing as your data increases. Now Redshift and Stich costs $750 + 481.80 = $1231.80 and are $432.8/month more expensive than Panoply.

When you put it this way, you can see that Panoply is less expensive than Redshift's DC2 nodes.

If you use RA3 nodes, you pay $27.10 per TB per month for storage. Compute nodes cost at least $3.0606/hour; for that, you get 12 CPUs and 96 GB of memory (much better!).

Now, let's calculate how many hours of compute you can use if you want to keep your data warehouse costs below $200/month.

First, we subtract storage costs: $200 - $27.10 = $172.90. So you have $172.90 / $3.0606 = 56.5 compute hours per month. That is 14 compute hours per week or 3 hours per workday.

If your company can get away with just 3 hours per day, every day, every month, then RA3 nodes are cheaper than Panoply. If not (looking at you overnight batch jobs!), then Panoply also makes more sense financially than Redshift RA3 nodes.

Note that we are not including additional services Amazon offers, such as Spectrum ($200/month for 40 TB of queries) or automated backups.

Panoply is the clear winner here.

For even more info on this cost analysis, check out our in-depth breakdown of Redshift pricing.

Now, let's look at one of the other prominent players in the Data Warehouse as a Service (DWaaS) industry: Google's BigQuery.

BigQuery

BigQuery pricing is split into 2 parts: storage and compute.

  • Storage costs $20 per TB.
  • Compute costs $5 per TB of queries.

This pricing model makes it easy to compare with Panoply. We will also add in Stitch costs for a complete comparison. 

Storage and Compute

BigQuery + Stitch

Panoply

Winner

1 TB storage
40 TB compute
10 million rows

$20 + (40 x $5) +

$180

= $400

$499

BigQuery + Stitch

$99 cheaper

1 TB storage

60 TB compute

100 million rows

$20 + (60 x $5) +

$750

= $1070

$799

Panoply

$271 cheaper

1 TB storage

80 TB compute

200 million rows

$20 + (80 x $5) +

$1000

= $1420

$1199

Panoply

$199 cheaper

 

You can see that for small volumes of data, BigQuery and Stitch work out cheaper than Panoply by just $99/month.

But once you scale and start to ingest more rows, Panoply is the clear winner.

Winner of pricing comparison: Self-managed ETL + DW vs. Panoply

It is possible to create setups with Redshift and BigQuery that are cheaper than Panoply.  

For Redshift, you need a small amount of data (less than 160 GB), and you have to use the cheapest and least powerful nodes. Once your data goes above 160 GBs, Redshift doubles in price and becomes more expensive than Panoply.

If you combine BigQuery and Stitch and ingest a small amount of data (less than 10 million rows) per month, it's slightly cheaper than Panoply by $99/month.

We recommend choosing BigQuery over Redshift in this case because BigQuery doesn't limit your processing power, and you get a full-scale warehouse behind you on day one.

In all other cases, Panoply works out cheaper.

Once you start ingesting more rows per month or performing more TBs of queries (as you inevitably will), the costs of Stitch and BigQuery rack up and quickly surpass those of Panoply.

Now that we've looked at the monthly cost of each tool, let's see how easy they are to use.

Setup and maintenance comparison

You need to consider several aspects of setup and maintenance for your ETL tools and data warehouses.

  • Initial setup
  • Ongoing maintenance
  • Staff
  • Adding custom connectors

First, let's look at how Panoply performs in these aspects.

Setup and maintenance with Panoply

Initial setup

The initial setup involves connecting data sources to your data warehouse, creating logins, and checking that everything is running smoothly.

With Panoply, you can do this in a matter of minutes.

It's just clicking and entering information on a few screens, and you're done. Very simple and very fast.

Ongoing maintenance

Maintenance includes the time and energy spent ensuring your connectors are piping the correct data into your warehouse. Among other things, it also includes making sure your warehouse is optimized and running queries efficiently.

We specifically designed Panoply to provide automated maintenance on Redshift and BigQuery.

Therefore, there isn't much you need to do. Your connectors will keep working as you set them up (even after API changes). Your warehouse will keep your data stored efficiently, and you'll easily be able to access it whenever you want; Panoply just works!

Staff

You don't need to hire anyone to make Panoply work.

Your data analysts can set up everything themselves, and you don't need a data engineer to manage your warehouse or any part of your ETL pipeline.

Panoply handles it all for you.

As you scale, you may want to hire a Data Engineer to optimize queries and get more bang for your buck each month. But you'll want to do this regardless of the data warehouse you choose and probably at a more mature stage in your company's development.

Adding custom connectors

This is one area where Panoply falters slightly. There is currently no way for you to add custom connectors to Panoply without using external tools.

This may sound like a problem, but you should be able to ingest 75-90% of your data using the pre-built connectors. Indeed, Tristan Handy, the co-creator of dbt, says this has been his experience.

However, you will probably want to add at least one custom connector.

How do you do that with Panoply?

We recommend you use Stitch or Fivetran to create custom connectors and pipe the data from them to Panoply.

You could use an open-source option like Singer, but you will have to manage the infrastructure yourself, which is a time sink and error-prone.

If you use Singer with Stitch, Stitch will treat your custom connector like one of its own. It will have 99.9% uptime, automatic updates after API changes, and you won't have to manage the infrastructure!

Depending on the complexity of the integration, you may need to hire a Data Engineer. Handy recommends doing so once you have to build 3+ custom connectors over the next few quarters.

You will probably get far without having to build any custom ETL pipelines. If you need lots of custom ETL from day 1, it will be expensive and time-consuming whatever ETL tool you choose.

However, using Panoply for most of your ETL pipeline and Stitch for the custom connectors can still work out cheaper.

For example, it costs $750/month to ingest 99 million rows/month with Stitch. If your custom connector consumes 5 million rows/month, you now have to pay $1,000/month for 99 + 5 = 104 million rows/month with Stitch - a $250/month increase.

On the other hand, if you are using Panoply to ingest 99 million rows/month and ingest 5 million rows/month with Stitch, that costs you just $100/month extra (a cost-saving of $150/month).

Panoply shines in 3 of the 4 areas we examined.

Now let's look at how self-managed solutions perform in the aspects outlined above.

Set up & maintenance of a self-managed solution

Initial setup

Stitch's marketing says you can get set up in minutes. But when you look at their documentation, this seems like an unlikely timeframe.

They say the setup for some destinations requires technical expertise, such as familiarity with running SQL commands or using the command line. So, it will probably take you a few hours. It shouldn't be a problem for your Data Analysts, but it certainly won't take a few minutes.

It's a similar story for setting up your data warehouse in Redshift or BigQuery.

It involves many steps, some that are technical and some less so. Again, a Data Analyst should be able to get set up within a few hours.

Ongoing maintenance

Stitch and Fivetran were designed with minimal maintenance in mind. Just like with Panoply, your data should keep coming into your data warehouse as intended.

BigQuery was also designed to be simple. You pipe your data in and can get analyzing right away.

However, your costs can spiral out of control if you don't optimize your queries (we'll talk about this more in the next section).  

Maintaining a Redshift data warehouse can be a full-time job. It is the F1 car of the data warehouse world. Yes, you can go extremely fast and perfectly optimize it for you, but this requires time, effort, and extensive expertise.

If you use Redshift, you will have a never-ending list of things to monitor and maintain; you have been warned.

Staff

As mentioned above, Stitch and Fivetran require minimal maintenance. So, whether or not you need to hire anyone depends on your data warehouse.

BigQuery is simple to use, and your Data Analysts should be able to maintain it in the short-medium term. As you run more and more queries in the future, you will probably need to hire someone to optimize them.

BigQuery charges you per TB, so reducing the number of MBs each query uses leads to savings.

For Redshift, you need to hire a Data Engineer, perhaps even from day 1, to tweak all the options and ensure your resources are being used optimally. On the other hand, you could run DC2 nodes all month (which have one cost for storage and compute) and survive for the short term knowing that your monthly costs are 730 hours x hourly rate (or cheaper if you use bulk pricing).

The above is not an optimal setup and will be more expensive than Panoply, but it will make your maintenance more manageable.

There is one thing to note when comparing Redshift and other warehouses: To get the lowest costs with Redshift, you need to hire a Data Engineer (or 2 or 3), which costs $92,519/year on average in the US or $7,710/month!

And if you don't hire a Data Engineer, you will need to devote some number of hours each week to maintain your warehouse.

You don't have to do this with Panoply or BigQuery.

Adding custom connectors

You can set up custom connectors to any data warehouse using Stitch or Fivetran.

If you're already using these tools, you can argue there will be less of a learning curve to get them set up. But since they boast such an easy setup in their marketing, this doesn't seem like a significant advantage.

You'll need to learn a tool such as Singer regardless.

As discussed in the Panoply section, it can be cheaper to use a combination of Panoply and Stitch rather than Stitch alone since the jumps between Stitch brackets are not even.

On the other hand, if you are ingesting 75 million rows/month with Stitch, you can consume up to 25 million rows/month on your current plan with your custom connector, and it won't cost you anything. But to ingest 25 million rows with Stitch if you don't already have an account will cost you $350/month.

The winner here depends on your situation.

Who wins the "setup and maintenance" battle?

Initial setup 

The initial setup is faster with Panoply.

The difference is probably a few minutes with Panoply and several hours (perhaps a day) without. A bit of an inconvenience but nothing crazy long term.

Ongoing maintenance

Panoply is the clear winner if compared to Redshift and has an edge when compared to Stitch and BigQuery.

You only have to manage one system, and Panoply automates many of your daily admin tasks. This automation means less maintenance work overall and gives you more time to spend on revenue-generating activities.

Staff 

A Data Analyst should be able to get themselves set up and keep the system maintained for a long time with Panoply or with Stitch + BigQuery.

If you use Redshift, you will need to hire someone to focus on the maintenance and ensure all the dials get set correctly.

We think Panoply clearly beats Redshift here and is tied for first place with Stitch and BigQuery.

Adding custom connectors

Which tool is the best here depends on how many rows/month you are currently ingesting.

  • If you are close to your monthly limit and adding a new connector will take you over that, using Panoply + Stitch is cheaper.
  • If you are well within your monthly ingestion limit and adding a custom connector will not take you over it, using Stitch will save you money.

On balance, we think Stitch and Fivetran are slightly better than Panoply in this case.

Conclusion

So there you have it—a breakdown of Panoply vs. Self-Managed solutions. We haven't seen anything close to this level of detail when comparing ETL and DW providers before.

We've seen how Panoply is the cheaper tool in almost all situations (apart from tiny initial setups). It also remains the most affordable tool by at least $180+/month as you scale your data and increase the number of query bytes you perform each month.

Likewise, we've demonstrated that the ease of setup and maintenance when using Panoply makes it the clear winner in those categories as well.

There are a couple of areas where Panoply does lag slightly behind.

If you are a big organization that already has a data warehouse and a team of Data Engineers ready to optimize queries and tinker with, you would probably benefit most from using Redshift.

Or, if the majority of your business's data comes from connectors that Panoply doesn't natively support, it is probably best to start with Stitch and Fivetran since you will need to use them anyway.

However, suppose you want all the benefits of Redshift without the headaches of optimization. In that case, you can use Stitch/Fivetran and pipe the data into your Panoply data warehouse to get all the setup and maintenance benefits we spoke about above. 

To know which is best, you need to try them out and, thankfully, all the companies we mentioned have a free trial.

We hope you can see why we love Panoply. We know it will improve your financial life and make your daily work much easier.

We want to automate away all the unnecessary stuff so that you can focus on the actions that make your business grow: analyzing your data!

Now you should have everything you need to decide if Panoply is for you. To experience it first hand, try Panoply for free for 14 days.

Get a free consultation with a data architect to see how to build a data warehouse in minutes.
Request Demo
Read more in:
Share this post:

Work smarter, better, and faster with monthly tips and how-tos.