The past few years have been pretty exciting for cloud computing. With extensive offerings from Amazon, Google, Microsoft, Snowflake and others, developers can now spin up reliable, cheap infrastructure components in minutes or hours. As a result, new projects can be prototyped, built out and launched with a degree of speed that would have been unthinkable a decade ago.
This trend has been just as strong in the data management space: for data-driven companies, it’s gotten significantly easier to integrate all your data in one place. Cloud-based data warehouse solutions are available for data operations of almost every size and level of complexity--though some are a bit easier to work with than others. Since we’re a data warehouse company, we’ve spent a lot of time thinking about and researching the merits of different data warehouse offerings, and we thought we’d share some of our findings with you here. Today, we’re going to be discussing the key differences--primarily in terms of cost and performance--between Snowflake’s data warehouse and Google’s BigQuery.
Cost: Snowflake vs BigQuery
Since we’re talking about cloud data warehouses for business use here, we might as well cut right to the chase. How do Snowflake and BigQuery compare on price? Get your calculators out.
Snowflake bills per hour for each virtual warehouse, so pricing depends heavily on your usage pattern. Data storage and computation are billed separately, so storage costs need to be factored in after calculating usage costs. Using the US as a reference, Snowflake storage starts at a flat rate of $23 per TB per month, measured on average compressed size and accrued daily. Meanwhile, compute costs $0.00056 per second, per credit, for their Snowflake On Demand Standard Edition. If that wasn’t already confusing, Snowflake offers seven different tiers of computational warehouses. The smallest cluster, X-Small, costs one credit per hour, or $2/hour. As you move up the tiers, the per-hour cost in credits doubles with each step, which can allow costs to pile up pretty quickly.
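To make the tier arithmetic concrete, here is a minimal Python sketch using only the figures quoted above ($2 per credit, $23 per TB per month, credits doubling at each tier). The tier names other than X-Small and the function names are our own assumptions for illustration, and this is not an official Snowflake pricing calculator.

```python
# Figures quoted in this article (US, On Demand Standard Edition).
PRICE_PER_CREDIT_USD = 2.00        # $2 per credit
STORAGE_USD_PER_TB_MONTH = 23.00   # flat rate on compressed data

# Assumption: tier names are illustrative; the article only names X-Small.
TIERS = ["X-Small", "Small", "Medium", "Large", "X-Large", "2X-Large", "3X-Large"]


def credits_per_hour(tier: str) -> int:
    """Credits double with each step up the tiers: 1, 2, 4, 8, ..."""
    return 2 ** TIERS.index(tier)


def compute_cost(tier: str, seconds_running: float) -> float:
    """Compute cost in USD for a warehouse that runs for the given seconds."""
    usd_per_credit_second = PRICE_PER_CREDIT_USD / 3600  # about $0.00056
    return credits_per_hour(tier) * usd_per_credit_second * seconds_running


def monthly_cost(tier: str, hours_running: float, storage_tb: float) -> float:
    """Compute plus storage for one month, billed separately and then summed."""
    storage = storage_tb * STORAGE_USD_PER_TB_MONTH
    return compute_cost(tier, hours_running * 3600) + storage
```

Under these assumptions, one hour on the fourth tier (8 credits per hour) comes to about $16, and 100 hours of X-Small with 1TB stored comes to about $223 for the month.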
Fortunately, Snowflake’s dynamic pricing model can be of some help here. With dynamic cluster management, clusters will stop when no queries are running and automatically resume when new queries are initiated, sizing themselves up and down based on workload. As a result, you can expect to pay less for Snowflake when your query load decreases.
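As a rough illustration of why stopping idle clusters matters, the sketch below compares a day of compute with and without automatic suspension. It is a simplification under our own assumptions: it ignores any minimum billing increment and any resizing during the day, and the function name is hypothetical.

```python
PRICE_PER_CREDIT_USD = 2.00  # figure quoted in this article


def daily_compute_cost(credits_per_hour: int, active_hours: float,
                       auto_suspend: bool = True) -> float:
    """Daily compute cost in USD.

    With auto_suspend, only the hours with running queries are billed;
    without it, the warehouse is assumed to stay up for all 24 hours.
    """
    billed_hours = active_hours if auto_suspend else 24
    return credits_per_hour * PRICE_PER_CREDIT_USD * billed_hours


# An 8-credit-per-hour warehouse that is busy for 6 hours a day.
with_suspend = daily_compute_cost(8, 6)                         # 8 * 2 * 6
without_suspend = daily_compute_cost(8, 6, auto_suspend=False)  # 8 * 2 * 24
```

In this example, suspension cuts the daily bill to a quarter of the always-on cost, because only 6 of the 24 hours are billed.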
When we last looked at BigQuery pricing, Google hadn’t added some of the pricing tiers that they now offer, but our findings last time around pretty much hold: BigQuery’s cost of $0.02/GB per month only covers storage, not queries. You pay separately per query based on the amount of data processed, at a rate of $5/TB. BigQuery doesn’t use indexes, and instead relies on clustering to make its queries more efficient--but that can make it difficult to estimate accurately how much a query will cost based on the size and shape of your data. But let’s say you have 1TB spread evenly across 50 columns (in several tables). A query that scans 5 of these columns would process 100GB at a cost of $0.50, or $0.005 per GB processed. Spread over the full 1TB you’re storing, that adds $0.0005 per GB for each query, so 12 such queries per month bring your effective cost to about $0.026 per GB (0.02 + 0.0005 * 12). In the worst case, where every query scans the entire dataset, it could actually cost you $0.08 per GB (0.02 + 0.005 * 12). Like we said, though, this may not be how the actual costs come out for the end user, given the performance optimization Google has done on the backend. Of course, ultimately, since you’re paying Google per query, the pricing will end up being pretty transparent--you’ll see exactly what each query cost you after the fact.
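If you’d like to plug in your own numbers, the on-demand arithmetic can be sketched in a few lines of Python. This assumes data is spread evenly across columns (as in the example above), uses decimal units (1TB = 1,000GB), and ignores any backend optimization; the function names are ours, not part of any Google API.

```python
STORAGE_USD_PER_GB_MONTH = 0.02  # storage rate quoted in this article
QUERY_USD_PER_TB = 5.00          # on-demand query rate quoted in this article
GB_PER_TB = 1000                 # assumption: decimal units


def query_cost(table_gb: float, total_columns: int, columns_scanned: int) -> float:
    """Cost in USD of one query, assuming every column is the same size."""
    scanned_gb = table_gb * columns_scanned / total_columns
    return scanned_gb / GB_PER_TB * QUERY_USD_PER_TB


def monthly_cost(table_gb: float, total_columns: int,
                 columns_scanned: int, queries_per_month: int) -> float:
    """Storage plus on-demand query charges for one month, in USD."""
    storage = table_gb * STORAGE_USD_PER_GB_MONTH
    queries = queries_per_month * query_cost(table_gb, total_columns, columns_scanned)
    return storage + queries
```

With 1TB across 50 columns, a 5-column query costs about $0.50, and 12 of them bring the month to about $26 including storage. If every one of those 12 queries scanned all 50 columns instead, the month would come to about $80, or $0.08 per GB stored.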
If the idea of per-GB / per-query pricing makes you nervous, don’t worry: BigQuery has also added a flat-rate pricing plan for those who crave stability and predictability in their pricing. For a flat monthly rate of $10,000 (or $8,500/month if billed annually), BigQuery users will receive 500 slots that can be used for a number of different query types. Make sure to check out the next section for some more detailed explorations of the actual costs of using Snowflake and BigQuery that are based on benchmarking tests.
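One way to reason about the flat-rate plan is to ask how much data you would need to process each month before it beats on-demand pricing. The sketch below does that with the figures quoted above. It assumes storage costs are the same under both plans and says nothing about whether 500 slots are enough for your workload; the function names are hypothetical.

```python
ON_DEMAND_USD_PER_TB = 5.00            # on-demand query rate quoted in this article
FLAT_RATE_MONTHLY_USD = 10_000.00      # billed monthly
FLAT_RATE_ANNUAL_BILLING_USD = 8_500.00  # per month, if billed annually


def break_even_tb(flat_rate_usd: float) -> float:
    """TB processed per month at which on-demand spend equals the flat rate."""
    return flat_rate_usd / ON_DEMAND_USD_PER_TB


def cheaper_plan(tb_processed_per_month: float,
                 flat_rate_usd: float = FLAT_RATE_MONTHLY_USD) -> str:
    """Name the cheaper plan for a given monthly query volume (query costs only)."""
    on_demand = tb_processed_per_month * ON_DEMAND_USD_PER_TB
    return "flat-rate" if on_demand > flat_rate_usd else "on-demand"
```

By this estimate, the break-even point is around 2,000TB processed per month on monthly billing, or 1,700TB on annual billing. Below that, on-demand pricing is likely the cheaper choice.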
Snowflake vs BigQuery: Actual costs
When it comes down to it, the price you pay for your own data warehouse on either of these platforms is going to depend heavily on the size of your data and workload, so it’s difficult to quote a price that every user can expect to pay. If we consider a standardized data warehouse setup (say, one configured for benchmarking purposes), we can start to get a sense of what a typical setup would cost. With a 1 TB data warehouse built using the TPC-DS dataset, one group of benchmarkers demonstrated that Snowflake was slightly cheaper than BigQuery, with a (geometric) mean price of $0.265/query for 99 complex queries, vs. $0.305/query for the same 99 queries on a BigQuery setup. Your mileage will almost certainly vary, however, especially if you’re planning on buying a flat-rate pricing plan from BigQuery.
The other thing to keep in mind when comparing these two services on pricing is the fact that they’re billed somewhat differently. Because BigQuery is billed per query, you really do only pay for what you use. You don’t pay for idle time on BigQuery the way that you would with Snowflake. This means that, even though Snowflake is cheaper by the query on average, if your workflow doesn’t include a lot of continuous use of your data warehouse, you might find that a BigQuery-based setup is actually cheaper.
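To see where the idle-time tradeoff tips, you can compare an hourly warehouse rate against a per-query charge. The sketch below is a rough, back-of-the-envelope model: it treats the benchmark’s $0.305 mean as if it were the cost of every BigQuery query, and it assumes you pay for every hour the Snowflake warehouse is up regardless of load. Both are simplifications, and the function names are ours.

```python
def snowflake_cost(hourly_rate_usd: float, billed_hours: float) -> float:
    """Warehouse cost in USD: you pay for the hours it is running, busy or not."""
    return hourly_rate_usd * billed_hours


def bigquery_cost(avg_usd_per_query: float, queries: int) -> float:
    """On-demand cost in USD: you pay only for the queries you run."""
    return avg_usd_per_query * queries


def break_even_queries_per_hour(hourly_rate_usd: float,
                                avg_usd_per_query: float) -> float:
    """Queries per billed hour above which the hourly warehouse is cheaper."""
    return hourly_rate_usd / avg_usd_per_query


# X-Small at $2/hour vs. the benchmark's $0.305 mean per BigQuery query.
threshold = break_even_queries_per_hour(2.00, 0.305)
```

With these inputs, the threshold works out to between six and seven queries per billed hour. A workload that runs fewer queries than that while the warehouse is up would, under this model, cost less on BigQuery.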
Performance: Snowflake vs BigQuery
Now that we’ve covered pricing, the next obvious question is one of performance--how does Snowflake compare to BigQuery on performance? The most recent performance data that directly compares the two is from the same group we referenced above, who did their benchmarking in September 2018 and compared Snowflake and BigQuery in terms of how fast each respective platform could execute 99 standardized TPC-DS queries.
In their tests, the group set up 1 TB data warehouse equivalents on each platform. For Snowflake, this meant using a Large-configuration data warehouse with 8 servers per cluster at $16/hour. For BigQuery, since everything is on-demand and charged per query, there wasn’t a specific configuration setup step.
Speed: Snowflake is faster than BigQuery
In a head-to-head test, Snowflake edged out BigQuery in terms of raw speed, with queries taking, on average, 10.74 seconds (geometric mean). Meanwhile, BigQuery clocked in at 14.32 seconds per query, on average. In other words, Snowflake was faster in these tests.
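A quick note on the geometric mean, since it shows up in both the cost and speed figures: it is less sensitive to a handful of very slow queries than the ordinary average. Here is a minimal sketch of how it is computed. The timings in the example are made up for illustration and are not taken from the benchmark.

```python
import math


def geometric_mean(values: list[float]) -> float:
    """Geometric mean: the exponential of the average of the logarithms."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical query times in seconds (not benchmark data).
timings = [1.0, 10.0, 100.0]
geo = geometric_mean(timings)          # close to 10
arith = sum(timings) / len(timings)    # 37, pulled up by the slowest query
```

In this made-up case, one slow query drags the ordinary average to 37 seconds while the geometric mean stays near 10, which is why benchmarks with widely varying query times often report the latter.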
If you’ve been reading other articles comparing Snowflake and BigQuery’s performance, you might have seen somewhat different results. This is partly due to the different methodologies used in those benchmark tests, and partly due to the fact that these results are the most recent benchmark data we have available--a lot has changed over at BigQuery over the past 18 months. But other groups have done variations of these tests using the same dataset and shown that Snowflake just seems to perform faster than BigQuery on tasks using the TPC-DS dataset.
Of course, cost and performance are two important factors to consider when it comes to setting up your data warehouse solution, but there are a number of other things that might play a part in your ultimate decision. We’re going to look at some of these now.
- Usability: Of the cloud-based data warehouse options out there, both Snowflake and BigQuery are pretty far toward the user-friendly end of the spectrum. There isn’t a lot that differentiates them here, but BigQuery’s serverless architecture means you won’t have to do any setup or initial configuration aside from moving your data into Google Cloud storage.
- Management and Maintenance: Both Snowflake and BigQuery are low-maintenance offerings, with automated management going on in the background. In Snowflake’s case, this means that queries are tuned and optimized in the background while you work, and the size and power of your instance is automatically rescaled to deal with changing needs. In BigQuery’s case, since the platform is designed to be serverless, your users will barely even be aware of these considerations, since everything will be happening far in the background.
- Scaling: As mentioned above, Snowflake makes it easy and fast to scale your instances to deal with workload by combining automatic performance tuning and workload monitoring. With BigQuery, like Panoply, users won’t even need to think about scaling at all--everything is handled under the hood. If you want to build a petabyte-scale data warehouse, your main pain point will be moving your data into Google Cloud. After that, all you have to do is run your query, and BigQuery will handle the rest. One other thing to consider about BigQuery's scaling ability: the system is optimized to keep performance relatively constant as complexity increases without growing costs significantly.
Which data warehouse is right for you?
Ultimately, in the world of cloud-based data warehouses, Snowflake and BigQuery are more alike than different. Performance is pretty similar for most tasks, user maintenance burden is low, and per-query costs aren’t all that different between the two. The main difference you will likely want to consider is the way that the two services are billed, especially in terms of how each billing style fits your workflow. If you have very large data, but a spiky workload (i.e. you’re running lots of queries occasionally, with high idle time), BigQuery will probably be the cheaper and easier option for you. If you have a steadier, more continuous usage pattern when it comes to queries and the data you’re working with, it may be more cost effective to go with Snowflake, since you’ll be able to cram more queries into the hours you’re paying for.
Of course, if the idea of an automated, easy-to-use data warehouse appeals to you, you should also consider Panoply. Our Redshift-based managed data warehouse combines all the advantages of Redshift’s power, tunability and familiar architecture with a no-maintenance automation layer--all for a flat monthly fee. If you’re interested in seeing how Panoply works, start a free trial today.