With companies of all sizes looking to gain value from their data, cloud data warehouses are all the rage. They’re a critical component of any modern data stack, as they enable companies of all sizes that need to wrangle their data across all their various departments.
With extensive offerings from Amazon, Google, Microsoft, Snowflake and others, developers can spin up reliable, cheap infrastructure components in minutes or hours. As a result, new projects can be prototyped, built out, and launched with a degree of speed that would have been unthinkable a decade ago.
This trend has been just as strong in the data management space: for data-driven companies, it’s gotten significantly easier to integrate all your data in one place. Cloud-based data warehouse solutions are available for data operations of almost every size and level of complexity—though some are a bit easier to work with than others.
Since we’re a data platform, we’ve spent a lot of time researching the merits of different data warehouse offerings, and we thought we’d share those findings with you. Today, we’re going to be discussing the key differences—primarily in terms of cost and performance—between Snowflake’s data warehouse and Google’s BigQuery.
Cost: Snowflake versus BigQuery
Since we’re talking about cloud data warehouses for business use here, we might as well cut right to the chase. How do Snowflake and BigQuery compare on price? Get your calculators out, this one's a doozy.
Snowflake’s pricing model
Snowflake bills per hour for each virtual warehouse, so pricing depends heavily on your usage pattern. Also, data storage and computation are billed separately, so storage costs need to be factored in after calculating usage costs. Using the US as a reference, Snowflake storage costs begin at a flat rate of $23/TB, average compressed amount, per month (accrued daily).
Meanwhile, compute costs $0.00056 per second, per credit, for their Snowflake On Demand Standard Edition. If that wasn’t already confusing, Snowflake offers seven different tiers of computational warehouses. The smallest cluster, X-Small, costs one credit per hour, or $2/hour. But as you move up the tiers in complexity, the per-hour cost in credits doubles, so costs can pile up pretty quickly.
Fortunately, Snowflake’s dynamic pricing model can be of some help here. With dynamic cluster management, clusters stop when no queries are running and automatically resume when new queries are initiated, sizing themselves up and down based on workload. As a result, you’ll pay less for Snowflake service as your query load decreases.
BigQuery’s pricing model
BigQuery costs $0.02/GB, but that only covers storage, not queries. You’ll also pay per query based on the amount of data processed at a rate of $5/TB.
BigQuery doesn’t use indexes and instead relies on clustering to make its queries more efficient. That can make it difficult to accurately estimate how much a query would cost based on the size and shape of your data.
Let’s say you have 1TB of data spread evenly across 50 columns (in several tables). A query that scans through 5 of these columns could end up processing 100GB at a cost of $0.50. This means that, per GB, you’ll pay an additional $0.005 per query. If you have 12 such queries per month it could actually cost you $0.08 (0.02 + 0.005 * 12). Like we said, though, this may not be how the actual costs come out for the end user, given the performance optimization Google has done on the backend. Of course, ultimately, since you’re paying Google per-query, the pricing will end up being pretty transparent—you’ll see exactly what each query cost you after the fact.
If the idea of per-GB / per-query pricing makes you nervous, don’t worry: BigQuery has a flat-rate pricing plan for those who crave stability and predictability in their pricing. For a flat monthly rate of $10,000 (or $8,500/month if billed annually), BigQuery users receive 500 slots that can be used for a number of different query types.
Actual costs: Snowflake versus BigQuery
When it comes down to it, the price you pay for a data warehouse on either of these platforms depends heavily on the size of your data and workload, so it’s difficult to quote a price that every user can expect to pay.
If we consider a standardized data warehouse setup (like, say, one configured for benchmarking purposes), we can start to get a sense of what a standard setup would cost. With a 1TB data warehouse built using the TPC-DS dataset, Fivetran demonstrated that Snowflake was slightly cheaper than BigQuery, with a (geometric) mean price of $0.265/query for 99 complex queries, compared to $0.305/query for the same 99 queries on a BigQuery setup.
Your mileage will almost certainly vary, however, especially if you’re planning on buying a flat-rate pricing plan from BigQuery.
Performance: BigQuery versus Snowflake
While the pricing model is critical, would-be data warehouse buyers should definitely take performance into account as well. That’s where benchmarking comes in.
In 2020, Fivetran created a benchmarking report that includes both Snowflake and BigQuery. Here’s how they set it up:
- They generated a 1TB TPC data set with 24 tables in a snowflake schema; the largest fact table had 4 billion rows
- They ran 99 TPC-DS queries of varying complexity, including joins, aggregations, and subqueries
- They ran each query just once to prevent the warehouses from caching previous results.
For the configuration they used Snowflake Large $16.00/per hour and BigQuery Flat-Rate with 600 slots that cost $16.44/per hour. One major decision that Fivetran made was to not fine tune the data warehouses by using any form of distribution key.
So what were the results?
Speed: Snowflake is faster than BigQuery
Below you can see how BigQuery and Snowflake stacked up:
Avg. query time (seconds)
In a head-to-head test, Snowflake edged out BigQuery in terms of raw speed, with queries taking an average of 8.21 seconds (geometric mean). Meanwhile, BigQuery clocked in at an average of 11.18 seconds per query. In other words, Snowflake was faster in Fivetran’s tests.
If you’ve read other articles comparing Snowflake and BigQuery performance, you might have seen somewhat different results. That’s partly due to the timing of the studies (a lot has changed at BigQuery over the past 18 months), but another major factor is the different methodologies used in those benchmark tests, which can impact the outcome.
That said, in 2019 Gigaom and Microsoft teamed up to do a variation of these tests using the same dataset and found that Snowflake just seems to perform faster than BigQuery on tasks using the TPC-DS dataset.
Query language: Snowflake versus BigQuery
BigQuery used to rely on what is now referenced as Legacy SQL. This was a unique SQL syntax that had lots of nuance, requiring analysts and engineers to adjust their SQL knowledge.
However, with the release of BigQuery 2.0, developers were also provided the ability to use Standard SQL. This SQL dialect has a much more familiar feel for SQL users and doesn’t require any upskilling to understand.
In general, Snowflake allows users to use standard SQL to interact with their underlying data. In particular, Snowflake should be compatible with ANSI SQL. There are several nuances that exist in Snowflakes SQL, mostly because Snowflake provides a lot more features and functionality, which creates the need for non-standard commands.
Nested data: Snowflake versus BigQuery
An important ability of modern cloud data warehouses is the ability to manage nested data like JSON or XML. Having support for these types of data simplifies data engineering by enabling you to pull in said data and manage the specifics later.
BigQuery supports loading nested and repeated data from source formats that support object-based schemas, such as JSON, Avro, Firestore export files, and Datastore export files.
Snowflake allows users to upload semi-structured data types like JSON, Avro, and XML, generally storing this data in VARIANT, OBJECT, or ARRAY columns.
Support: BigQuery versus Snowflake
When it comes to support, both data warehouse providers offer varying levels based on your payment tier.
For example, in Snowflake’s standard tier, you can get support during weekday business hours. However, if you pay a little more per compute hour, you can get 24/7/365 support.
Similarly, BigQuery charges in tiers. But instead of attaching support to your compute hours, they do it with a flat rate. For example, the same level of Snowflake’s standard tier is $100 per user. If you want 24/7/365 support, you’ll need to spend $250 per user.
Other things to think about
Along with these common considerations, there are a number of other factors that might play into your data warehouse selection. Here are a few to keep in mind:
- Usability: Both Snowflake and BigQuery are somewhat user-friendly. There isn’t a lot that differentiates them here, but BigQuery’s serverless architecture means you won’t have to do any setup or initial configuration aside from moving your data into Google Cloud storage.
- Management and Maintenance: Both Snowflake and BigQuery are low-maintenance offerings, with automated management happening in the background.
- Snowflake: Queries are tuned and optimized while you work, and the size and power of your instance is automatically rescaled to fit changing needs.
- BigQuery: Since the platform is designed to be serverless, users will barely even notice optimizations, since everything happens far in the background.
- Scaling: Snowflake makes it easy and fast to scale your instances to deal with workload by combining automatic performance tuning and workload monitoring. With BigQuery, users don’t need to worry about scaling at all—everything is handled under the hood and the system is optimized to keep performance relatively constant as complexity increases without growing costs significantly.
Which data warehouse is right for you?
Ultimately, in the world of cloud-based data warehouses, Snowflake and BigQuery are more alike than different. Performance is pretty similar for most tasks, hands-on maintenance is low, and per-query costs aren’t all that different.
The main difference you will likely want to consider is the way that the two services are billed, especially in terms of how this billing style will work out with your style of workflow.
- If you have very large data, but a spiky workload (i.e. you run lots of queries occasionally, with high idle time), BigQuery will probably be cheaper and easier for you.
- If you have a steadier usage pattern when it comes to queries and data, it may be more cost effective to go with Snowflake, since you’ll be able to cram more queries into your compute hours.
Of course, if the idea of a low-maintenance, zero-configuration, easy-to-use data warehouse appeals to you, you should consider Panoply. Our managed data warehouse offers all the benefits of a traditional data warehouse, plus zero configuration (aka, setup that takes minutes, not weeks) and minimal ongoing maintenance. To learn more about how Panoply works, schedule a personalized demo today.