Data has become the lifeblood of the business, and to understand and utilize it all, data warehouses are now an essential part of modern operations. Today, we continue our discussion of modern data warehouses as we compare Redshift and Snowflake and cover some core considerations when integrating a data warehouse.
Both are powerful relational database management systems, and both offer some really interesting options for managing data. Redshift is a cloud-ready, large-scale data warehouse service built for use with business intelligence tools. Similarly, Snowflake offers cloud-based data warehousing services for structured and semi-structured data.
To begin to differentiate the two, Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. One of the cool points here is that you can start with just a few hundred gigabytes of data and scale to a petabyte or more as your requirements grow.
Snowflake Computing sells a cloud-based data storage and analytics service called Snowflake Elastic Data Warehouse. With this solution, corporate users are able to store and analyze data using cloud-based hardware and software. From there, the data is stored in Amazon S3. Rather than relying on technologies like Hadoop, Snowflake actually leverages the public cloud ecosystem.
As mentioned, both of these solutions are powerful and offer some unique features when it comes to managing data. But, there are definitely differences. With that, let’s dive in.
Ecosystems and Integrations
If you’re working with an Amazon ecosystem, Redshift should be on your list. Redshift integrates with a variety of AWS services such as Kinesis Data Firehose, SageMaker, EMR, Glue, DynamoDB, Athena, Database Migration Service (DMS), Schema Conversion Tools (SCT), CloudWatch, etc.
On the other hand, you can absolutely find Snowflake on the AWS Marketplace with really cool on-demand functions. However, Snowflake does not have equivalent integrations, which makes it more difficult for customers to use tools like Kinesis, Glue, Athena, etc. when trying to connect their data warehouse to their data lake architecture. Snowflake does, however, offer several other interesting integration points, including IBM Cognos, Informatica, Power BI, Qlik, Apache Spark, Tableau, and a few others.
Both options offer extensive integrations and have healthy ecosystem partners. With Redshift being more established you'll have a bit of a leg up, but Snowflake has come a long way.
If you're looking to simplify your data warehousing, Panoply offers a smart cloud data warehouse with over 100 pre-built data integrations. On top of that, Panoply automates ingestion and improves query performance with machine learning optimizations.
With that in mind, let’s look at how much it costs to run it all.
Price: Redshift vs Snowflake
At a very high level, we took a look at pricing models from both Redshift and Snowflake and found that Redshift is often less expensive than Snowflake for on-demand pricing. Additionally, with Redshift's 1-year and 3-year Reserved Instance (RI) pricing, customers can get additional savings compared to standard on-demand rates.
That said, it’s important to note that major data warehouse players like BigQuery, Redshift, Snowflake, and Panoply each have rather different pricing models. Many of the data warehouses offer on-demand pricing and volume discounts. Redshift and Snowflake offer 30% to 70% discounts for prepaying.
Redshift charges per-hour per-node, which covers both computational power and data storage. With Redshift, you can calculate the monthly price by multiplying the price per hour by the size of the cluster and the number of hours in a month.
Redshift Monthly Price = [ Price Per Hour ] x [ Cluster Size ] x [ Hours per Month ]
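The formula above can be sketched in a few lines of Python. The $0.25/node-hour figure below is purely illustrative (actual rates vary by node type and region), and 730 is the conventional average number of hours in a month:

```python
# Minimal sketch of the Redshift monthly-cost formula above.
# The $0.25/node-hour rate is a hypothetical example, not a quoted price.
def redshift_monthly_cost(price_per_hour, cluster_size, hours_per_month=730):
    """Monthly price = price per hour x cluster size x hours per month."""
    return price_per_hour * cluster_size * hours_per_month

# e.g., a 4-node cluster at an assumed $0.25/node-hour:
print(redshift_monthly_cost(0.25, 4))  # 730.0
```

Because Redshift bundles compute and storage into the node price, this single multiplication is the whole on-demand model; RI discounts simply lower the effective price per hour.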
Snowflake bills each virtual warehouse separately, and the total depends heavily on your usage pattern. Since data storage is decoupled from the compute warehouses, it's billed separately. As an example, using the US as a reference, Snowflake storage can begin at a flat rate of $23/TB (average compressed amount) per month, accrued daily. Compute costs $0.00056 per second, per credit, on Snowflake On Demand Standard Edition. This is where it can get a bit confusing: Snowflake offers seven different tiers of computational warehouses. The smallest cluster, X-Small, costs one credit per hour, or $2/hour, and at each level up, the number of credits per hour doubles. Snowflake also offers a dynamic pricing model: clusters stop when no queries are running, automatically resume when they are, and can flexibly resize themselves based on a changing workload. This can potentially save you money when query load decreases.
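The credit-doubling scheme described above can be sketched as follows. The size names and the $2/credit-hour Standard Edition rate come from the figures quoted in this article; treat the helper names as illustrative, not Snowflake's own API:

```python
# Sketch of Snowflake's credit-based compute pricing as described above:
# X-Small burns 1 credit/hour and each larger size doubles it, at an
# assumed $2 per credit-hour (On Demand Standard Edition).
SIZES = ["X-Small", "Small", "Medium", "Large", "X-Large", "2X-Large", "3X-Large"]

def credits_per_hour(size):
    """Credits consumed per hour for a given warehouse size."""
    return 2 ** SIZES.index(size)

def compute_cost(size, seconds, price_per_credit_hour=2.00):
    """Cost of running one warehouse of `size` for `seconds` seconds."""
    return credits_per_hour(size) * (seconds / 3600) * price_per_credit_hour

# A Medium warehouse (4 credits/hour) running for 30 minutes:
print(compute_cost("Medium", 1800))  # 4.0
```

The auto-suspend behavior matters here: because a stopped warehouse accrues no compute cost, the `seconds` you pay for tracks actual query activity rather than wall-clock time.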
On cost, when comparing Amazon Redshift's 2-, 4-, and 8-node DC2.8XL clusters with equivalently sized Medium, Large, and X-Large Snowflake configurations, Redshift is 1.3 times less expensive than Snowflake for on-demand pricing. When customers purchase a 1- or 3-year Reserved Instance (RI), Redshift is 1.9 times and 3.7 times less expensive than Snowflake, respectively.
With Panoply you have predictable transparent pricing based on storage, data sources, and support level desired. All plans include unlimited queries and access to live chat support.
Security: Redshift vs Snowflake
When it comes to data, security is a critical foundation. All this data we're creating from new sources opens up new vulnerabilities for private and sensitive information. There is a significant gap between the amount of data being produced today that requires security and the amount of data that is actually being secured, and this gap will widen — a reality of our data-driven world.
As IDC points out, by 2025, almost 90% of all data created in the global datasphere will require some level of security, but less than half will be secured.
Both Redshift and Snowflake take security very seriously. Amazon Redshift database security is distinct from other types of Amazon Redshift security. In addition to database security, Amazon Redshift provides these features to manage security:
- Sign-in credentials — Access to your Amazon Redshift Management Console is controlled by your AWS account privileges.
- Access management — To control access to specific Amazon Redshift resources, you define AWS Identity and Access Management (IAM) accounts.
- Cluster security groups — To grant other users inbound access to an Amazon Redshift cluster, you define a cluster security group and associate it with a cluster.
- VPC — To protect access to your cluster by using a virtual networking environment, you can launch your cluster in an Amazon Virtual Private Cloud (VPC).
- Cluster encryption — To encrypt the data in all your user-created tables, you can enable cluster encryption when you launch the cluster.
- SSL connections — To encrypt the connection between your SQL client and your cluster, you can use secure sockets layer (SSL) encryption.
- Load data encryption — To encrypt your table load data files when you upload them to Amazon S3, you can use either server-side encryption or client-side encryption. When you load from server-side encrypted data, Amazon S3 handles decryption transparently. When you load from client-side encrypted data, the Amazon Redshift COPY command decrypts the data as it loads the table.
- Data in transit — To protect your data in transit within the AWS cloud, Amazon Redshift uses hardware accelerated SSL to communicate with Amazon S3 or Amazon DynamoDB for COPY, UNLOAD, backup, and restore operations.
Similarly, Snowflake provides industry-leading features that ensure the highest levels of security for your account and users, as well as all the data you store in Snowflake.
The following provides a high-level summary of the features, grouped by category:
- Network/site access — Site access controlled through IP whitelisting and blacklisting, managed through network policies. Private/direct communication between Snowflake and your other VPCs through AWS PrivateLink.
- Account/user authentication — MFA (multi-factor authentication) for increased security for account access by users. Support for user SSO (single sign-on) through federated authentication.
- Object security — Controlled access to all objects in the account (users, warehouses, databases, tables, etc.) through a hybrid model of DAC (discretionary access control) and RBAC (role-based access control).
- Data security — All data automatically encrypted (using AES 256 strong encryption). All files stored in stages (for data loading/unloading) automatically encrypted (using either AES 128 standard or 256 strong encryption). Periodic rekeying of encrypted data. Support for encrypting data using customer-managed keys.
- Security validations — SOC 2 Type II compliance. Support for HIPAA compliance. PCI DSS compliance.
My only point of caution when working with security is to make sure you know which Snowflake edition you're working with. Not all of these security features are available with each edition. For example, if you want to leverage the security validation features and work with HIPAA or PCI DSS data, you need to be on Snowflake's Enterprise Edition for Sensitive Data (ESD).
The Data Warehouse Decision
Whenever you’re working with data, you’re aiming to get results as quickly as possible. Remember, data is the engine that helps drive the business. And a good data warehousing platform that’s simple to set up and operate will go a long way in making your business more competitive. Ideally, you always want to look for platforms that provide automated provisioning, automated backups, and are fault tolerant. From there, solutions like those from Panoply help you by automating and optimizing the data management life cycle with transparent pricing and 24/7 chat support.
When selecting the right platform, take your time and do the right amount of research. As mentioned earlier, if you're really worried about compliance and regulations, you'll likely get more operating options on Redshift. From there, know what you need to integrate with. That is, are you leveraging other cloud services, or are you trying to partner with data visualization technologies? Conducting a trial or PoC is a great way to get started and test the waters. Plus, it'll help you understand integration points and how to manage the entire platform.
The key here is to actually get started. What we discussed today revolves around powerful data warehousing systems that are specifically designed to be fast, scalable, and good for your business. Start by asking the right questions, conducting some research, and working with partners that can help navigate your data journey.