Data Industry and Trends

Data-Oriented Takeaways from AWS re:Invent 2016: Query S3, Batch, Glue and More

Written by Yaniv Leven | December 05, 2016

Attendance at this year’s AWS re:Invent conference almost doubled, from 18,000 people last year to 32,000 people last week. The large, international cloud computing event attracted professionals from all types of industries, including financial services, healthcare, gaming, and others interested in learning more about AWS.

At the event, AWS announced a battery of new services designed to facilitate the tasks of analyzing data, consolidating data sources, and migrating databases. Here’s our take on the new data-related services and their implications for users and the market.


Query S3

One of the most interesting announcements was Amazon Athena, a serverless, fully managed service that lets users run ad-hoc SQL queries directly against their data in S3 and get results in seconds. It is already available, and pricing is per query: $5 per terabyte of data scanned.
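To make this concrete, here is a minimal sketch of what an ad-hoc Athena query might look like through the AWS SDK for Python (boto3). The database, table, bucket, and query below are hypothetical placeholders, not something taken from the announcement itself.

```python
import time
import boto3

# Hypothetical example: run an ad-hoc Athena query over data stored in S3.
athena = boto3.client("athena", region_name="us-east-1")

response = athena.start_query_execution(
    QueryString="SELECT event_type, COUNT(*) AS events "
                "FROM weblogs GROUP BY event_type",
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
query_id = response["QueryExecutionId"]

# Athena runs asynchronously, so poll until the query finishes.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)[
        "QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

# Print the result rows (the first row returned is the header).
if state == "SUCCEEDED":
    results = athena.get_query_results(QueryExecutionId=query_id)
    for row in results["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```

Note that the query itself is plain SQL over files sitting in S3; there is no cluster to size or manage, which is exactly what makes the pay-per-scan model attractive for occasional, exploratory queries.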

AWS analytics services like Amazon Redshift and Amazon EMR have made petabyte-scale analytics accessible to companies of all sizes. With Amazon Redshift, customers can perform complex queries on massive collections of structured data and get superfast performance. The new Athena service is not intended to replace Redshift or EMR; it adds another option for querying data in storage.

Athena is basically Amazon’s answer to Google’s BigQuery. At Panoply, we use S3 for rotating and archiving our customers’ data to optimize their data lake costs. Heating and cooling the data based on each user’s business logic can take time. We plan to incorporate Athena into our stack, which should enable us to query directly from the archive level, reducing costs and enhancing performance for our customers. However, S3 is not a data warehouse, and the data it contains cannot be indexed, which means Athena queries will require significant processing time for parsing and structuring the data. It remains to be seen how well it will perform with the many concurrent small queries that are a common pattern when running analytics and building dashboards.

Batch and Glue

Another service that Amazon announced is AWS Glue, a fully managed ETL tool. Glue allows users to automate jobs that pull data from multiple sources (such as RDS and Redshift, and even external JDBC-supported sources) and make it ready for analysis tools. This reflects the need to simplify data management and operations, and to support business users and analysts in their jobs. As far as we understand, Glue doesn’t support the processing and management of streaming data. Nonetheless, it is a significant enhancement considering the masses of enterprise users who move their data stores to Redshift for analytics. Glue is not yet available to the public, so we haven’t had a chance to try it out.

Along the same lines of facilitating data processing, Amazon also announced AWS Batch. Batch manages the provisioning and orchestration of the resources needed to run submitted jobs. The support for containers is also interesting: users provide their container images, and Batch runs them on top of the provisioned instances.
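As a rough sketch of how that container workflow might look with boto3, assuming a hypothetical ETL image, job definition, and job queue (none of these names come from AWS):

```python
import boto3

# Hypothetical example: register a container-based job definition and submit a job.
batch = boto3.client("batch", region_name="us-east-1")

# The job definition points at a user-provided container image.
batch.register_job_definition(
    jobDefinitionName="nightly-etl",
    type="container",
    containerProperties={
        "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/etl:latest",
        "vcpus": 2,
        "memory": 2048,
        "command": ["python", "run_etl.py"],
    },
)

# Submit a job; Batch provisions instances and runs the container on them.
job = batch.submit_job(
    jobName="nightly-etl-run",
    jobQueue="analytics-queue",
    jobDefinition="nightly-etl",
)
print("Submitted job:", job["jobId"])
```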

Aurora Supports PostgreSQL

In his keynote, CEO Andy Jassy noted that users want freedom and to “unshackle from hostile database vendors.” On his slides, the letter “o” in “hostile” was rendered with part of the red Oracle logo. This led into his announcement that Aurora now supports PostgreSQL.

Because PostgreSQL offers a high degree of compatibility with Oracle databases, this release was not only requested by AWS customers but also facilitates the migration of Oracle databases to Amazon Aurora, with all the inherent benefits of Aurora as a scalable, fully managed SQL database in the cloud. According to Jassy, the service costs only one-tenth of traditional commercial offerings such as Oracle, and it is Amazon’s fastest-growing service ever. Taking all this into consideration, Oracle should be worried.

AI Building Blocks

Amazon announced its first three AI services. The first supports image recognition, including categorization and even real-time facial detection.

Amazon Rekognition will allow companies to access advanced deep-learning capabilities that traditionally required a lot of expertise and years to develop, enabling object and scene recognition, facial expression analysis, and more.
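Here is a minimal sketch of what calling Rekognition on an image already stored in S3 might look like with boto3; the bucket and object key are hypothetical.

```python
import boto3

# Hypothetical example: object/scene labels and facial analysis via Rekognition.
rekognition = boto3.client("rekognition", region_name="us-east-1")
image = {"S3Object": {"Bucket": "my-images", "Name": "photos/team.jpg"}}

# Object and scene recognition.
labels = rekognition.detect_labels(Image=image, MaxLabels=10, MinConfidence=75)
for label in labels["Labels"]:
    print(label["Name"], round(label["Confidence"], 1))

# Facial analysis (expressions, estimated age range, and more).
faces = rekognition.detect_faces(Image=image, Attributes=["ALL"])
for face in faces["FaceDetails"]:
    top_emotion = max(face["Emotions"], key=lambda e: e["Confidence"])
    print(top_emotion["Type"], face["AgeRange"])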

The new Amazon Lex service is really cool. It supports natural language understanding and automatic speech recognition. This new service, built on the same technology that powers Alexa, will allow companies to easily build chatbots. But beyond the fun side of it, it can be used to construct really sophisticated software that integrates well with the rest of the AWS ecosystem.
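To give a feel for the interaction model, here is a small sketch of sending user text to a Lex bot at runtime with boto3. The bot name, alias, and user id are hypothetical placeholders.

```python
import boto3

# Hypothetical example: send a user utterance to a Lex bot and read its reply.
lex = boto3.client("lex-runtime", region_name="us-east-1")

response = lex.post_text(
    botName="BuildBot",
    botAlias="prod",
    userId="user-42",
    inputText="Re-run the last build",
)

# The bot returns the recognized intent, any extracted slots, and a reply message.
print(response.get("intentName"), response.get("slots"))
print(response.get("message"))
```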

A third service announced in this category was Amazon Polly. Polly is a simple text-to-speech (TTS) service. Simply put, Polly takes text and returns an MP3. On its own, it’s not all that special, but when combined with Amazon Lex, it can be used to create fully conversational applications that you interact with simply by talking to them. Imagine automating your software builds simply by telling Echo to re-run them on Travis, for example.
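A minimal sketch of that text-in, MP3-out flow with boto3 might look like this; the voice and output file name are just examples.

```python
import boto3

# Hypothetical example: pass text to Polly and write the MP3 audio it returns.
polly = boto3.client("polly", region_name="us-east-1")

response = polly.synthesize_speech(
    Text="Your build finished successfully.",
    OutputFormat="mp3",
    VoiceId="Joanna",
)

# AudioStream is a streaming body; save it as an MP3 file.
with open("status.mp3", "wb") as audio_file:
    audio_file.write(response["AudioStream"].read())
```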

These services take Amazon to the next level as an infrastructure cloud company, and not only when it comes to the underlying servers. The capabilities themselves are not new technologies, but they show AWS’s significant ability to process and analyze the huge amounts of data required for machine learning and deep learning.

Much More


According to Jassy, Amazon released 1,000 new features this year, which is huge considering the organization’s size. Amazon announced new compute families and instance types that will support heavy enterprise workloads and analytical needs. It even added an option to attach an Elastic GPU when launching an EC2 instance, whatever the instance type and size. This capability can help accelerate analytical and BI reports and dashboards, making them even more comprehensive and easy to understand.

Amazon also released a great number of DevOps tools, such as AWS CodeBuild, the AWS X-Ray monitoring tool, and the AWS Shield protection service. However, we will leave it to you to learn more from both Jassy’s keynote and Amazon CTO Werner Vogels’ presentation.

Amazon keeps on growing, and so does the amount of data it holds. We are a proud AWS partner, and it was impressive to see how our cloud partner continues to innovate around that data. We can’t wait and are already making plans. Hopefully, we’ll see you in Vegas at next year’s re:Invent conference.

From raw data to analysis in under 10 minutes.

Sign up now for a demo or a free trial of the Panoply.io platform.

Learn more about platform features