Contemporary society generates, uses, and retains amounts of data that would be considered huge, if not unimaginable, by any earlier standard. Yet IDC expects the global datasphere to keep growing over the next few years and to eclipse what exists today. IDC estimates that in 2025 the world will create and replicate 163 ZB of data, a tenfold increase over the amount of data created in 2016. This hypergrowth is the outcome of an evolution of computing that goes back decades.
Increasingly, data will need to be instantly available whenever and wherever anyone needs it. Industries around the world are undergoing “digital transformation” motivated by these requirements. According to the same IDC report, by 2025 more than a quarter of the data created in the global datasphere will be real-time in nature, and real-time IoT data will make up more than 95% of that.
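To put the IDC figures above in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes that the "tenfold increase" and the real-time percentages all apply to the same 163 ZB base, and that growth compounds evenly from 2016 to 2025. Both are simplifying assumptions for illustration, not something the report states, and the function names are made up for this example.

```python
# Figures quoted from the IDC forecast discussed above.
DATA_2025_ZB = 163.0   # data created and replicated in 2025, in zettabytes
GROWTH_FACTOR = 10.0   # "tenfold increase" compared with 2016
YEARS = 2025 - 2016    # nine years of growth


def implied_2016_volume(data_2025_zb, factor):
    """Volume in 2016 implied by a 'factor'-fold increase to the 2025 figure."""
    return data_2025_zb / factor


def compound_annual_growth_rate(factor, years):
    """Yearly growth rate that compounds to 'factor' over 'years' years."""
    return factor ** (1.0 / years) - 1.0


def real_time_lower_bounds(total_zb, real_time_share=0.25, iot_share=0.95):
    """Lower bounds for real-time data and real-time IoT data.

    The report says 'more than' a quarter and 'more than' 95%,
    so both returned values are minimums, not point estimates.
    """
    real_time = total_zb * real_time_share
    return real_time, real_time * iot_share


if __name__ == "__main__":
    base = implied_2016_volume(DATA_2025_ZB, GROWTH_FACTOR)
    rate = compound_annual_growth_rate(GROWTH_FACTOR, YEARS)
    real_time, iot = real_time_lower_bounds(DATA_2025_ZB)
    print(f"Implied 2016 volume: {base:.1f} ZB")
    print(f"Implied yearly growth: {rate:.1%}")
    print(f"Real-time data in 2025: more than {real_time:.2f} ZB")
    print(f"Real-time IoT data in 2025: more than {iot:.2f} ZB")
```

Under these assumptions, a tenfold increase over nine years works out to growth of roughly 29% per year, every year.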
So how are organizations going to manage all of this? Data warehousing is a great tool, but how widely is it being adopted? Most of all, what are the mechanisms to leverage all of this data and turn it into actionable insights?
Deploying data warehousing
Panoply recently released its Data Warehouse Trends Report. In it, we found that 21% of the people we interviewed said they do not currently have a data warehouse solution. This seems relatively high, considering that the survey focused on attendees of re:Invent 2017, an event aimed specifically at infrastructure and data professionals. One obstacle to embracing a data warehouse may be the belief that teams still have to denormalize and aggregate their data. Cloud-based data warehouses, a major trend for the future, are specifically designed to overcome these issues.
The future is less complex
In Panoply’s report, the majority of survey respondents (over 62%) indicated that managing their data warehouse solution is difficult or very difficult. When asked why they were unsatisfied, respondents mentioned complexity, cost, and performance. For Redshift specifically, some respondents were unsatisfied because of complexity and performance, with 38% of respondents saying it’s too complicated.
Again, this is where intelligent data warehousing solutions come into play. Cloud-based data warehouse solutions have made the data mart strategy less relevant. Solutions like Amazon Redshift, Google BigQuery, and Panoply manage partitioning and scalability of the data warehouse transparently. As a result, it’s possible to set up a petabyte-scale enterprise data warehouse that holds all organizational data without the cost and complexity of a traditional data warehouse project.
The future is hands-on
To understand the complexity issue better, Panoply’s survey asked whether infrastructure teams would prefer that data scientists and analysts manage their own data flows with automated tools. 81% of respondents said yes, they would prefer data professionals to have these tools. When asked which processes they would want to automate in their data warehouses, respondents mentioned four areas:
- Ingesting different data sources
- Transforming data
- Managing data
- Query optimization
‘Transforming Data’ was mentioned most often by Redshift and Azure users, while BigQuery users showed a bigger appetite for ‘Ingesting Different Data Sources’.
Finally, some key trends accelerating the use of business intelligence (BI) tools include the desire for self-service, de-silofication, better data visualization, and a cloud-based solution. Other key trends that continue to drive adoption are AI and machine learning, which help automate data-driven processes and make data more accessible.
A good data warehousing system gives you a foundation to develop and deploy around these BI tools. Solutions like those from Panoply integrate with BI tools such as Metabase, Tableau, Databricks, Looker, Power BI, Re:dash, Zeppelin, IPython Notebook, Shiny Apps by RStudio, and Sisense. In fact, any BI tool that supports ODBC, JDBC, Postgres, or AWS Redshift can connect to Panoply. It’s these kinds of solutions that give you the most flexibility in working with a variety of data sources and types. Remember, when working with data, the future (hopefully) is a lot less complex, more automated, and able to deliver even more value from data.
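As a rough illustration of what "supports Postgres" means in practice, the sketch below builds a standard PostgreSQL connection URI of the kind many BI tools and client libraries accept. The host, database, user, and password are hypothetical placeholders, the port is the PostgreSQL default (5432), and the helper name is invented for this example. The actual values for any given warehouse come from that vendor's connection settings.

```python
from urllib.parse import quote


def build_postgres_uri(host, database, user, password, port=5432, sslmode="require"):
    """Build a PostgreSQL connection URI (hypothetical helper for illustration).

    The user name and password are percent-encoded so that characters
    such as '@' or '/' do not break the URI structure.
    """
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}?sslmode={sslmode}"
    )


if __name__ == "__main__":
    # All values below are placeholders, not real endpoints or credentials.
    uri = build_postgres_uri(
        host="warehouse.example.com",
        database="analytics",
        user="bi_reader",
        password="p@ss/word",
    )
    print(uri)
```

A URI like this can typically be pasted into a BI tool's Postgres connector or handed to a Postgres client library. Whether a specific tool expects a URI or separate host, port, and credential fields depends on the tool.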