Data. It’s the word that drives billion-dollar valuations, powers industries, creates new revenue streams, and has introduced a “New Economy” in which companies harness massive volumes of it to drive their businesses forward.
Why this sudden explosion of data? And how do companies gather, store and analyze this data to generate meaningful insights?
We’ll take a look into what drives modern analytics, and how companies can leverage data to propel themselves forward.
First, it’s important to understand where all this data comes from, and what makes it such a hot topic.
Data collection and analysis have been carried out in some form for centuries, but giant leaps in data storage took place throughout the 20th century. By the turn of the century, modern business analytics as we know it was becoming popular, accelerated by factors like new tools, the move to the cloud, mobile usage, and increased processing power.
Business analytics has come a long way since “gut feel” drove major business decisions, and even when data was available to enhance decision making, it was often a case of “too little, too late”.
Today, businesses in a variety of industries are using modern Business Intelligence (BI) and analytics tools to leverage their data. In sports, for example, Moneyball showed how Billy Beane of the Oakland A’s used data and analytics to build a winning team. In industrial applications, data is used to help power plants operate more efficiently, identify problem areas, perform preventative maintenance, and manage assets. Airbnb uses data to predict the value of homes on its platform. In fact, across every industry and vertical, data and analytics are front and center when it comes to solving business challenges.
The amount of data being generated is unprecedented. In 2009, we were impressed when it was calculated that the average US company with over 1,000 employees was storing more than 200 terabytes of data. Today, it’s expected that autonomous vehicles will be generating upwards of 5000 TB annually – per car.
How are companies expected to manage so much data? Beyond sheer volume, there is the complexity of gathering data from multiple sources – such as ERP, CRM, and social channels. Organizations then have to standardize this data to get an overall picture of what it all means, while each department has different needs and will query the data differently. In short, a system is needed to gather data, standardize it, store it, and make it available for end users to query.
This is where Extract, Transform, and Load (ETL) and Data Warehousing come in. As an ETL introduction, the ETL process steps are gathering the data from the disparate sources (Extracting), sanitizing and standardizing this data (Transform) and then transferring this data to a data warehouse (Load). An ETL example would be moving data from an OLTP database into a dimensional data warehouse.
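The three steps can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the CSV source stands in for an OLTP export, SQLite stands in for the data warehouse, and all table and field names are invented for the example.

```python
import csv
import io
import sqlite3

# Extract: read raw order records from a CSV export (stand-in for an OLTP source).
raw_csv = io.StringIO("order_id,amount,region\n1, 19.99 ,us-east\n2,5.00,EU-WEST\n")
rows = list(csv.DictReader(raw_csv))

# Transform: sanitize and standardize -- trim whitespace, cast types, normalize codes.
clean = [
    (int(r["order_id"]), float(r["amount"].strip()), r["region"].strip().upper())
    for r in rows
]

# Load: write the standardized rows into the warehouse (SQLite as a stand-in).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE orders (order_id INTEGER, amount REAL, region TEXT)")
warehouse.executemany("INSERT INTO orders VALUES (?, ?, ?)", clean)

total = warehouse.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(round(total, 2))  # 24.99
```

Note that the transform happens in application code before anything touches the warehouse – a detail that distinguishes ETL from the ELT approach discussed later in this article.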
Data warehouses are used by organizations as centralized repositories for analytical and reporting purposes, storing data in a way that users can query. As opposed to a Data Lake, a Data Warehouse typically stores structured, processed data. Various ETL tools are used to ensure that the information housed in the Data Warehouse can be relied upon.
BI tools then draw on these data warehouses to present useful data to users through reports, dashboards, and visualizations.
Walmart, for example, leverages BI tools to visualize their 40 petabytes of recent transactional data, and has reduced the time it takes to get answers to complex business questions from weeks to just minutes. BI tools can show the company where pricing is incorrect, help send the right stock to the right stores, provide alerts, and help respond in near real-time to crises.
Companies attempting to analyze their businesses, make decisions, and leverage data – without effective ETL and data warehousing in place – are going to struggle. But the traditional ETL process is no longer the only option: more modern approaches have emerged.
One of these is streaming. In the past, the ETL pipeline would be made up of several distinct phases. First, reference data – data defining the permissible values each field may contain – would be created. Then, data would be extracted from the various sources and validated, after which it would be transformed. Rather than being loaded directly into the target data warehouse, the data would first enter a staging database. Finally, it would be published to the data warehouse as a new batch.
With streaming, however, data is transferred in real time: each record is treated, transformed, and saved to the target data store as soon as it arrives.
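As a rough sketch of the streaming idea, a pipeline can be modeled as a generator that transforms and stores each event the moment it arrives, with no staging batch in between. The event shape and the in-memory list standing in for the target data store are illustrative assumptions, not any particular streaming product’s API.

```python
import json

def stream_events(source_lines):
    """Yield parsed events one at a time, as they arrive."""
    for line in source_lines:
        yield json.loads(line)

def process(event, store):
    """Transform and save a single event immediately -- no staging batch."""
    event["user"] = event["user"].lower()  # standardize on arrival
    store.append(event)                    # write straight to the target store

store = []  # stand-in for the target data store
incoming = ['{"user": "Alice", "action": "click"}', '{"user": "BOB", "action": "buy"}']
for event in stream_events(incoming):
    process(event, store)

print([e["user"] for e in store])  # ['alice', 'bob']
```

The key contrast with the batch pipeline above is that each event is fully processed and persisted before the next one is even read.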
Another more modern method is ELT (Extract, Load, Transform). With ELT, data is immediately moved after extraction into a centralized data repository. Data is transformed and loaded “on the fly”, and this method has become increasingly popular, especially with advances in cloud storage.
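The difference from ETL is mostly one of ordering: raw data is loaded first, and the transformation runs inside the warehouse itself, typically as SQL using the warehouse’s own compute. A hedged sketch, again with SQLite standing in for a cloud warehouse and with invented table names:

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")

# Load: raw extracted rows land in the warehouse untouched, as text.
warehouse.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT, region TEXT)")
warehouse.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [("1", " 19.99 ", "us-east"), ("2", "5.00", "EU-WEST")],
)

# Transform: runs inside the warehouse, after loading, as a SQL statement.
warehouse.execute("""
    CREATE TABLE orders AS
    SELECT CAST(order_id AS INTEGER)   AS order_id,
           CAST(TRIM(amount) AS REAL)  AS amount,
           UPPER(TRIM(region))         AS region
    FROM raw_orders
""")

regions = [r[0] for r in warehouse.execute("SELECT region FROM orders ORDER BY order_id")]
print(regions)  # ['US-EAST', 'EU-WEST']
```

Because the raw table is preserved, transformations can be re-run or revised later without re-extracting from the source systems – one reason ELT pairs well with cheap cloud storage.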
Still another modern method is building a pipeline without ETL software using an automated cloud-based Data Warehouse. New cloud-based data warehouse technology makes it possible to achieve the original goal of ETL without building an ETL system at all. For example, Panoply’s cloud-based automated data warehouse has end-to-end data management built in, utilizing self-optimizing architecture, machine learning and natural language processing to automatically extract and transform data to match analytics requirements.
Modern analytics takes a data-oriented approach to business decision making, and uses BI tools to help make sense of the data. ETL software and data warehouses form the bedrock of an effective, data-driven decision making policy.
Organizations need to ensure that all available data, from all sources, is utilized; that this data is complete and accurate; and that it meets the requirements of all its users, be they in finance, marketing, or any other area of the business. According to McKinsey, companies correctly leveraging data can expect to see a 60% increase in margins.
Start your free trial to see how you can benefit from Panoply’s platform and proactively drive your business’s success.
Now we have even more good news for data analysts! Panoply’s proprietary smart data warehouse has received a major upgrade to the way it automates query materialization, letting you get query results up to 100 times faster. Overall, you’ll see a marked improvement across your queries; even stubborn queries that take 30 seconds or more to return results will be twice as fast.