Data analytics allows organizations to gain insight into their business operations and find new ways to increase their revenue. However, the data sources used to provide these insights are often located in different places and accessing them can be problematic. Data virtualization and data warehousing are the two contenders most often referenced as the best way to store and access data sources. Let's take a look at what each solution has to offer and determine who wins the battle between data virtualization and data warehousing.
The Contenders
Data virtualization (DV) is a technology for business intelligence that doesn’t involve moving or storing data. Instead, an organization uses a combination of metadata and APIs to interface with data from multiple sources. Users access these original data sources using federated queries and they interact with consistent views of data provided through a unified abstraction layer.
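To make the idea concrete, here is a minimal sketch of a federated view in Python. It is an illustration only, not how any particular DV product works: the two in-memory SQLite databases stand in for separate source systems, and the names `crm`, `billing`, and `RevenueView` are hypothetical. The point it shows is that the view holds only connections and query logic, and fetches the data from the sources each time it is asked.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an in-memory SQLite database standing in for one source system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    return conn


# Two independent "source systems" (hypothetical CRM and billing databases).
crm = make_source(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)",
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)
billing = make_source(
    "CREATE TABLE invoices (customer_id INTEGER, amount REAL)",
    "INSERT INTO invoices VALUES (?, ?)",
    [(1, 100.0), (1, 50.0), (2, 75.0)],
)


class RevenueView:
    """Virtual view: keeps connections and query logic, never a copy of the data."""

    def __init__(self, crm_conn, billing_conn):
        self.crm = crm_conn
        self.billing = billing_conn

    def rows(self):
        # Every call queries both sources live, then joins the results in memory.
        totals = dict(self.billing.execute(
            "SELECT customer_id, SUM(amount) FROM invoices GROUP BY customer_id"
        ))
        return [
            {"customer": name, "revenue": totals.get(cid, 0.0)}
            for cid, name in self.crm.execute(
                "SELECT id, name FROM customers ORDER BY id"
            )
        ]


view = RevenueView(crm, billing)
result = view.rows()
print(result)
```

Because nothing is copied, a new invoice written to the billing source shows up the next time `view.rows()` is called. The flip side is that every call also puts query load on both sources.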
The enterprise data warehouse (EDW), introduced in the 1980s, remains popular. Relational data from one or more sources is captured and integrated, and a central repository of data is established as the standard model for the enterprise: a “single version of the truth.” Maintained in an environment separate from an organization’s primary operations, a data warehouse enhances both system and query performance while enforcing consistency.
Proponents of data virtualization point to its quick setup, efficiency and easy scalability, and proclaim it a leading-edge solution, the future of big data. EDWs are bloated beasts from yesteryear, says team DV. Central repositories gobble up time and money with their huge setup, maintenance and hardware requirements while being crushed under swelling volumes and a widening variety of data.
Enterprise data warehouse proponents land telling blows of their own against the virtualization construct. Data warehouses were designed and implemented to securely store historical data, shield users from source system upgrades, and protect source systems from the performance degradation caused by complex analytics and reporting demands. A virtualization layer that queries the original sources directly, they argue, provides none of these protections on its own.
As is typical with arguments about complex issues, people indulge in sweeping black-and-white statements while ignoring vital details and recent developments. For starters, data warehouses have changed since the early days: the clunky behemoth isn’t an accurate metaphor for today’s EDW. Many businesses now use data warehouse automation tools or platforms to speed development and to automate testing, maintenance, and other steps in a data warehouse’s lifecycle. Automation also brings additional benefits, such as easier enforcement of standards and best practices and the ability to maintain historical records.
Another misconception about today’s data warehouses is that populating them is a lengthy, expensive undertaking. While the extract, transform, and load (ETL) processes that move data into storage once involved lengthy hand-coding, these processes are largely automated today. And ETL has been eliminated in Panoply’s Smart Data Warehouse, which uses machine learning and natural language processing (NLP) for data migration and analysis.
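For readers who have never seen one, the kind of hand-coded ETL job that automation tools now generate or replace might look like the sketch below. It is a toy example under stated assumptions: the `orders` source table, the `fact_orders` warehouse table, and the cleanup rules are all hypothetical, and in-memory SQLite databases stand in for both the operational system and the warehouse.

```python
import sqlite3


def extract(source):
    """Pull raw rows out of the operational source system."""
    return source.execute(
        "SELECT id, name, amount_cents FROM orders"
    ).fetchall()


def transform(rows):
    """Standardize raw rows: tidy customer names, convert cents to dollars."""
    return [
        (order_id, name.strip().title(), cents / 100)
        for order_id, name, cents in rows
    ]


def load(warehouse, rows):
    """Write cleaned rows to the warehouse; a rerun replaces rows with the same id."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount_usd REAL)"
    )
    warehouse.executemany(
        "INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?)", rows
    )
    warehouse.commit()


# Hypothetical source system with inconsistently formatted data.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (id INTEGER, name TEXT, amount_cents INTEGER)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "  acme corp ", 12550), (2, "GLOBEX", 990)],
)

# The warehouse is a separate database, so analytics never touch the source.
warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(source)))

loaded = warehouse.execute(
    "SELECT * FROM fact_orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Even at this size, each new source means another set of extract and transform rules to write and maintain by hand, which is the cost that automated tooling aims to remove.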
While on-premises data storage does demand provisioning, maintenance, and upgrades, today’s cloud-based data warehouse solutions offload these IT tasks to outside service providers. Company resources can focus on primary business objectives instead of tending to a brood of servers.
While data virtualization has its place, new cloud versions of the enterprise data warehouse have emerged. Products like Panoply provide an intelligent data management platform for analytics in the cloud that automatically adjusts to give the results you want. Using data technologies, including Amazon's Redshift, Panoply utilizes machine learning and natural language processing (NLP) to learn, model, and automate the standard data management activities previously completed by data analysts, engineers, and scientists.
Data professionals today can choose from an ever-expanding range of solutions to generate rational, data-driven decisions and customer engagement from diverse streams of data. Instead of championing one solution and deriding the others, support the solution that works best for your organization.