In today’s cloud computing environment, a modern approach to data management is a necessity. It enables insight by providing full transparency across your data lifecycle, so the most useful data can be extracted quickly, at the speed of business. This is part one of a two-part series on the intersection of cloud computing and data management; part two focuses on data management strategy and best practices.
Big Data, Big Storage, Faster Access
The price of storage has plummeted. A gigabyte of storage cost an average of $437,000 in 1980; today, the same gigabyte costs about two cents. That figure applies to a personal or business hard drive, and per gigabyte, bulk storage in a data warehouse is cheaper still. Even the idea of what counts as “traditional” storage, access, and management is shifting: compared with the capabilities of cloud-based data warehousing, today’s traditional approaches increasingly look antiquated, cumbersome, and an all-around time sink.
What does this all mean for you and your company? By the time you read this sentence, not only has the amount of global data changed since the sentence was written, but so has the rate at which that volume is growing. Equally important to wrap one’s brain around, the speed at which data is distributed has also accelerated. (Those figures will change again by the time you finish this article.) So the data management question becomes: What is the most efficient way to harness the exponential growth of your organization’s data, and all of its kinetic energy, so that it optimally serves specific business needs?
For the project manager grappling with large amounts of data scattered across different places, the problem distills down to a simple question: How quickly can I get from raw data to analysis?
In this article, we’ll cover some concepts and reference the tools necessary to implement an efficient data management protocol, emphasizing the capabilities inherent in smart cloud data warehousing.
What is Data Management?
Data management is an administrative process that includes acquiring, validating, storing, protecting, and processing required data to ensure its accessibility, reliability, and timeliness for its users. It is a broad term that can refer both to a role (a data manager) and to an organizational responsibility.
Data management covers the entire data lifecycle, from collection to consumption. That includes a piece of data’s point of origin (data provenance) and the transformations it undergoes between that origin and its current point of use (data lineage). These attributes are particularly useful: when the journey of each piece of data is recorded, the whole pipeline becomes visible, checks can be monitored, and incidents of compromise or failure can be traced directly back to their sources.
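To make provenance and lineage concrete, here is a minimal Python sketch. It assumes each record carries a label for its source and appends an entry every time a transformation is applied. The `Record` class, the `trace` helper, and the source name `crm_export` are hypothetical illustrations, not part of any particular tool. A real pipeline would typically persist this metadata rather than keep it in memory.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass(frozen=True)
class Record:
    """A piece of data plus its provenance (origin) and lineage (transformation history)."""
    value: dict
    provenance: str      # hypothetical source identifier, e.g. "crm_export"
    lineage: tuple = ()  # (step name, UTC timestamp) pairs, oldest first

    def apply(self, step: str, fn: Callable[[dict], dict]) -> "Record":
        """Return a new Record with fn applied and a lineage entry appended."""
        entry = (step, datetime.now(timezone.utc).isoformat())
        return Record(fn(self.value), self.provenance, self.lineage + (entry,))


def trace(record: Record) -> list:
    """List the path from origin to current state, e.g. when investigating a failed check."""
    return [record.provenance] + [step for step, _ in record.lineage]


# Hypothetical raw record pulled from a CRM export, cleaned in two steps.
raw = Record({"amount": "19.99", "country": "us"}, provenance="crm_export")
clean = (
    raw.apply("parse_amount", lambda v: {**v, "amount": float(v["amount"])})
       .apply("normalize_country", lambda v: {**v, "country": v["country"].upper()})
)
print(trace(clean))  # ['crm_export', 'parse_amount', 'normalize_country']
```

Because each step returns a new record instead of mutating the old one, the original `raw` record stays untouched. This keeps the recorded history trustworthy when a failure has to be traced back to its source.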
Data Management Roadblocks in the Organization
Data management is often not prioritized as it should be, and IT and data engineering may not be skill sets a company has in house. Such companies nonetheless understand the value of an analytics environment and expect BI tools to deliver insights on product or sales strategy, cost overruns, marketing campaigns, and so on. This may lead them to use whatever technical resources they have on hand to write a build script that keeps data collection and dissemination running. Those given this assignment may not realize how hard it is to make data jobs run smoothly and efficiently and, just as important, to respond and recover quickly when failure strikes. The result may well be functional, but without a dedicated data science team or an outside provider (for coding and everything else), it is rarely optimal.
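One small habit that helps such a script recover from failure is to wrap each data job in a retry with backoff, and to let it fail loudly once retries run out. Below is a minimal Python sketch based on that assumption. `run_with_retries` and `flaky_extract` are hypothetical names, and a scheduler or orchestration tool may already provide this behavior along with alerting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(job: Callable[[], T], max_attempts: int = 3,
                     base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Run a zero-argument data job, retrying failures with exponential backoff.

    Re-raises the last error once attempts are exhausted, so a broken job is
    visible instead of quietly leaving stale data behind.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception as exc:  # narrow this to transient errors in real use
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
            print(f"attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            sleep(delay)
    raise ValueError("max_attempts must be at least 1")


# Hypothetical extract step whose source is unavailable for the first two calls.
calls = {"n": 0}


def flaky_extract() -> list:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("source unavailable")
    return ["row1", "row2"]


delays = []
rows = run_with_retries(flaky_extract, sleep=delays.append)  # record delays instead of sleeping
```

The `sleep` parameter is injectable so the backoff can be checked without actually waiting. In production it defaults to `time.sleep`.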
The good news is that some truly exceptional data management tools are available to help manage the data lifecycle.
Data Management IRL
A simple, apt analogy might be the decision to have a pizza night at home. You must acquire and collect source material (dough, sauce, toppings). You then integrate, transform, and load (assemble and bake) your aggregate components, keeping watch over organizational and reference data indicators and fields (oven temp, timing, dietary requirements of customer-guests), with the final (and delicious!) step being the visualization layer. And, just like in business, here the data is also sliced and analyzed, and real-time analytics drive the direction of future (and increasingly desirable) outcomes.
In our example, data management started with the decision to make pizza, but it also included structured, attentive oversight of every task, from collection to consumption. That kind of oversight is a core best practice for successful data management. Depending on an organization’s size and scope of business, some aspects will matter more than others, but rarely will any aspect be without value.
Significant aspects of data management include the following:
Data Collection. Planning and instrumenting software and hardware products to collect data that diagnoses failure and measures success. This includes everything from server log files to mobile app interaction tracking.
Reference Data Management. Defines permissible values that can be used by other data fields, such as postal codes, lists of countries, regions and cities, or product serial numbers. Reference data can be home-grown or externally provided.
ETL (Extract, Transform, Load) and Data Integration. Moving data from source systems into a data warehouse and transforming, summarizing, and aggregating it into a format suitable for high-performance analysis.
Master Data Management. The practice of managing critical organizational data (customers, accounts, and the parties named in business transactions) in a single standardized format that prevents redundancy across the organization.
Cloud Data Management. The process of integrating data from an organization’s ecosystem of cloud applications. Its main distinction is that all data intake, storage, and processing take place in the cloud.
Data Analytics and Visualization. Processing selected data from big data sources and data warehouses for advanced analytics, so that analysts and data scientists can slice and dice it and present the results as visualizations and dashboards.
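Several of these aspects can be seen working together in a small pipeline. The following is a minimal Python sketch with three assumptions: the order records are hypothetical, the reference list of permitted country codes is maintained by hand, and an in-memory SQLite database stands in for the data warehouse. The sketch extracts the rows, validates them against the reference data, aggregates revenue per country, and loads the result into a table ready for analysis.

```python
import sqlite3
from collections import defaultdict

# Reference data: permissible values for the "country" field (hypothetical list).
VALID_COUNTRIES = {"US", "CA", "MX"}


def extract() -> list:
    """Stand-in for reading from a source system such as an API or operational database."""
    return [
        {"order_id": 1, "country": "us", "amount": "120.00"},
        {"order_id": 2, "country": "CA", "amount": "80.50"},
        {"order_id": 3, "country": "XX", "amount": "15.00"},  # fails the reference check
        {"order_id": 4, "country": "US", "amount": "30.00"},
    ]


def transform(rows: list) -> tuple:
    """Normalize and validate rows against reference data, then total revenue per country."""
    totals = defaultdict(float)
    rejected = []
    for row in rows:
        country = row["country"].strip().upper()
        if country not in VALID_COUNTRIES:
            rejected.append(row)  # quarantine bad rows instead of loading them
            continue
        totals[country] += float(row["amount"])
    return dict(totals), rejected


def load(conn: sqlite3.Connection, totals: dict) -> None:
    """Write the aggregated results into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS revenue_by_country (country TEXT PRIMARY KEY, revenue REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO revenue_by_country (country, revenue) VALUES (?, ?)",
        sorted(totals.items()),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
totals, rejected = transform(extract())
load(conn, totals)
result = conn.execute(
    "SELECT country, revenue FROM revenue_by_country ORDER BY country"
).fetchall()
print(result)  # [('CA', 80.5), ('US', 150.0)]
```

Rows that fail the reference-data check are set aside rather than silently dropped, so someone can investigate where the bad value came from. The loaded table is what a BI tool would then query for dashboards.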
A Word About Data Governance
Data governance is more about strategy than technique, but effective data management should fit hand in glove with it. The distinction is straightforward and important to grasp, even though the two terms are often (incorrectly) used interchangeably. Data management can be thought of as the logistics of an operation: the data team(s) must be in place to ensure that data is accurately sourced and collected, processed, verified and stored, and made available for end-user analysis. Executing policies and priorities falls under the rubric of data management.
Data governance, on the other hand, covers ownership and accountability, where “the (data) buck stops here,” so to speak. Creating the right data model so that analysis can, in fact, take place falls under its mandate.
Data governance means making sure the right stakeholders are in place to create and oversee the entire data program. What are the policies on compliance and security? What are the data priorities for risk management or for determining profitability targets? In short, who decides what (at the top)?
The level of tech involved doesn’t define data governance; when deployed correctly, however, technology firmly supports it. This fosters an environment more conducive to streamlined and transparent data management.
Extracting Value, Efficiently
Everything is data, and data generation keeps increasing. How we manage data has a lot to do with how we approach adaptation as a whole, in an age when the IoT (Internet of Things) is rapidly becoming the IoE (Internet of Everything).
In other words, a large part of optimizing data management today is delegating correctly: choosing the right people for the right tasks while adopting and deploying the right tools with an eye toward the future. (The latter can’t be overstated when adopting or integrating a cloud-based framework into your organization.)
For the business owner or project manager, the ideal is interfacing with a data warehouse that lets you quickly and easily consolidate the data from your databases, cloud services, and applications into a single data management platform.
Please check out part two of this series, Data Management Strategy and Best Practices.