Panoply Blog: Data Management, Warehousing & Data Analysis

Data Management Best Practices for Cloud Computing

Written by Cheryl Adams | Sep 19, 2017 10:09:44 PM

As more organizations begin their journey to the cloud, they need to plan how they will apply the best practices of data management to ensure that cloud-based, data-driven use cases are successful for end users and comply with enterprise governance and data standards. The good news is that existing best practices work well in cloud environments, although adjustments are usually needed. Here are several examples of data management best practices for cloud computing.

1. Manage data across all platforms, including the cloud.

This statement is true whether data exists on premises, in the cloud, or both (as is common in today's multi-platform hybrid data architectures). It is also true whether data migrates to a cloud, originates there, migrates off a cloud, or follows some combination of these paths. Enterprise-scale data and application architectures that involve clouds can be complex. TDWI (Transforming Data with Intelligence) regularly sees organizations succeed with clouds by extending or augmenting their existing teams, skills, governance policies, business sponsorship, data management practices, and data integration infrastructure.

2. Deploy a data management infrastructure into the cloud.

In complex scenarios such as those just described, you will need specific tools and architecture for data integration, and sometimes for application integration as well. This infrastructure is required to move data among platforms, and it should be in place before you start your journey to the cloud, because retrofitting it later is risky and disruptive.

If you have a pre-existing infrastructure for data integration, you may be able to extend it to cloud platforms. You should also be open to additional tools that are built and optimized specifically for the kind of cloud and use case you need. As with your on-premises best practices, cloud best practices and tools need to address data quality, metadata, master data, and varying data speeds.

3. Give priority to data integration requirements for the cloud.

As you design and revise data integration solutions, give careful thought to where specific processing should occur: in the cloud or on premises. Likewise, you will most likely need to adjust your approach to data landing and staging. Be sure your data integration toolset supports the interfaces and protocols of current cloud-based applications and platforms, not just on-premises enterprise sources. TDWI sees users increasingly adopting cloud-based Hadoop, which involves multiple interface points (such as MapReduce, Pig, Hive, HBase, Spark, Drill, and Presto). Similarly, look for support for APIs that are proprietary to the cloud provider you have selected.

Data coming from or going to clouds is trending toward real time, so your data integration tools and data management infrastructure should support multiple delivery speeds, ranging from offline batch and micro-batch to real-time and on-demand.
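To make the middle of that spectrum concrete, here is a minimal Python sketch of micro-batching, assuming events arrive in timestamp order as (timestamp, payload) pairs. The function name micro_batches and its max_size and max_span parameters are hypothetical illustrations, not the API of any particular integration tool. A batch closes when it is full or when the next event falls too far after the batch's first event.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical event shape: (timestamp in seconds, payload)
Event = Tuple[float, str]


def micro_batches(events: Iterable[Event], max_size: int,
                  max_span: float) -> Iterator[List[Event]]:
    """Group a timestamp-ordered event stream into micro-batches.

    A batch is emitted when it already holds max_size events, or when the
    next event arrives more than max_span seconds after the batch's first event.
    """
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    batch: List[Event] = []
    for event in events:
        # Close the current batch if it is full or the time window has passed.
        if batch and (len(batch) >= max_size
                      or event[0] - batch[0][0] > max_span):
            yield batch
            batch = []
        batch.append(event)
    # Flush whatever is left at the end of the stream.
    if batch:
        yield batch


if __name__ == "__main__":
    sample = [(0.0, "a"), (0.5, "b"), (1.0, "c"), (5.0, "d"), (5.2, "e")]
    for b in micro_batches(sample, max_size=2, max_span=2.0):
        print(b)
```

With max_size=2 and max_span=2.0, the five sample events come out as three batches: [a, b], [c], and [d, e]. Setting max_size=1 approximates per-event, real-time delivery, while very large limits approximate classic batch loading, so the same two knobs can cover much of the range described above.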

For years, TDWI has seen organizations depend on their data integration tools and platforms for metadata management, and this trend continues with clouds. Be sure your strategy supports multiple metadata types (technical, business, and operational) that can be accessed by many applications and user types. Finally, many clouds are capturing big data and other new data types, such as IoT and sensor data. Because these sources tend to be "metadata-poor," look for tools that help you deduce, develop, and inject metadata.
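As a rough illustration of "deducing and injecting" metadata, the Python sketch below infers simple technical metadata (field names, observed value types, and whether a field is optional) from schema-less JSON sensor records, then attaches it, along with basic operational metadata, to the data. The functions infer_schema and with_metadata, the envelope layout, and the sample device fields are all hypothetical; commercial tools do far more (profiling, lineage, business glossaries).

```python
import json
from collections import defaultdict
from typing import Any, Dict, List, Set


def infer_schema(records: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Deduce simple technical metadata from schema-less records."""
    types: Dict[str, Set[str]] = defaultdict(set)
    present: Dict[str, int] = defaultdict(int)
    for record in records:
        for field, value in record.items():
            # Record the Python type name seen for this field (e.g. "int").
            types[field].add(type(value).__name__)
            present[field] += 1
    total = len(records)
    return {
        field: {
            "types": sorted(types[field]),
            # A field missing from any record is flagged as optional.
            "optional": present[field] < total,
        }
        for field in sorted(types)
    }


def with_metadata(records: List[Dict[str, Any]], source: str) -> Dict[str, Any]:
    """Inject inferred schema plus basic operational metadata (hypothetical envelope)."""
    return {
        "source": source,
        "record_count": len(records),
        "schema": infer_schema(records),
        "records": records,
    }


if __name__ == "__main__":
    raw_lines = [
        '{"device": "t-01", "temp_c": 21.5}',
        '{"device": "t-02", "temp_c": 22, "battery": 0.9}',
    ]
    parsed = [json.loads(line) for line in raw_lines]
    print(json.dumps(with_metadata(parsed, source="sensor-feed")["schema"], indent=2))
```

For the two sample readings, the inferred schema reports temp_c as both float and int and flags battery as optional, which are exactly the kinds of inconsistencies that metadata-poor feeds hide until someone looks.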

4. Govern data holistically.

Regardless of the data's platform or location, organizations with a pre-existing data governance program (or a similar program for stewardship or curation) can most likely revise their existing policies designed for on-premises data usage and thereby ensure compliance for data that travels in and out of clouds. Organizations without such a program should use their journey to the cloud as the driver for starting one.

TDWI's view is that data governance is a critical success factor for most data initiatives because it prevents non-compliant use of data and aligns data management work with business goals. When governance extends beyond compliance issues to data standards, it also improves the quality, usability, and trustworthiness of data. Data governance should apply to all data, whether on premises, in the cloud, or spread across hybrid architectures.