What is data management?
Data management is a broad and ambiguous concept. The Global Data Management Community (DAMA International) defines it as “the development of architectures, policies, practices and procedures to manage the data lifecycle”. But when people say “data management”, what do they really mean? We suggest four possibilities:
- Master data management - a method for managing critical organizational data, such as customers, accounts and parties named in business transactions, in a standardized way that prevents redundancy across the organization.
- Reference data management - defines permissible values that can be used by other data fields, such as postal codes, lists of countries, regions and cities, or product serial numbers. Reference data can be home-grown or externally provided.
- ETL and data integration - loading data from data sources into a data warehouse, then transforming, summarizing and aggregating it into a format suitable for high performance analysis.
- Big data analytics and visualization - processing selected data from big data sources and data warehouses, performing advanced data analytics, and allowing analysts and data scientists to slice, dice, and present visualizations and dashboards.
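To make the ETL category above more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any particular tool works: the source rows, the `revenue_by_country` table and all function names are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical source rows, standing in for an extract from an operational system.
SOURCE_ORDERS = [
    {"country": "us", "amount": "19.99"},
    {"country": "US", "amount": "5.01"},
    {"country": "de", "amount": "12.50"},
]


def extract():
    """Extract: return the raw rows from the (hypothetical) source."""
    return list(SOURCE_ORDERS)


def transform(rows):
    """Transform: normalize country codes and aggregate revenue per country, in cents."""
    totals = {}
    for row in rows:
        country = row["country"].strip().upper()
        cents = round(float(row["amount"]) * 100)
        totals[country] = totals.get(country, 0) + cents
    return sorted(totals.items())


def load(conn, totals):
    """Load: write the aggregated rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS revenue_by_country "
        "(country TEXT PRIMARY KEY, revenue_cents INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO revenue_by_country VALUES (?, ?)", totals)
    conn.commit()


def run_pipeline(conn):
    """Run extract, transform and load, then read back the loaded table."""
    load(conn, transform(extract()))
    query = "SELECT country, revenue_cents FROM revenue_by_country ORDER BY country"
    return conn.execute(query).fetchall()
```

Calling `run_pipeline(sqlite3.connect(":memory:"))` loads one summarized row per country. A real pipeline would add scheduling, error handling and incremental loading, which is where dedicated ETL tools come in.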
With today’s massive quantities of data, tools are essential to achieving data management best practices. Organizations use tools from all four categories above to manage and automate the data management process:
- Master Data Management (MDM) tools - help visualize complex sets of master data across the organization, and facilitate data stewardship by subject matter experts, who oversee creation and maintenance of master data.
- Reference Data Management (RDM) tools - often provided as part of MDM suites, define business processes around reference data, and help stakeholders populate reference data and manage it over time.
- ETL tools - help organizations load data from multiple sources, define complex, automated transformations of the data, test the data pipeline, and load data continuously to a target database or data warehouse.
- Big data visualization and big data analytics tools - help organizations explore, analyze and visualize big data sets, and generate reports and dashboards to extract insights and guide business decisions.
Below we cover three great tools from each category, to help you understand what each category offers and move closer to selecting the right data management tool for your needs.
Best Master Data Management tools
Dell Boomi's Master Data Hub has the following key features:
- Defines models via low-code, visual experience.
- Deploys data models and identifies which source systems interact with them.
- Onboards system records into a consolidated repository, automatically merges similar records.
- Enables data stewarding - alerts teams to resolve duplicates and data entry issues.
- Governs data with real-time bidirectional process flows across silos.
Profisee’s Master Data Management has the following key features:
- Stewardship and governance - enables “data stewards” within the organization to manage master data with feedback from analytics.
- Golden record management - standardizes, cleans and matches source data with no coding.
- Event management - detects data changes, distributes events to subscribing systems.
- Integrator - federates master data for global enterprises, with real time bi-directional integration.
- Enterprise workflow - enforces business processes cross-organization, lets administrators manage data steward performance.
- SDK - enables integration of custom applications.
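Golden record management, mentioned in the feature list above, can be illustrated with a short Python sketch. This is not Profisee's matching logic, which is not described here. It assumes a deliberately simple rule: records match when their normalized email is identical, and for each field the most recently updated non-empty value survives. The field names and the `build_golden_records` function are hypothetical.

```python
from collections import defaultdict


def normalize_email(email):
    """Standardize an email so trivially different spellings match."""
    return email.strip().lower()


def build_golden_records(records):
    """Group source records by normalized email and merge each group.

    Survivorship rule (an assumption for this sketch): for each field,
    keep the first non-empty value, scanning from the most recently
    updated record to the oldest.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[normalize_email(rec["email"])].append(rec)

    golden = []
    for email, group in sorted(groups.items()):
        # ISO dates sort correctly as strings; newest first.
        newest_first = sorted(group, key=lambda r: r["updated"], reverse=True)
        merged = {"email": email}
        for field in ("name", "phone"):
            merged[field] = next((r[field] for r in newest_first if r.get(field)), None)
        golden.append(merged)
    return golden


# Hypothetical records from two source systems.
records = [
    {"email": "Ann@Example.com ", "name": "Ann Lee", "phone": "", "updated": "2023-01-10"},
    {"email": "ann@example.com", "name": "", "phone": "555-0100", "updated": "2023-03-02"},
    {"email": "bob@example.com", "name": "Bob Ray", "phone": None, "updated": "2022-11-30"},
]
golden_records = build_golden_records(records)
```

Here the two records for Ann collapse into one record carrying her name from the older source and her phone number from the newer one. Commercial MDM tools typically go further, with fuzzy matching and steward review of uncertain matches.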
SAP NetWeaver MDM, a component of the NetWeaver development platform, has the following key features:
- Automatically extracts master data from all major SAP applications.
- Loads master data from other sources.
- Integrates data using business content like repository structures, validation rules, inbound and outbound mappings.
- Distributes master data to targets.
- Enables programmatic data integration via APIs and web services.
Best Reference Data Management tools
Collibra’s Reference Data solution has the following key features:
- Automates workflows to create new codes and code sets.
- Delivers codes and code sets to users in a friendly way.
- Performs accurate data mapping to eliminate barriers to data access.
- Compares data from different parts of the organization.
Magnitude’s Reference Data Management has the following key features:
- Multi-domain modeling - supports business structures from code lists to multi-path, self-referencing hierarchies.
- Automation - provides automation, governance and control over reference data objects and load processes.
- Mapping - provides global to local, external to internal, and specific to general mapping with no disruption to existing elements.
- Governance - provides a customizable workflow to control business processes related to reference data, with model-based security controls allowing users to view, add or update.
- Time variance - enables users to change models, subjects, attributes and associations and retrieve any previous version of the object.
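The time variance idea in the list above, retrieving any previous version of a reference data object, can be sketched as follows. This is a generic illustration in Python and makes no claim about how Magnitude implements it. The `VersionedCodeSet` class, its method names and the sample codes are all hypothetical.

```python
import bisect


class VersionedCodeSet:
    """Keeps every published version of a code set so past states stay retrievable."""

    def __init__(self):
        self._dates = []     # effective dates as ISO strings, kept sorted
        self._versions = []  # copies of the code set, parallel to _dates

    def publish(self, effective_date, codes):
        """Store a new version that takes effect on effective_date."""
        idx = bisect.bisect_right(self._dates, effective_date)
        self._dates.insert(idx, effective_date)
        self._versions.insert(idx, dict(codes))  # copy, so later edits do not leak in

    def as_of(self, date):
        """Return the version that was in effect on the given date."""
        idx = bisect.bisect_right(self._dates, date)
        if idx == 0:
            raise LookupError(f"no version effective on or before {date}")
        return self._versions[idx - 1]


# Hypothetical account status codes, extended in a later version.
status_codes = VersionedCodeSet()
status_codes.publish("2020-01-01", {"A": "Active"})
status_codes.publish("2022-06-01", {"A": "Active", "S": "Suspended"})
```

With this data, `status_codes.as_of("2021-03-15")` returns the original one-code set, while a lookup dated after June 2022 also includes the `S` code. Reports built against old data can then be reproduced with the codes that applied at the time.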
Informatica’s MDM Reference 360 has the following key features:
- Fully cloud-based - improved performance and scalability.
- End-to-end platform - embedded data integration, data quality, process management.
- Self service - Master Data Management and workflows built for business users with no technical background.
- Match and merge - merges and cross-references data from new types and sources.
Best ETL and data integration tools
Informatica PowerCenter is an on-premises ETL tool with the following key features:
- Seamless connectivity and integration with all types of data sources using out-of-the-box connectors.
- Automated data validation - script-free automated audit and validation of data moved or transformed.
- Advanced data transformations - supports non-relational data, able to parse XML, JSON, PDF, Microsoft Office and IoT data.
- Metadata-driven management - provides graphical views of data flows, impact and lineage.
Stitch Data is a cloud-based ETL platform with the following key features:
- Pre-integrated with dozens of data sources on and off the cloud, moves data into Amazon Redshift, S3, BigQuery, Panoply, PostgreSQL, and more.
- Easy scheduling for data replication.
- Error handling and alerting with automated resolution when possible.
- API and JSON framework, letting you push data into a data warehouse programmatically.
- Managed cloud service with automatic scaling and enterprise-grade SLAs.
Blendo is another cloud-based ETL and data integration service, with the following key features:
- Self service - connects to numerous data sources with a few clicks, moves data to Amazon Redshift, Panoply, PostgreSQL, MS SQL Server, and more.
- Historical data - loads and synchronizes historical data from cloud services.
- Scheduled loading - load data periodically or at selected frequencies from different data sources.
- Data schema optimization - automated collection, detection and preparation of data using an optimal relational schema.
Best big data analytics and visualization tools
Tableau is a BI platform available both on the cloud and as downloadable software, with the following key features:
- Easily connects to data sources.
- Allows easy access to visualizations for teams, partners and clients.
- Enables unlimited data exploration with interactive dashboards.
- Creates “dashboard starters”, actionable dashboards set up in minutes with data from popular web applications.
- Creates interactive maps automatically.
Chartio is a cloud-based BI and visualization platform with the following key features:
- Interactive mode - drag and drop data to create, filter and share dashboards.
- SQL mode - communicate with databases in SQL to directly extract insights.
- Data layering - add successive transformation steps to data to transform query results.
- Visualizations and charts - instantly visualize data; Chartio recommends the most appropriate chart.
- Data blending and drill downs - combine disparate data sources on the fly and get actionable insights without exploring raw data.
Looker is another cloud-based analytics and visualization platform, with the following key features:
- Define metrics once using LookML, Looker’s simple data modeling language, and Looker writes SQL queries to answer any question on those metrics.
- Make data beautiful with easy-to-read dashboards that allow users to drill in and explore.
- Connect directly to databases, with no extracts or software to download.
- Open access to dashboards and reports to everyone, not just analysts or data scientists.
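The "define metrics once, generate SQL" idea from the list above can be sketched in a few lines of Python. This is not LookML and does not reflect how Looker builds its queries. It only illustrates the general pattern of a central metric definition that query text is generated from. The `METRICS` dictionary, the `build_query` function and the `orders` table are hypothetical.

```python
# Central metric definitions: each metric is written once as a SQL expression.
METRICS = {
    "total_revenue": "SUM(amount)",
    "order_count": "COUNT(*)",
}


def build_query(metric, table, group_by=None):
    """Generate a SQL query for a named metric, optionally grouped by one column.

    Table and column names are interpolated directly, so this sketch assumes
    they come from trusted configuration, never from end-user input.
    """
    expr = METRICS[metric]  # raises KeyError for an undefined metric
    if group_by:
        return f"SELECT {group_by}, {expr} AS {metric} FROM {table} GROUP BY {group_by}"
    return f"SELECT {expr} AS {metric} FROM {table}"


revenue_by_country_sql = build_query("total_revenue", "orders", group_by="country")
```

Because every dashboard would call `build_query` rather than hand-writing `SUM(amount)`, a change to the metric definition propagates to all of them at once.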
Towards automated data management
We covered four ways of thinking about data management tools - Reference Data Management, Master Data Management (MDM), ETL and big data analytics - and a few great tools in each category.
As data infrastructure moves to the cloud, more of the data stack becomes managed and fully integrated. There is no replacement for managing business processes around structured data in large organizations. But cloud-based platforms can help with much of the data management strategy - from treatment and preparation of raw data, to data ingestion, loading, transformation, optimization and visualization - automatically in a single system.
For example, Panoply’s cloud-based automated data warehouse can connect directly to data sources, manage data loading, clean and prepare data using natural language processing and machine learning, and apply transformations to make it ready for analysis. Tools which provide an integrated big data stack take us one step closer to a truly holistic concept of data management.