More organizations are adopting applications that require shared, synchronized information, which drives the need for a single view of the key data entities used across the organization. From a technical standpoint, master data management (MDM) can be summarized as a set of processes for consolidating the variant versions of core data objects that are distributed across the enterprise into a unique representation. In turn, that unique representation is continually synchronized across the enterprise application architecture so that master data is available as a shared resource. The result is a master data asset of uniquely identified key data entities that can be integrated through a service layer with applications across the enterprise.
This paper will explore some of the key components of any MDM solution and considerations that should be factored into an organization’s overall MDM strategy.
The conceptual framework for the Master Data Management (MDM) service should include critical capabilities, such as master/reference data source identification, master data acquisition, metadata management, master hub management, integration, and access.
Organizations will require processes to identify and validate one or more sources of data associated with one or many subject areas. Business applications may contribute core data for the selected subject areas. External data providers may also be a source of reference data (from external agencies).
Data acquisition should include real-time, near real-time, and batch processes built on standard integration approaches and message formats, such as ETL, EAI, EII, and SOA, to acquire and aggregate data from one or more sources. The data profiling and discovery capability provides supporting entity and attribute information to the data acquisition process.
Metadata provides an array of functionality supporting core MDM hub functions, namely:
- Data Models: The MDM Hub supports user-defined data models for each subject area, such as Customer, Product, Reference Data, etc. Models contain attributes that identify the business structures of the master data record. The source enterprise master data attributes will span source systems.
- Schemas: Schemas support the localization of the physical data for every subject area.
- Standards and Rules Repository: The Master Data Hub is the repository for data standardization, match, and merge rules that are configured and stored as part of the hub metadata.
- Metadata Monitoring: Hub metadata can be logged, monitored, and/or versioned for changes to the metadata and the underlying model.
The data hub provides core services for data management and for entity identification of “gold copy” reference data that will be deemed master data.
Key components include:
- Cleanse Engine: Process to execute user-defined cleanse functions on the data acquired. Plug-in capabilities enable a call-out to third-party routines for extensive cleansing and standardization.
- Match/Merge Engine: Process to execute configured business rules to match data from multiple sources based on pre-defined attributes and parameters.
- Data Stewardship: Data stewardship processes support overriding match and merge rules at the record and field level and overall master data management.
- Hierarchy Management: Integrated hub capability to establish and track data relationships within the hub (such as customer and product hierarchies). This capability is supported using visualization and an automatic refresh based on underlying data changes.
Data management supports the management of historical data (for example, records merged are saved to support an unmerge capability), audit tracking, and access/security (who can update records, attributes, models, hierarchies) within the master data hub.
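The merge history mentioned above (merged records are saved so that a merge can be reversed) can be sketched as follows. This is a minimal illustration under stated assumptions, not any product's implementation: the `GoldenRecord` class, the `merge` and `unmerge` functions, and the "first value supplied wins" fill rule are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GoldenRecord:
    """A master record plus the untouched source records merged into it."""
    master_id: str
    attributes: dict = field(default_factory=dict)
    merged_sources: list = field(default_factory=list)  # history kept for unmerge


def _rebuild(master: GoldenRecord) -> None:
    # Recompute the master from its sources in merge order.
    # Assumed rule: the first source to supply an attribute wins.
    master.attributes = {}
    for _, attrs in master.merged_sources:
        for name, value in attrs.items():
            master.attributes.setdefault(name, value)


def merge(master: GoldenRecord, source_id: str, source_attrs: dict) -> None:
    """Merge a source record, saving a copy so the merge can be reversed."""
    master.merged_sources.append((source_id, dict(source_attrs)))
    _rebuild(master)


def unmerge(master: GoldenRecord, source_id: str) -> dict:
    """Remove one source and rebuild the master from the remaining sources.

    Raises StopIteration if the source was never merged into this master.
    """
    removed = next(a for sid, a in master.merged_sources if sid == source_id)
    master.merged_sources = [
        (sid, a) for sid, a in master.merged_sources if sid != source_id
    ]
    _rebuild(master)
    return removed
```

Because every source record is retained unchanged, an unmerge never has to guess which values came from where; it simply replays the surviving sources. A real hub would also record who performed each merge and when, to support the audit tracking described above.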
Integration supports standard messaging formats across multiple protocols, as well as workflow management, cross-referencing, and data sharing. The integration layer is a key component for the different data integration processes, like EAI/EII/ETL, workflow management, and messaging.
Access and Security
Managing access and security requires a workbench of tools that enables the creation and delivery of reports, provides a GUI for data stewards to handle master data merge exceptions manually, and supports monitoring of data quality in the hub.
The MDM reference architecture must be resilient and adaptive to ensure high performance and sustained value. Some of the key characteristics of the MDM reference architecture include:
- Processes to manage and maintain master data as an authoritative source and to securely deliver accurate, up-to-date master data across the business enterprise to authorized users and systems.
- Support for coordinating and managing the lifecycle of master data.
- Availability of accurate, critical business information as a service to be used in the context of a business process at the right time by any authorized user, application, and/or process.
- Ability to cleanse data and improve the quality and consistency for use in operational environments.
- Support for making master data active by detecting and generating operations to manage master data, implement data governance policies, and create business value.
Master data in a subject area is made up of a collection of attributes that describe it. Since there are a large number of attributes that describe sophisticated subjects, attributes are classified into the following categories:
Identifier attributes are used to uniquely define an instance. These important attributes are further classified into the following subsets of attributes:
- Global Identifier: The unique non-intelligent and often system-generated identifier for an instance.
- Identifying Attributes: Minimal set of attributes, most often human legible, used to define a unique instance.
- Alternate Identifiers: Attributes that store cross-reference identification information of instances stored throughout the enterprise in other applications, systems, and processes.
Core attributes are the most commonly reused attributes throughout the enterprise. For example, core attributes of customer master data could include name, address, contact info, etc.
Extended attributes are the remaining attributes used in specialized business processes. Examples include description attributes. Extended attributes typically far outnumber the core attributes, alternate identifiers, and identifying attributes. Further, extended attributes are most often subdivided into categories grouped by business process.
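One way to picture this classification is as a record type with one slot per attribute category. The sketch below is illustrative only; the `CustomerMaster` class, its field names, the `find_by_alternate_id` helper, and the sample values are assumptions, not part of any specific MDM product.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CustomerMaster:
    # Global identifier: non-intelligent, system-generated key.
    global_id: str
    # Identifying attributes: minimal, human-legible set that defines an instance.
    identifying: dict
    # Alternate identifiers: keys for the same customer in other systems.
    alternate_ids: dict = field(default_factory=dict)
    # Core attributes: commonly reused across the enterprise.
    core: dict = field(default_factory=dict)
    # Extended attributes, grouped by the business process that uses them.
    extended: dict = field(default_factory=dict)


def find_by_alternate_id(records, system, local_key) -> Optional[CustomerMaster]:
    """Cross-reference lookup: map a source-system key to its master record."""
    for record in records:
        if record.alternate_ids.get(system) == local_key:
            return record
    return None


# Hypothetical sample record.
customer = CustomerMaster(
    global_id="C-000123",
    identifying={"legal_name": "Acme Corp", "tax_id": "00-0000000"},
    alternate_ids={"crm": "ACC-42", "billing": "9001"},
    core={"address": "1 Main St", "contact": "ops@example.com"},
    extended={"marketing": {"segment": "mid-market"}, "credit": {"terms": "net-30"}},
)
```

Keeping alternate identifiers separate from the global identifier is what lets an application that only knows its local key (for example, the billing system's `9001`) resolve to the shared master record.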
Authoritative Sources and Data Fragmentation
Master data is fragmented (or distributed) in two dimensions. Attribute fragmentation is the distribution of attributes along the classification described in the previous section. Instance fragmentation is the distribution of master data records.
Though both attribute and instance fragmentation occur, fragmentation alone does not directly impact data quality and complexity. It is the fragmentation of data in conjunction with the number of disparate authoritative sources of data that adds to the complexity of maintaining high-quality master data.
Master Data Management Services
Master data quality is managed through architecture and manual processes governed by a stewardship model. The MDM services fall into the following groupings:
- Managing Metadata: Services for setting up metadata and managing changes.
- Managing Master Data Quality: Master data services that cleanse, view, edit, author, merge, etc.
- Master Data Applications: Services that allow applications to use master data through publishing, auditing, reporting, etc.
Stewardship and Governance
Master data stewardship enforces the policies and accountabilities for maintaining master data. It is critical to recognize that the data stewardship process and the master data management services intersect.
The two can of course be handled independently of each other; however, for truly breakthrough business value, the two efforts must be carefully coordinated.
Hub Architecture Areas
The two main areas of a Hub Architecture are the metadata management layer and the master data management layer. All of these capabilities must be accounted for in an organization’s workplan.
The Hub Metadata Management layer supporting functions include:
- Hub Data Modeling: This function is used to design the target master data record and input records that will be used to source data for the target master data record.
- Rules Management: Rules dictate how attributes from source records are mapped into a target master record. Trust factors are assigned and can be used to resolve conflicting attributes.
- User Management: Users of the hub, including hub administrators, data modelers, rules designers, and data stewards, are managed by the user management function.
- Security and Access: This function provides the administrator interfaces required to control access to subject areas, source and target records, source and target attributes, rules, design tools, etc. It is common to limit the data steward’s ability to review, merge, unmerge, and make updates at the record and attribute level. Similarly, data modelers and rules designers will be limited to subject areas (e.g., customer and product) and may be further restricted at the record and attribute level.
- Performance and Scalability: The architecture provides horizontal and vertical scalability to support large volumes of data.
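The trust factors described under Rules Management can be illustrated with a small survivorship sketch. It assumes trust is a numeric score assigned per attribute and per source, that the highest score wins, and that empty values are ignored; the function names and sample data are hypothetical.

```python
def resolve_attribute(candidates, trust):
    """Return the value from the most trusted source that supplies one.

    candidates: {source_name: value}
    trust:      {source_name: score}, higher score wins (assumed convention)
    """
    supplied = {s: v for s, v in candidates.items() if v not in (None, "")}
    if not supplied:
        return None
    best_source = max(supplied, key=lambda s: trust.get(s, 0.0))
    return supplied[best_source]


def build_master(source_records, trust_by_attribute):
    """Map attributes from several source records into one target record."""
    names = sorted({name for rec in source_records.values() for name in rec})
    master = {}
    for name in names:
        candidates = {src: rec.get(name) for src, rec in source_records.items()}
        master[name] = resolve_attribute(candidates, trust_by_attribute.get(name, {}))
    return master


# Hypothetical inputs: two sources disagree on name; only one has a phone.
sources = {
    "crm": {"name": "Acme Corp", "phone": ""},
    "billing": {"name": "ACME CORPORATION", "phone": "555-0100"},
}
trust = {
    "name": {"crm": 0.9, "billing": 0.6},
    "phone": {"crm": 0.8, "billing": 0.5},
}
master = build_master(sources, trust)
```

Here the name comes from the CRM system because it is more trusted for that attribute, while the phone number comes from billing because the CRM value is empty. Assigning trust per attribute rather than per system reflects the idea that a source can be authoritative for some attributes and not others.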
The Hub Master Data Management layer supporting functions include:
- Data Upload: This function supports data loading from multiple sources using batch, near real-time, and/or real-time interfaces into the hub.
- Data Standardization and Cleansing: The hub provides basic cleanse/standardization capabilities; however, interfaces are provided to enable third-party tools and optionally custom routines, such as real-time data validation lookups.
- Match and Merge: Match and Merge engines use rules to identify matching source records and enable either automatic or manual merging into a “golden” master data record.
- Hierarchy Management: The Hierarchy Manager provides the ability to manage relationship structures across master data records with the goal of viewing those records in a hierarchical presentation (e.g., customers by territory, vertical, size).
- Stewardship and Reporting: User-friendly interfaces provide access to rules and data management functions, supporting both the stewards and administrators. Specific capabilities include:
  - Information steward functions:
    - Identify and manage candidate master data sources and trusted sources
    - Manage data standardization and cleanse rules
    - Match and merge data
    - Manage and monitor data quality
  - Administrator functions:
    - Manage hub schemas, metadata, and data models
    - Manage user access
    - Manage database resources
    - Monitor and manage performance and scalability
    - Manage operational tools and services
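The standardization and match steps in the list above, including the choice between automatic and manual merging, can be sketched as follows. This is a simplified illustration: the suffix list, the two thresholds, and the function names are assumptions, and the similarity measure is Python's standard `difflib.SequenceMatcher` rather than the matching logic of any particular hub.

```python
import re
from difflib import SequenceMatcher

# Assumed list of company suffixes to drop during standardization.
SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "llc"}


def standardize_name(raw: str) -> str:
    """Lowercase, strip punctuation, and drop common company suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", raw.lower()).split()
    return " ".join(w for w in words if w not in SUFFIXES)


def match_score(a: str, b: str) -> float:
    """Similarity of two names after standardization, between 0.0 and 1.0."""
    return SequenceMatcher(None, standardize_name(a), standardize_name(b)).ratio()


def match_decision(a: str, b: str, auto_threshold=0.9, review_threshold=0.7) -> str:
    """Route a candidate pair to automatic merge, steward review, or no match."""
    score = match_score(a, b)
    if score >= auto_threshold:
        return "auto_merge"
    if score >= review_threshold:
        return "steward_review"
    return "no_match"
```

For example, `match_decision("Acme Corp.", "ACME Corporation")` returns `"auto_merge"` because both names standardize to `acme`, while a pair that scores between the two thresholds is queued for a data steward instead of being merged automatically. In practice the thresholds would be tuned per subject area and stored with the hub's rules metadata.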
Architectural Approaches to Master Data Management
Architecture patterns capture reusable design templates for common problems. The designs are based on collective experience of “proven techniques” used by internal and external sources. This section presents the logical groupings of frequently used MDM architecture patterns.
Since MDM is concerned with the creation of an enterprise-wide “system of record” for core business entities, it would seem natural to limit architecture patterns to only those that have a single system of record. Unfortunately, such a design is unrealistic in some companies due to scalability and reliability considerations, physical distribution of business processes, regulatory restrictions, and distribution of centers of expertise. Typical patterns to consider and their tradeoffs, which vary based on how the master data is distributed and shared across the enterprise, include those listed below.
Point-to-Point
In this approach, applications communicate with each other using point-to-point interfaces. This approach may work very well for a small number of applications, but as the number grows, the interfaces become complicated and redundant, which degrades quality and reliability.
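The growth problem can be made concrete with a quick count. The sketch below assumes the worst case, in which every application exchanges master data with every other application over one bidirectional interface per pair, and compares it with a shared bus or hub where each application needs a single connection. The function names are illustrative.

```python
def point_to_point_interfaces(n_apps: int) -> int:
    """Interfaces needed when every pair of applications is linked directly."""
    return n_apps * (n_apps - 1) // 2


def shared_bus_interfaces(n_apps: int) -> int:
    """Interfaces needed when each application connects once to a shared bus."""
    return n_apps


for n in (4, 10, 20):
    print(n, point_to_point_interfaces(n), shared_bus_interfaces(n))
```

With 4 applications the difference is small (6 interfaces versus 4), but at 20 applications the fully connected case needs 190 interfaces versus 20. Real landscapes are rarely fully connected, so these figures are an upper bound rather than a prediction.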
Enterprise Service Bus
This approach refers to application integration to multiple downstream applications via a common data bus. This approach reduces the need for redundant interfaces that repeatedly send the same data updates to multiple applications. It also provides master data access using publish/subscribe or request/reply techniques. However, this approach relies on source systems to manage data and does not provide capabilities to identify and resolve conflicts across source systems.
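A minimal in-memory sketch of the publish/subscribe technique is shown below. The `DataBus` class, the topic name, and the handlers are hypothetical; a real service bus would add durable messaging, routing, and transformation. The sketch also shows the limitation noted above: the bus delivers whatever each source publishes and makes no attempt to reconcile conflicting values.

```python
from collections import defaultdict


class DataBus:
    """Toy publish/subscribe bus: one publish reaches every subscriber."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        """Deliver the message to all subscribers; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


bus = DataBus()
warehouse_inbox, portal_inbox = [], []
bus.subscribe("customer.updated", warehouse_inbox.append)
bus.subscribe("customer.updated", portal_inbox.append)

# Two source systems publish different names for the same customer.
bus.publish("customer.updated", {"source": "crm", "id": "42", "name": "Acme Corp"})
bus.publish("customer.updated", {"source": "billing", "id": "42", "name": "ACME CORPORATION"})
```

Each source publishes once and both downstream applications receive the update, which removes the redundant interfaces. However, both inboxes now hold two conflicting names for customer 42, and deciding which one is correct is left to each consumer.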
Master Hub as a Channel
In this approach, master data from multiple sources is aggregated into one application/system/database and distributed to downstream applications using a data bus. This approach adds value by centralizing master data and can be used to identify and resolve data redundancy. However, this approach does not centralize master data management processes, which remain at the local source systems.
Master Data Hub – Persistent Hybrid
This approach improves on Master Hub as a Channel by adding centralized data management services to the hub. The result is a self-contained master data hub (for key and core attributes) with integrated data services, such as data quality management, information stewardship, data enhancement, integration, quality monitoring, and harmonization.
Organizations need to consider a number of factors, including processes and architecture attributes, when developing an MDM strategy. On the process side, the conceptual framework should include master data source identification, acquisition, hub management, integration, and access. In tandem with these processes, the MDM architecture should be capable of long-term high performance and responsiveness to continuous change.
Although a comprehensive MDM strategy is essential for organizations to maximize the value of their data, knowing where to start and how to execute this strategy can be daunting. With our extensive experience and expertise deploying MDM strategies and implementations, Knowledgent can help organizations navigate the complexities of developing the right MDM processes and architectures for their environments.