Master Data Management
Master data management (MDM) is an information-centric business process that consolidates and manages specific enterprise data; technology is simply the means used to assemble, merge, and distribute that data. MDM arose from the need to keep strategic shared information consistent and thereby improve data quality, accessibility, and security.
MDM is unique in that it is limited to specific shared information that is not transactional in nature, such as common reference codes, persons, products, or locations. An MDM solution delivers value when information is made consistent across the organization, duplicate records are identified and resolved, and the quality of the information is markedly improved.
Achieving data consistency and quality generally requires a thorough understanding of the information at hand, how and where it is created or modified, and what roles and rules are needed to manage its data life cycle.
Methods for MDM vary by business and business need. Long before any MDM solution can be implemented, extensive process and information re-engineering must be planned. Organizations that do not integrate information across departments effectively have a much harder time reaching consensus during this planning process. Regardless of the MDM data subject, methods, or tools used, a common practice in the planning process is to identify those responsible for the data in question and those responsible for its daily management (similar to the data owners and stewards above). For the select information within its domain, MDM should consider management from data inception through retirement and all uses in between. Roles in an MDM project include Business Analysts, Data Architects, Data Owners, Data Stewards, data providers, and data consumers.
Improving data quality is a timeless concern. It can provide value on its own, serve as a first step towards something else such as master data management (MDM), or be done together with MDM.
Data quality can be pursued independently, which is good for cleaning data inside an application or as it’s being moved into another application. Implementing data quality tools independently has become much simpler in the last few years. Even so, you should think enterprise-wide from the beginning: set up your data quality tool as an enterprise-wide web service that can be invoked from any application.
Data quality can be a first step towards MDM, which allows you to start with one application knowing that MDM will be introduced as more applications get into the act. This scenario is common: some companies aren’t ready to plunge directly into a large MDM implementation, or they will take on a master data management initiative, but over a fairly long time frame.
Data Quality as a first step towards MDM can be a powerful strategy, because it allows you to “ease into” MDM in manageable phases, and get distinct business value at each stage. “Think big, start small” has become a piece of conventional wisdom in the MDM world, and this is a great way of doing that.
Data quality can run in parallel with an MDM implementation. In that case, data governance and organizational issues must be put front and centre, and new processes designed to manage data through the entire information management life cycle. Only then can you successfully implement the new technology you’ll introduce in a data quality or master data management initiative.
There are many dimensions of data quality that can be addressed as part of a data quality assessment program. Data quality itself can be defined as “fitness for use,” a very broad definition that entails many aspects of quality.
Rather than trying to focus on every dimension, start with the basics of completeness and timeliness, then move on to validity and consistency. Addressing these four basic dimensions can truly enhance the quality of enterprise data as well as stakeholders’ confidence in the data they consume, and they can be expanded upon over time.
Data Quality Dimensions
Completeness is the first and foremost necessity for any enterprise data warehouse. Stakeholders need to know that what’s in the source is accounted for in the target. You can ensure completeness in a variety of ways. For example, a record-balancing capability could be developed that records a count at the end of one flow and at the beginning of another to ensure all records are accounted for (number of records in = number of records out). The ultimate goal is to validate that every record and its corresponding information from a source is handled appropriately during processing. This source-to-target validation must be monitored and reported to the organization’s data consumers. The set of results from record balancing is one measurement.
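The record-balancing idea can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the names `BalanceResult` and `balance_records` are hypothetical, and the separate count of rejected records is an assumption added so that rows deliberately set aside during processing still count as accounted for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BalanceResult:
    records_in: int    # count taken at the end of the upstream flow
    records_out: int   # count taken at the start of the downstream flow
    rejected: int      # assumption: rows deliberately set aside during processing

    @property
    def unaccounted(self) -> int:
        """Records that entered but were neither passed on nor rejected."""
        return self.records_in - (self.records_out + self.rejected)

    @property
    def balanced(self) -> bool:
        """True when every incoming record is accounted for."""
        return self.unaccounted == 0


def balance_records(source_rows, target_rows, rejected_rows=()):
    """Compare record counts between two stages of a flow."""
    return BalanceResult(len(source_rows), len(target_rows), len(rejected_rows))
```

The `unaccounted` figure, logged per load, is the kind of measurement that could feed the data quality metrics reported to data consumers.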
Another could be to compare the summarized data in a quantity field to the summarized amount provided in a control report. Regardless of the approach, it should be one that the governance group approves, and the results should be measured and shared with data consumers as part of the data quality metrics.
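A control-total comparison might look like the sketch below. The function name, the dictionary-shaped rows, and the optional tolerance are all assumptions made for illustration; `Decimal` is used so that summed quantities are not distorted by binary floating-point rounding.

```python
from decimal import Decimal


def matches_control_total(rows, field, control_total, tolerance=Decimal("0")):
    """Sum one field across the loaded rows and compare it to the control report.

    Returns (is_match, difference), where difference = loaded total - control total.
    """
    loaded_total = sum((Decimal(str(row[field])) for row in rows), Decimal("0"))
    difference = loaded_total - Decimal(str(control_total))
    return abs(difference) <= tolerance, difference
```

For example, `matches_control_total(rows, "qty", "15.00")` returns a match only when the loaded quantities add up to exactly the amount stated in the control report, unless a tolerance is supplied.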
Timeliness must be a component of service-level agreements (SLAs) and must identify such criteria as acceptable levels of data latency, frequency of data updates, and data availability. SLAs should be reviewed and approved by the governance group and broadly published to data consumers. Timeliness can then be measured against these defined SLAs and shared as part of the data quality metrics.
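As a rough sketch of measuring timeliness against an SLA, the function below computes the share of loads whose data latency stayed within an agreed limit. The name `sla_compliance` and the shape of the input (pairs of source time and availability time) are hypothetical; a real SLA would also cover update frequency and availability.

```python
from datetime import datetime, timedelta


def sla_compliance(loads, max_latency):
    """Return the fraction of loads whose latency met the SLA.

    `loads` is a list of (source_time, available_time) datetime pairs.
    Returns None when there is nothing to measure.
    """
    if not loads:
        return None
    on_time = sum(
        1
        for source_time, available_time in loads
        if available_time - source_time <= max_latency
    )
    return on_time / len(loads)


if __name__ == "__main__":
    start = datetime(2016, 9, 1, 0, 0)
    loads = [(start, start + timedelta(hours=1)), (start, start + timedelta(hours=5))]
    print(sla_compliance(loads, max_latency=timedelta(hours=4)))
```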
Validity is a key data quality measure that indicates the “correctness” of the actual data content; for example, confirming that all the characters in a telephone number field are digits, not alphabetic characters. This is the concept that most data consumers think about when they envision data quality. Validity can be assessed through data profiling, data cleansing, and inline data quality checks.
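The telephone-number example can be expressed as a small check plus a validity rate. This sketch assumes the field is stored without formatting characters, as the example implies; both function names are hypothetical.

```python
def is_digits_only(value):
    """True when value is a non-empty string made up only of the digits 0-9."""
    return isinstance(value, str) and value.isascii() and value.isdigit()


def validity_rate(values, is_valid):
    """Fraction of values that pass the check; None for an empty input."""
    if not values:
        return None
    return sum(1 for value in values if is_valid(value)) / len(values)
```

The rate, rather than a single pass/fail result, is what would typically be tracked over time as a validity metric.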
Data profiling can be used as a starting point for measuring validity. Data profiling is a specific kind of data analysis used to discover information about a particular set of data. The process can uncover potential issues and provide valuable insight into your data. It can summarize details about large data sets from different angles. To support the concept of validity, data profiling includes the inspection of data content through column profiling or value distributions. Data profiling is often used at the beginning of a data project. However, periodic re-profiling of source data can also be useful.
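A very small column profile, covering counts and a value distribution, could be sketched as follows. The function name and the choice to treat `None` and empty strings as missing are assumptions; dedicated profiling tools report far more than this.

```python
from collections import Counter


def profile_column(values, top_n=3):
    """Summarize one column: row count, missing count, distinct values, top values."""
    present = [value for value in values if value is not None and value != ""]
    distribution = Counter(present)
    return {
        "count": len(values),
        "missing_count": len(values) - len(present),
        "distinct_count": len(distribution),
        "top_values": distribution.most_common(top_n),
    }
```

Running the same profile periodically against a source makes it easier to notice when a value distribution shifts.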
Data cleansing can also be used to address data validity. Data cleansing may include identity resolution, de-duplication, and name-and-address standardization. This process is usually developed as part of the source-to-target ETL processing for a data warehouse. A thorough data quality program includes a mechanism that provides feedback to the source of the data.
To ensure a truly robust data quality program, inline data quality checks should be developed. Inline data quality monitoring entails on-going measurement of data as it passes through the ETL (extract, transform, load) processes that prepare the data to be loaded into the target data warehouse. The checks can be:
- Comparisons between incoming values and expected, valid values
- Comparisons of incoming data values to values defined within a stated range
- Validity checks based on specific algorithms
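The three kinds of check listed above can each be written as a small function. The names are hypothetical, and the Luhn checksum is used here only as one familiar example of an algorithm-based validity check; the source does not name a specific algorithm.

```python
def in_expected_values(value, expected):
    """Check an incoming value against a set of expected, valid values."""
    return value in expected


def in_range(value, low, high):
    """Check that an incoming value falls within a stated range (inclusive)."""
    return low <= value <= high


def passes_luhn(number):
    """Algorithm-based check: validate a digit string with the Luhn checksum."""
    if not (isinstance(number, str) and number.isascii() and number.isdigit()):
        return False
    total = 0
    # Walk the digits from right to left, doubling every second one.
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0
```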
Inline data quality checks should be developed incrementally. One or two key attributes should be identified and implemented as top priority by the governance group. This is where the future-state vision and business conceptual model can assist. The inline data quality checks should be developed as reusable modules and expanded over time.
For example, the governance group can identify two key attributes to monitor. One may compare valid values to expected values; the other may check that a field’s value is within a defined range. Once these checks are in place and being actively monitored, the governance group may identify two additional key attributes to monitor. The initial two modules should be developed so they can be reused and implemented quickly for the two additional checks.
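One way to keep such checks reusable is to build them from small factories and bind them to attributes in a single mapping. This is a sketch under assumptions: the attribute names `country_code` and `order_qty`, the allowed values, and the range limits are invented for illustration.

```python
def expected_values_check(allowed):
    """Build a reusable check that accepts only the allowed values."""
    allowed = frozenset(allowed)
    return lambda value: value in allowed


def range_check(low, high):
    """Build a reusable check that accepts values within [low, high]."""
    return lambda value: value is not None and low <= value <= high


# Hypothetical first two key attributes chosen by the governance group.
CHECKS = {
    "country_code": expected_values_check({"US", "CA"}),
    "order_qty": range_check(1, 1000),
}


def run_checks(record, checks):
    """Apply each attribute's check to a record; True means the value passed."""
    return {attribute: check(record.get(attribute)) for attribute, check in checks.items()}
```

Monitoring two further attributes then means adding two entries to the mapping rather than writing new checking logic.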
Over time, additional validity checks can be developed (using business rules, for example, or comparing multiple attributes within the rule). These checks should also be developed to be reusable. Regardless of the type of inline data quality check, the measures should be recorded at the earliest possible point in the data flow to ensure that major issues can be caught and addressed before processing is complete. You can set alerts depending on the validity checks used; for example, e-mail messages can be sent to an operations team for follow-up assessment. Likewise, certain identified events may even cause an ETL process to stop.
In one situation many years ago, our inline data quality monitoring process found an issue and automatically stopped an ETL process because the number of defaults for a particular key business attribute was much larger than expected. This was an indicator of an issue with a new data source we were processing and loading into the warehouse. We were able to identify the root cause and fix the issue before data was actually loaded into the data warehouse. As a result, we avoided several weeks of re-work, not to mention the ensuing downgrade in stakeholder confidence in the data that this issue could have caused.
The results of the inline data quality checks should be measured and shared as part of the data quality metrics.
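A stop condition of the kind described in the anecdote, where too many default values halt the load, might be sketched like this. The exception name, the function name, the threshold, and the `notify` callback (standing in for an e-mail alert) are all assumptions; the original process is not documented here.

```python
class DataQualityHalt(Exception):
    """Raised to stop a load when a check exceeds its threshold."""


def check_default_rate(values, default_value, max_rate, notify=print):
    """Return the share of values equal to the default; halt if it is too high.

    `notify` is a placeholder for an alerting hook such as an e-mail sender.
    """
    if not values:
        return 0.0
    rate = sum(1 for value in values if value == default_value) / len(values)
    if rate > max_rate:
        message = f"default rate {rate:.1%} exceeds limit {max_rate:.1%}"
        notify(message)                 # alert the operations team
        raise DataQualityHalt(message)  # stop before anything is loaded
    return rate
```

Placing the call before the load step means the exception prevents suspect data from reaching the warehouse.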
Consistency is crucial to continued consumer confidence. Once data quality metrics are being monitored and reported to the business stakeholders for completeness, timeliness, and validity, then consistency can be measured by assessing changes in these patterns over time.
One way to do this is to track changes in the completeness, timeliness, and validity assessments, and to identify overall quality trends. These results should be added to the data quality metrics reporting that is shared with business stakeholders.
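Trend tracking of this kind can be sketched by comparing the latest value of a metric with the average of earlier runs. The function name, the use of a simple mean as the baseline, and the tolerance are assumptions chosen for illustration; the metric could be any completeness, timeliness, or validity score, provided higher means better.

```python
from statistics import mean


def assess_consistency(history, latest, tolerance):
    """Classify the latest metric value against the mean of previous runs.

    Returns (status, change), where status is "stable", "improving", or
    "degrading" and change = latest - baseline.
    """
    baseline = mean(history)
    change = latest - baseline
    if abs(change) <= tolerance:
        status = "stable"
    elif change > 0:
        status = "improving"
    else:
        status = "degrading"
    return status, change
```

A "degrading" status on a metric that used to be stable is a natural trigger for closer investigation before data consumers notice a problem.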
In a prior data warehouse project, our team measured and reported several data quality metrics and trends. Over time, we were able to show the continuing improvement in the quality of the data we were measuring. This, of course, improved business stakeholder confidence in the quality of our data.
Data Quality Metrics Lead to Data Quality Confidence
Complete transparency of data quality metrics and reporting to your organization’s data consumers will lead to greater confidence in the quality of the underlying data. Often, data consumers hear of a data quality issue and exaggerate that into general negativity about the quality of the data as a whole. I knew a data consumer who used to say in every meeting that the data was “unusable.” As we provided him with the actual data quality metrics, he eventually realized that there were a few issues that needed to be addressed but that the overall quality of the data was acceptable. We were able to counteract his comments and eventually change his beliefs with facts.
In addition to supporting confidence in the quality of data, metrics can support other goals. One goal is to monitor the quality of data to ensure that it continues to meet expectations. Another goal is to ensure that changes in the data that might indicate a data quality issue are detected as early as possible so we can quickly assess and address them as appropriate. A third goal is to proactively identify opportunities for improvement that can be presented to the governance group and prioritized. Stakeholder confidence will continue to increase if you are able to proactively identify issues before the data consumers find them. This is one of the greatest achievements of a robust data quality program.
Data Governance (DG) organizations are business entities that define and manage the most vital corporate asset: business information. Governance organizations may vary in participation and influence, but they share the common goals of corporate data policy definition, policy enforcement, and communication. DG initiatives arise when business leaders recognize that they create and own information and that IT serves as its librarian. Throughout the business are pockets of information, some self-contained within a single business process and some shared across many.
Business Ownership Is Essential
There’s a natural tendency on the part of business people to assume that because master data resides in computers and databases and sounds technical, it must be IT’s responsibility to take care of it.
But nothing could be further from the truth. The reality is that while IT people are certainly involved, they’ll never be close enough to the data to be able to tell a prospect from a customer, or a good address from a bad address.
Not only that, but if IT allows the business to shift ownership of data governance to IT, then IT will own it forever. Some people in IT might not think that’s a bad thing, but the business will then see data governance as “Someone Else’s Problem” (SEP). This can make it very hard to get the recurring funding needed for data governance.
On the other hand, when the business is fully engaged, leads the data governance initiative from the front, and feels a real sense of ownership, IT can still be heavily involved in a supporting role. With the business leading the charge, funding is usually not a problem, resources are made available when needed, and most of the issues that typically plague data governance programs become less severe.
This model, with business leading and IT supporting, is much healthier, but it requires a good working relationship between business and IT, and may require some remedial “business / IT alignment” work to rebuild the partnership between the business and IT. The reality is that it’s going to take a tightly integrated team of business and IT people to build and manage the MDM “stack” and the data governance organization that will be using it.
The Business Decides What Data Quality Means
The business data stewards will handle the data governance, but they’ll need a lot of support from IT. There will be various user interfaces and workbenches to carry out the business decisions the data stewards make.
They’ll interact with workflow queues, data quality front ends, business intelligence scorecards, rules engines, and so on. The average data steward will make the “power user” of the past look like an amateur. But in the end, it’s the knowledge of the business that makes the data stewards valuable.
And that’s why most companies fill these roles internally. They want people from the Order Management department, or from Customer Service, Sales Operations, or Finance. The company needs people with intimate knowledge of the industry and the organization and its customers, products, suppliers, employees, locations, geographies, etc.
Many of these business data stewards will not be in full-time, dedicated positions. They will be people who are doing this job already as part of their existing responsibilities, but that arrangement is now being made more formal, and their past hard work is being recognized. Hopefully, their attention to detail and “getting the data right” will now become part of their annual compensation and bonus plan, in recognition of their taking on formal data stewardship responsibilities.
The Data Stewardship Teams in various functional areas and geographies around the business typically report up to a Data Governance Office, which typically does have dedicated personnel.
This Data Governance Office, which functions similarly to a Program Management Office that coordinates multiple ongoing programs and projects in an IT context, provides leadership and coordination to the different Data Stewardship Teams. It also creates data policies, and includes global process owners from the business, as well as a global solution owner from IT. It runs the overall data governance program, coordinates with other groups like Legal and Enterprise Architecture, and provides a single point of contact for all data governance related matters.
Finally, the Data Governance Office usually reports up to an executive group of stakeholders, which is often referred to as a Data Governance Council or Executive Steering Committee. This group sets the overall priorities and funding and provides oversight for the entire data governance effort, as well as making some high-level policy itself and resolving issues that are escalated to it because they couldn’t be resolved at the lower levels. Sometimes this function is taken up by an existing body, and sometimes it’s a new group that is pulled together by the executive sponsor who’s responsible for forming data governance in the first place.
IT Handles the Technology
There’s a lot of cool technology needed to support master data management and data governance:
- The MDM hub itself, providing metadata and hierarchy management as well as audit features
- A data integration tool using a service bus and other service-oriented architecture (SOA) components
- A data quality tool providing data profiling, entity resolution and data standardization capabilities
- Security and authentication, perhaps including single sign-on and enterprise identity management
- Analytics, including data warehousing, business intelligence and business activity monitoring
Moreover, most data governance efforts, although they are ultimately business-led, are conceived within the IT organization. So there’s the pride of authorship, of bringing something new into the company.
It requires a delicate hand, because IT has to learn how to suggest and encourage without controlling. The business has to feel like it is actually running the data governance organization, even though IT had a big part in getting it off the ground.
I call this “leading from the back of the room”, and it can be difficult. But it’s a form of leadership that IT can learn, like being a power of example to a younger sibling. The business people who are part of the data governance organization will need mentors, and handled correctly, it will lead to a closer business/IT working relationship.
IT has to avoid the temptation to take over the Data Governance Office, or to allow the business to shift responsibility for data governance back to IT. That’s why getting agreement up front on a data governance mission statement and charter is so important, and getting the initial design for data governance signed off by the Executive Steering Committee can prevent setbacks when things get tough.
MDM should be considered an extension of DG. Without proper controls and standardization of data, the worst case for an MDM project is that it becomes a waste of budget. The best case under the same lack of vision is that the project becomes a waste of potential. Strong DG methods are undeniably needed when defining and standardizing information within an MDM solution. Without an enterprise-wide focus on DG, an MDM project will eventually arrive at a solution that meets the myopic needs of its immediate source/target systems, but little else. When an MDM expansion opportunity arises, the original lack of global vision will result in either a re-evaluation of the entire MDM solution or a limiting of the new audience to the initial design goals.
When considering a new or expanded MDM initiative, the first step should focus on your DG program. DG Stewards and Sponsors are the driving force for MDM justification and definition and are the true customer for MDM. Defined global controls should be finalized and introduced early into the process. Only then will you be considered a citizen of Information Utopia.