What is Master Data Management | Definition, Tools, Solutions [Updated 2023]

Master data management (MDM) arose out of the necessity for businesses to improve the consistency and quality of their key data assets, such as product data, asset data, customer data, location data, etc.
Master Data Management: What, Why, How & Who

Updated: October 30, 2023

According to a recent Dataversity article, analysts predict that digitized businesses will stand out through their enterprise data management and data governance strategies in 2023. Many businesses today, especially global enterprises, have hundreds of separate applications and systems (e.g., ERP, CRM) where data that crosses organizational departments or divisions can easily become fragmented, duplicated and, most commonly, out of date. When this occurs, accurately answering even the most basic but critical questions about any performance metric or KPI becomes a pain.

Basic questions such as “who are our most profitable customers?”, “which products have the best margins?” or, in some cases, “how many employees do we have?” become tough to answer, at least with any degree of accuracy.

The need for accurate, timely information is acute, and as sources of data increase, managing that data consistently and keeping data definitions up to date so that all parts of a business use the same information is a never-ending challenge.

To meet this challenge, businesses turn to master data management (MDM).


What you’ll learn from this article:

This article explains what MDM is, why it is important, how to manage it and who should be involved, while identifying some key MDM management patterns and best practices.


Let’s get started!

What is Master Data?

Most software systems have lists of data that are shared and used by several of the applications that make up the system.

For example: A typical ERP system will have at the very least Customer Master, Item Master and Account Master data lists. This master data is often one of the key assets of a company. In fact, it’s not unusual for a company to be acquired primarily for access to its Customer Master data.

Rudimentary Master Data Definition

One of the most important steps in understanding master data is getting to know the terminology. To start, there are some very well understood and easily identified master data items, such as “customer” and “product.” Truth be told, many define master data simply by reciting a commonly agreed upon master data item list, such as: Customer, Product, Location, Employee and Asset.

But how you identify elements of data that should be managed by MDM software is much more complex and defies such rudimentary definitions. And that has created a lot of confusion around what master data is and how it is qualified.

To give a more comprehensive answer to the question of “what is master data?”, we can look at the 6 types of data typically found in corporations:

  1. Unstructured Data: Data found in email, white papers, magazine articles, corporate intranet portals, product specifications, marketing collateral and PDF files.
  2. Transactional Data: Data about business events (often related to system transactions, such as sales, deliveries, invoices, trouble tickets, claims and other monetary and non-monetary interactions) that have historical significance or are needed for analysis by other systems. Transactional data are unit level transactions that use master data entities. Unlike master data, transactions are inherently temporal and instantaneous by nature.
  3. Metadata: Data about other data. It may reside in a formal repository or in various other forms, such as XML documents, report definitions, column descriptions in a database, log files, connections and configuration files.
  4. Hierarchical Data: Data that stores the relationships between other data. It may be stored as part of an accounting system or separately as descriptions of real world relationships, such as company organizational structures or product lines. Hierarchical data is sometimes considered a super MDM domain because it is critical to understanding and sometimes discovering the relationships between master data.
  5. Reference Data: A special type of master data used to categorize other data or used to relate data to information beyond the boundaries of the enterprise. Reference data can be shared across master or transactional data objects (e.g. countries, currencies, time zones, payment terms, etc.)
  6. Master Data: The core data within the enterprise that describes objects around which business is conducted. It typically changes infrequently and can include reference data that is necessary to operate the business. Master data is not transactional in nature, but it does describe transactions. The critical nouns of a business that master data covers generally fall into four domains and further categorizations within those domains are called subject areas, sub-domains or entity types.

The four general master data domains are:

  • Customers: Within the customers domain, there are customer, employee and salesperson sub-domains.
  • Products: Within the products domain, there are product, part, store and asset sub-domains.
  • Locations: Within the locations domain, there are office location and geographic division sub-domains.
  • Other: Within the other domain, there are things like contract, warranty and license sub-domains.

Some of these sub-domains may be further divided. For instance, customer may be further segmented based on incentives and history, since your company may have normal customers as well as premier and executive customers. Meanwhile, product may be further segmented by sector and industry. This level of granularity is helpful because the requirements, lifecycle and CRUD cycle for a product in the Consumer Packaged Goods (CPG) sector are likely very different from those for products in the clothing industry. The granularity of domains is essentially determined by the magnitude of differences between the attributes of the entities within them.

Building a Master Data Management Strategy

A master data management (MDM) strategy takes into account the core data types, or domains, that have the greatest business impact.

While identifying master data entities is pretty straightforward, not all data that fits the definition for master data should necessarily be managed as such. In general, master data is typically a small portion of all of your data from a volume perspective, but it’s some of the most complex data and the most valuable to maintain and manage.

So, what data should you manage as master data?

We recommend using the following criteria, all of which should be considered together when deciding if a given entity should be treated as master data.

Behavior Data

Master data can be described by the way that it interacts with other data.

For example:

In transaction systems, master data is almost always involved with transactional data. A customer buys a product, a vendor sells a part and a partner delivers a crate of materials to a location. An employee is hierarchically related to their manager, who reports up through a manager (another employee). A product may be a part of multiple hierarchies describing its placement within a store.

This relationship between master data and transactional data may be fundamentally viewed as a noun/verb relationship. Transactional data captures the verbs, such as sale, delivery, purchase, email and revocation, while master data captures the nouns. This is the same relationship data warehouse facts and dimensions share.
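The noun/verb split can be sketched in a few lines of Python. The record shapes and field names below are illustrative assumptions, not taken from any particular system: master records are the long-lived nouns, and each sale is a point-in-time event that references them by key, much like a fact row pointing at dimension rows.

```python
from dataclasses import dataclass
from datetime import date

# Master data: the "nouns". Field names are hypothetical.
@dataclass(frozen=True)
class Customer:
    customer_id: str
    name: str

@dataclass(frozen=True)
class Product:
    product_id: str
    description: str

# Transactional data: the "verb". A sale references master records by key
# and carries a point in time.
@dataclass(frozen=True)
class Sale:
    customer_id: str
    product_id: str
    sold_on: date
    quantity: int

customers = {"C1": Customer("C1", "William Smith")}
products = {"P1": Product("P1", "Widget")}
sales = [
    Sale("C1", "P1", date(2023, 1, 5), 2),
    Sale("C1", "P1", date(2023, 2, 9), 1),
]

def units_by_customer(sales, customers):
    """Aggregate the verbs (sales) and label them with the noun (customer name)."""
    totals = {}
    for sale in sales:
        name = customers[sale.customer_id].name
        totals[name] = totals.get(name, 0) + sale.quantity
    return totals
```

Note that the customer name lives in exactly one place. If it changes, every past and future sale picks up the corrected value through the key.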

Lifecycle (CRUD Cycle)

Master data can be described by the way that it is created, read, updated, deleted and searched. This lifecycle is called the CRUD cycle and is different for various master data element types and companies.

For example:

How a customer is created depends largely upon a company’s business rules, industry segment and data systems. One company may have multiple customer creation vectors, such as through the Internet, directly through account representatives or through outlet stores. Another company may only allow customers to be created through direct contact over the phone with its call center. Further, how a customer element gets created is certainly different from how a vendor element gets created.

The following breakdown illustrates the differing CRUD cycles for four common master data subject areas (customer, product, asset and employee).

Create
  • Customer: A customer visit, such as to the company website or a facility, triggers account creation
  • Product: A product gets purchased or manufactured with SCM involvement
  • Asset: A unit gets acquired by opening a PO following the necessary approval process
  • Employee: HR hires a new employee, who must then fill out numerous forms, attend orientation, make benefits selections, determine asset allocations and follow office assignments

Read
  • Customer: Contextualized views based on credentials of viewer
  • Product: Periodic inventory catalogues
  • Asset: Periodic reporting purposes, figuring depreciation, verification
  • Employee: Office access, reviews, insurance claims, immigration

Update
  • Customer: Address, discounts, phone number, preferences, credit accounts
  • Product: Packaging changes, raw materials changes
  • Asset: Transfers, maintenance, accident reports
  • Employee: Immigration status, marriage status, level increase, raises, transfers

Destroy
  • Customer: Death, bankruptcy, liquidation, do-not-call
  • Product: Canceled, replaced, no longer available
  • Asset: Obsolete, sold, destroyed, stolen, scrapped
  • Employee: Termination, death

Search
  • Customer: CRM system, call center system, contact management system
  • Product: ERP system, orders processing system
  • Asset: GL tracking, asset DB management
  • Employee: HR LOB system


Cardinality

As cardinality (the number of elements in a set) decreases, the likelihood of an element being treated as a master data element decreases, even for a commonly accepted subject area such as customer.

For example:

If a company has only three customers, the organization most likely would not consider those customers master data, at least not in the context of supporting them with an MDM solution, simply because there is no benefit to managing those customers with a master data infrastructure. In contrast, a company with thousands of customers would consider customer an important subject area because of the concomitant issues and benefits around managing such a large set of entities.

The customer value to each of these companies is the same, as both rely on their customers for business. However, one does not need a customer master data solution and the other does. Cardinality does not change the classification of a given entity type; however, the importance of having a solution for managing an entity type increases as the cardinality of the entity type increases.


Lifetime

Master data tends to be less volatile than transactional data. As it becomes more volatile, it is typically considered more transactional.

For example:

Some might consider “contract” a master data element. Others might consider it a transaction. Depending on the lifespan of a contract, it can go either way.

An agency promoting professional athletes might consider their contracts master data. In this case, each is different from the other and typically has a lifetime of greater than a year. It may be tempting to simply have one master data item called “athlete.” However, athletes tend to have more than one contract at any given time: One with their teams and others with companies for product endorsements. The agency would need to manage all those contracts over time as elements of each contract get renegotiated or as athletes get traded.

Other contracts—for example, contracts for detailing cars or painting a house—are more like a transaction. They are one-time, short-lived agreements to provide services for payment and are typically fulfilled and destroyed within hours.


Complexity

Simple entities, even if they are valuable entities, are rarely a challenge to manage and are rarely considered master data elements. The less complex an element, the less likely the need to manage change for that element. Typically, such assets are simply collected and tallied.

For example:

Fort Knox likely would not track information on each individual gold bar it stores, but rather only keep a count of them. The value of each gold bar is substantial, the cardinality high and the lifespan long, but the complexity is low.


Value

The more valuable the data element is to the company, the more likely it will be considered a master data element. Value and complexity work together.


Volatility

While master data is typically less volatile than transactional data, entities with attributes that do not change at all typically do not require a master data solution.

For example:

Rare coins would seem to meet many of the criteria for a master data treatment. A rare coin collector would likely have many rare coins, so cardinality is high. They are also valuable and complex since they have a history and description (e.g. attributes such as condition of obverse, reverse, legend, inscription, rim and field as well as designer initials, edge design, layers and portrait).

Despite all of these conditions, rare coins do not need to be managed as a master data item because they don’t change over time—or, at least, they don’t change enough. There may need to be more information added as the history of a particular coin is revealed or if certain attributes must be corrected, but, generally speaking, rare coins would not be managed through a master data management system because they are not volatile enough to warrant it.


Reuse

One of the primary drivers of master data management is reuse.

For example:

In a simple world, the CRM system would manage everything about a customer and never need to share any information about the customer with other systems. However, in today’s complex environments, customer information needs to be shared across multiple applications. That’s where the trouble begins.

Because access to a master datum is not always available, for a number of reasons, people start storing master data in various locations, such as spreadsheets and application-private stores. There are still reasons, such as data quality degradation and decay, to manage master data that is not reused across the enterprise. However, if a master data entity is reused in multiple systems, it’s a sure bet that it should be managed with MDM software.

In Summary…

While it is simple to enumerate the various master data entity types, it is sometimes more challenging to decide which data items in a company should be treated as master data.

Often, data that does not normally comply with the definition for master data may need to be managed as such and data that does comply with the definition may not.

Ultimately, when deciding on what entity types should be treated as master data, it is better to categorize them in terms of their behavior and attributes within the context of the business needs than to rely on simple lists of entity types.

Why Bother With Managing Master Data?

Because master data is used by multiple applications, an error in the data in one place can cause errors in all the applications that use it.

For example:

An incorrect address in the customer master might mean orders, bills and marketing literature are all sent to the wrong address. Similarly, an incorrect price on an item master can be a marketing disaster and an incorrect account number in an account master can lead to huge fines or even jail time for the CEO—a career-limiting move for the person who made the mistake.

How Does Master Data Management Drive Digital Transformation?

The key to driving an organization’s digital transformation lies in intelligent and automated data management. Whether it is embarking on cloud modernization, reimagining the customer experience through a comprehensive and unified view of data across the business, or implementing enterprise data governance and privacy, effectively managing data plays a crucial role in achieving a successful digital transformation.

Answers to Common Master Data Management Questions

How can data management drive operational efficiency with simplified workflows?

Data management drives operational efficiency by simplifying workflows. By centralizing and streamlining data processes, organizations can eliminate redundancies, reduce manual tasks, and automate data-related workflows. This leads to improved efficiency, reduced errors, and increased productivity across the organization.

How can data management increase agility with 360-degree views of data across the enterprise?

Data management increases agility by providing 360-degree views of data across the enterprise. This means that organizations have a comprehensive and unified view of their data from various sources, enabling them to make faster and more informed decisions, respond quickly to changes, and adapt to evolving business needs.

How can data management boost revenue and profitability with more accurate AI models?

Data management can boost revenue and profitability by providing more accurate AI models. By ensuring that data is accurate, reliable, and up-to-date, organizations can train AI models with high-quality data, leading to more accurate predictions and insights that can drive revenue growth and improve profitability.

How can data management enhance workforce productivity?

Data management enhances workforce productivity by enabling self-service data access. This means that employees can easily access the data they need without relying on IT or data specialists, allowing them to work more efficiently and make informed decisions.

What are the business-critical benefits of data management?

Data management provides several business-critical benefits, including enhancing workforce productivity through self-service data access, boosting revenue and profitability with more accurate AI models, increasing agility with 360-degree views of data across the enterprise, driving operational efficiency with simplified workflows, and increasing access to data on any platform, any cloud, and for any type of user in multicloud and multi-hybrid environments.

Real Life Master Data Example: Why You Need Master Data


A Typical Master Data Horror Story

A credit card customer moves from 2847 North 9th St. to 1001 11th St. North. The customer changes their billing address immediately but does not receive a bill for several months. One day, the customer receives a threatening phone call from the credit card billing department asking why the bill has not been paid. The customer verifies that they have the new address, and the billing department confirms that the address on file is 1001 11th St. North. The customer asks for a copy of the bill to settle the account.

After two more weeks without a bill, the customer calls back and finds the account has been turned over to a collection agency. This time, the customer finds out that even though the address on file is 1001 11th St. North, the billing address is listed as 101 11th St. North. After several phone calls and letters between lawyers, the bill finally gets resolved and the credit card company has lost a customer for life.

In this case, the master copy of the data was accurate, but another copy of it was flawed. Master data must be both correct and consistent. Even if the master data has no errors, few organizations have just one set of master data. Many companies grow through mergers and acquisitions, and each company that the parent organization acquires comes with its own customer master, item master and so forth.

This would not be bad if you could just union the new master data with the current master data, but unless the company acquired is in a completely different business in a faraway country, there’s a very good chance that some customers and products will appear in both sets of master data—usually with different formats and different database keys.

If both companies use the Dun & Bradstreet Number or Social Security Number as the customer identifier, discovering which customer records are for the same customer is straightforward, but that seldom happens. In most cases, customer numbers and part numbers are assigned by the software that creates the master records, so the chances of the same customer or the same product having the same identifier in both databases are pretty remote. Item masters can be even harder to reconcile if equivalent parts are purchased from different vendors with different vendor numbers.

In Summary…

Merging master lists together can be very difficult since the same customer may have different names, customer numbers, addresses and phone numbers in different databases. For example, William Smith might appear as Bill Smith, Wm. Smith and William Smithe. Normal database joins and searches will not be able to resolve these differences.

A very sophisticated tool that understands nicknames, alternate spellings and typing errors will be required. The tool will probably also have to recognize that different name variations can be resolved if they all live at the same address or have the same phone number.
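To make the idea concrete, here is a deliberately small sketch of nickname-aware fuzzy matching using Python's standard `difflib` module. The nickname table, the thresholds and the record layout are all assumptions chosen for illustration; a production matching tool would use much larger dictionaries, address standardization and tuned scoring.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table; a real tool would use a far larger dictionary.
NICKNAMES = {"bill": "william", "wm": "william", "will": "william"}

def normalize(name):
    """Lowercase, strip periods, and expand known nicknames token by token."""
    tokens = name.lower().replace(".", "").split()
    return " ".join(NICKNAMES.get(t, t) for t in tokens)

def name_similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def is_probable_match(rec_a, rec_b, threshold=0.85):
    """Match on a similar name, or on a weaker name match backed by a shared phone."""
    score = name_similarity(rec_a["name"], rec_b["name"])
    if score >= threshold:
        return True
    same_phone = rec_a.get("phone") and rec_a.get("phone") == rec_b.get("phone")
    # The 0.6 floor is an arbitrary illustrative value, not a recommendation.
    return bool(same_phone) and score >= 0.6
```

With this sketch, "Bill Smith" and "Wm. Smith" both normalize to "william smith", and "William Smithe" scores close enough to pass the name threshold on its own, while a weaker name match is accepted only when the phone numbers agree.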

The Benefits of Creating a Common Master Data List

While creating a clean master list can be a daunting challenge, there are many positive benefits to the bottom line that come from having a common master list, including:

  • A single, consolidated bill, which saves money and improves customer satisfaction
  • No concerns about sending the same marketing literature to a customer from multiple customer lists, which wastes money and irritates the customer
  • A cohesive view of customers across the organization, so that users know before they turn a customer account over to a collection agency whether or not that customer owes money to other parts of the organization or, more importantly, if that customer is another division’s biggest source of business
  • A consolidated view of items to eliminate wasted money and shelf space as well as the risk of artificial shortages that come from stocking the same item under different part numbers

Finally, the movement toward SOA and SaaS makes MDM a critical issue.

For example:

If you create a single customer service that communicates through well-defined XML messages, you may think you have defined a single view of your customers. But if the same customer is stored in five databases with three different addresses and four different phone numbers, what will your customer service return?

Similarly, if you decide to subscribe to a CRM service provided through SaaS, the service provider will need a list of customers for its database. Which list will you send?
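A quick way to see the size of the problem is to list every distinct value each system holds for the same customer. The sketch below assumes, for simplicity, that all systems share a customer key (which, as noted earlier, is rarely true in practice); the system names and record layout are hypothetical.

```python
from collections import defaultdict

def distinct_values(sources, customer_key, field_name):
    """Collect each distinct value of a field for one customer, with the systems holding it."""
    seen = defaultdict(list)
    for system, records in sources.items():
        record = records.get(customer_key)
        if record is not None and record.get(field_name):
            seen[record[field_name]].append(system)
    return dict(seen)

# Hypothetical data: three systems, two versions of the same address.
sources = {
    "crm": {"C1": {"address": "1001 11th St. North"}},
    "billing": {"C1": {"address": "101 11th St. North"}},
    "support": {"C1": {"address": "1001 11th St. North"}},
}

conflicts = distinct_values(sources, "C1", "address")
```

More than one key in the result means a customer service or SaaS export has no single answer to return until someone decides which value wins.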

For all of these reasons, maintaining a high quality, consistent set of master data for your organization is rapidly becoming a necessity. The systems and processes required to maintain this data are known as Master Data Management.

What is Master Data Management?

Master Data Management (MDM) is the technology, tools and processes that ensure master data is coordinated across the enterprise. MDM provides a unified master data service that delivers accurate, consistent and complete master data across the enterprise and to business partners.

There are a couple of things worth noting in this definition:

  1. MDM is not just a technological problem. In many cases, fundamental changes to business process will be required to maintain clean master data and some of the most difficult MDM issues are more political than technical.
  2. MDM includes both creating and maintaining master data. Investing a lot of time, money and effort in creating a clean, consistent set of master data is a wasted effort unless the solution includes tools and processes to keep the master data clean and consistent as it gets updated and expands over time.

Depending on the technology used, MDM may cover a single domain (customers, products, locations or other) or multiple domains. The benefits of multi-domain MDM include a consistent data stewardship experience, a minimized technology footprint, the ability to share reference data across domains, a lower total cost of ownership and a higher return on investment.

The 6 Disciplines of a Strong MDM Program

Given that MDM is not just a technological problem, meaning you can’t just install a piece of technology and have everything sorted out, what does a strong MDM program entail?

Before you get started with a master data management program, your MDM strategy should be built around these 6 disciplines:

  1. Governance: Directives that manage the organizational bodies, policies, principles and qualities to promote access to accurate and certified master data. Essentially, this is the process through which a cross-functional team defines the various aspects of the MDM program.
  2. Measurement: How are you doing based on your stated goals? Measurement should look at data quality and continuous improvement.
  3. Organization: Getting the right people in place throughout the MDM program, including master data owners, data stewards and those participating in governance.
  4. Policy: The requirements, policies and standards to which the MDM program should adhere.
  5. Process: Defined processes across the data lifecycle used to manage master data.
  6. Technology: The master data hub and any enabling technology.

Getting Started With Your MDM Program

Once you secure buy-in for your MDM program, it’s time to get started. While MDM is most effective when applied to all the master data in an organization, in many cases the risk and expense of an enterprise-wide effort are difficult to justify.

PRO TIP: It is often easier to start with a few key sources of master data and expand the effort once success has been demonstrated and lessons have been learned.

If you do start small, you should include an analysis of all the master data that you might eventually want to include in your program so that you do not make design decisions or tool choices that will force you to start over when you try to incorporate a new data source. For example, if your initial customer master implementation only includes the 10,000 customers your direct sales force deals with, you don’t want to make design decisions that will preclude adding your 10,000,000 web customers later.

Your MDM project plan will be influenced by requirements, priorities, resource availability, time frame and the size of the problem. Most MDM projects include at least these phases:

Step 1: Identify Sources of Master Data

This step is usually a very revealing exercise. Some companies find they have dozens of databases containing customer data that the IT department did not know existed.

Step 2: Identify the Producers and Consumers of the Master Data

This step involves pinpointing which applications produce the master data identified in the first step and, generally more difficult to determine, which applications use the master data. Depending on the approach you use for maintaining the master data, this step might not be necessary. For example, if all changes are detected and handled at the database level, it probably does not matter where the changes come from.

Step 3: Collect and Analyze Metadata About Your Master Data

For all the sources identified in step one, what are the entities and attributes of the data and what do they mean? This should include:

  • Attribute name
  • Data type
  • Allowed values
  • Constraints
  • Default values
  • Dependencies
  • Who owns the definition and maintenance of the data

‘Owner’ is the most important and often the hardest to determine. If you have a repository loaded with all your metadata, this step is an easy one. If you have to start from database tables and source code, this could be a significant effort.
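The checklist above maps naturally onto a small record type. The following is a minimal sketch, assuming a hand-built inventory rather than a real metadata repository; the class name, field names and sample attributes are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class AttributeMetadata:
    """One row of the metadata inventory; mirrors the checklist above."""
    name: str
    data_type: str
    allowed_values: Optional[list] = None      # None means unrestricted
    constraints: list = field(default_factory=list)
    default_value: Any = None
    dependencies: list = field(default_factory=list)
    owner: Optional[str] = None                # often the hardest field to fill

def missing_owners(attributes):
    """Return the names of attributes that nobody has claimed yet."""
    return [a.name for a in attributes if not a.owner]

# Hypothetical sample inventory.
inventory = [
    AttributeMetadata("customer_name", "string", owner="Sales Operations"),
    AttributeMetadata("country_code", "string", allowed_values=["US", "CA"],
                      constraints=["not null"], default_value="US"),
]
```

Running `missing_owners(inventory)` gives a simple worklist of attributes that still need an owner before governance decisions can be made about them.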

Step 4: Appoint Data Stewards

These should be the people with the knowledge of the current source data and the ability to determine how to transform the source data into the master data format. In general, stewards should be appointed by the owners of each master data source, the architects responsible for the MDM software and representatives from the business users of the master data.

Step 5: Implement a Data Governance Program and Data Governance Council

This group must have the knowledge and authority to make decisions on how the master data is maintained, what it contains, how long it is kept and how changes are authorized and audited. Hundreds of decisions must be made in the course of a master data project, and if there is not a well-defined decision-making body and process, the project can fail because politics prevent effective decision-making.

Step 6: Develop the Master Data Model

Decide what the master records look like, including what attributes are included, what size and data type they are, what values are allowed and so forth. This step should also include the mapping between the master data model and the current data sources. This is normally both the most important and most difficult step in the process. If you try to make everybody happy by including all the source attributes in the master entity, you often end up with master data that is too complex and cumbersome to be useful.

For example:

If you cannot decide whether weight should be in pounds or kilograms, one approach would be to include both (WeightLb and WeightKg). While this might make people happy, you are wasting megabytes of storage for numbers that can be calculated in microseconds and running the risk of creating inconsistent data (WeightLb = 5 and WeightKg = 5). While this is a pretty trivial example, a bigger issue would be maintaining multiple part numbers for the same part.
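One way to avoid the inconsistency is to store a single canonical unit and derive the other on demand. This is a minimal sketch; the class and field names are hypothetical, and the choice of kilograms as the stored unit is an arbitrary assumption.

```python
from dataclasses import dataclass

LB_PER_KG = 2.20462262185  # pounds per kilogram

@dataclass
class Item:
    item_id: str
    weight_kg: float  # the only stored weight

    @property
    def weight_lb(self):
        """Derived on every read, so it can never disagree with weight_kg."""
        return self.weight_kg * LB_PER_KG

item = Item("P1", 5.0)
```

Because `weight_lb` is computed rather than stored, a state such as WeightLb = 5 and WeightKg = 5 cannot be recorded in the first place.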

As in any committee effort, there will be fights and deals resulting in suboptimal decisions. It’s important to work out the decision process, priorities and final decision-maker in advance to make sure things run smoothly.

Step 7: Choose a Toolset

You will need to buy or build tools to create the master lists by cleaning, transforming and merging the source data. You will also need an infrastructure to use and maintain the master list. These functions are covered in detail later in this article. You can use a single toolset from a single vendor for all of these functions or you might want to take a best-of-breed approach.

In general, the techniques to clean and merge data are different for different types of data, so there are not a lot of tools that span the whole range of master data. The two main categories of tools are Customer Data Integration (CDI) tools for creating the customer master and Product Information Management (PIM) tools for creating the product master. Some tools will do both, but generally tools are better at one or the other. The toolset should also have support for finding and fixing data quality issues and maintaining versions and hierarchies. Versioning is a critical feature because understanding the history of a master data record is vital to maintaining its quality and accuracy over time.

For example:

If a merge tool combines two records for John Smith in Boston and you decide there really are two different John Smiths in Boston, you need to know what the records looked like before they were merged in order to “unmerge” them.
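The sketch below shows one simple way to keep merges reversible: snapshot both records before merging, and restore the snapshot to unmerge. The class, its in-memory storage and the survivorship rule are assumptions for illustration only; a real MDM hub would persist full version history.

```python
import copy

class MasterStore:
    def __init__(self):
        self.records = {}        # master_id -> record dict
        self.merge_history = {}  # surviving master_id -> list of pre-merge snapshots

    def add(self, master_id, record):
        self.records[master_id] = record

    def merge(self, survivor_id, duplicate_id):
        """Fold the duplicate into the survivor, keeping deep copies of both originals."""
        survivor = self.records[survivor_id]
        duplicate = self.records.pop(duplicate_id)
        snapshot = {
            survivor_id: copy.deepcopy(survivor),
            duplicate_id: copy.deepcopy(duplicate),
        }
        self.merge_history.setdefault(survivor_id, []).append(snapshot)
        # Simple survivorship rule (an assumption): fill only fields the survivor lacks.
        for key, value in duplicate.items():
            survivor.setdefault(key, value)

    def unmerge(self, survivor_id):
        """Restore both records exactly as they were before the most recent merge."""
        snapshot = self.merge_history[survivor_id].pop()
        self.records.update(snapshot)

store = MasterStore()
store.add("A", {"name": "John Smith", "city": "Boston"})
store.add("B", {"name": "John Smith", "city": "Boston", "phone": "555-0101"})
store.merge("A", "B")
merged_view = dict(store.records["A"])
store.unmerge("A")
```

After the unmerge, both John Smith records exist again with their original attributes, which is only possible because the pre-merge state was retained.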

Looking at the big picture, functional capabilities to look for include data modeling, integration, data matching, data quality, data stewardship, hierarchy management, workflow and data governance. From a non-functional perspective, you should also consider scalability, availability and performance.

Step 8: Design the Infrastructure

Once you have clean, consistent master data, you will need to expose it to your applications and provide processes to manage and maintain it. When this infrastructure is implemented, you will have a number of applications that will depend on it being available, so reliability and scalability are important considerations to include in your design. In most cases, you will have to implement significant parts of the infrastructure yourself because it will be designed to fit into your current infrastructure, platforms and applications.

Step 9: Generate and Test the Master Data

This step is where you use the tools you have developed or purchased to merge your source data into your master data list. This is often an iterative process that requires tinkering with rules and settings to get the matching right. This process also requires a lot of manual inspection to ensure that the results are correct and meet the requirements established for the project.

No tool will get the matching done correctly 100 percent of the time, so you will have to weigh the consequences of false matches versus missed matches to determine how to configure the matching tools. False matches can lead to customer dissatisfaction if bills are inaccurate or the wrong person is arrested. Too many missed matches make the master data less useful because you are not getting the benefits you invested in MDM to get.

Depending on how your MDM implementation is designed, you might have to change the systems that produce, maintain or consume master data to work with the new source of master data. If the master data is used in a system separate from the source systems—a data warehouse, for example—the source systems might not have to change.

If the source systems are going to use the master data, however, there will likely be changes required. Either the source systems will have to access the new master data or the master data will have to be synchronized with the source systems so that the source systems have a copy of the cleaned-up master data to use. If it’s not possible to change one or more of the source systems, either that source system might not be able to use the master data or the master data will have to be integrated with the source system’s database through external processes, such as triggers and SQL commands.

The source systems generating new records should be changed to look up existing master record sets before creating new records or updating existing master records. This ensures that the quality of data being generated upstream is good so that the MDM can function more efficiently and the application itself manages data quality. MDM should be leveraged not only as a system of record, but also as an application that promotes cleaner and more efficient handling of data across all applications in the enterprise.
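
The look-up-before-create pattern described above can be sketched roughly as follows. This is a minimal illustration, not a real MDM API: the `MasterList` class, its `find` and `get_or_create` methods, and the choice of a normalized email address as the match key are all hypothetical, and a real implementation would use far richer matching.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class MasterList:
    """Hypothetical in-memory stand-in for an MDM customer master."""
    records: dict = field(default_factory=dict)  # master_id -> record
    next_id: int = 1

    @staticmethod
    def _key(email: str) -> str:
        # Assumption: a normalized email is a good-enough match key here.
        return email.strip().lower()

    def find(self, email: str) -> Optional[int]:
        """Return the master ID of an existing record, or None."""
        key = self._key(email)
        for master_id, record in self.records.items():
            if record["email"] == key:
                return master_id
        return None

    def get_or_create(self, name: str, email: str) -> Tuple[int, bool]:
        """Look up first; create a record only when none exists."""
        existing = self.find(email)
        if existing is not None:
            return existing, False
        master_id = self.next_id
        self.next_id += 1
        self.records[master_id] = {"name": name, "email": self._key(email)}
        return master_id, True


master = MasterList()
# Two source systems try to add what is really the same customer.
first_id, created_first = master.get_or_create("John Smith", "John.Smith@example.com")
second_id, created_second = master.get_or_create("J. Smith", " john.smith@EXAMPLE.com ")
print(first_id, created_first, second_id, created_second)
```

The second call finds the existing record instead of creating a duplicate, which is the behavior the source system change is meant to guarantee.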

As part of your MDM strategy, you need to look into all three pillars of data management:

  • Data origination
  • Data management
  • Data consumption

It is not possible to have a robust, enterprise-level MDM strategy if any one of these aspects is ignored.

As stated earlier, any MDM implementation must incorporate tools, processes and people to maintain the quality of the data. All data must have a data steward who is responsible for ensuring the quality of the master data.

The data steward is normally a business person who has knowledge of the data, can recognize incorrect data and has the knowledge and authority to correct the issues. The MDM infrastructure should include tools that help the data steward recognize issues and simplify corrections. A good data stewardship tool should point out questionable matches that were made—customers with different names and customer numbers that live at the same address, for example.

The steward might also want to review items that were added as new because the match criteria were close but below the threshold. It is important for the data steward to see the history of changes made to the data by the MDM software in order to isolate the source of errors and undo incorrect changes. Maintenance also includes the processes to pull changes and additions into the MDM software and to distribute the cleansed data to the required places.

As you can see, MDM is a complex process that can go on for a long time. Like most things in software, the key to success is to implement MDM incrementally so that the business realizes a series of short-term benefits while the complete project is a long-term process.

Additionally, no MDM project can be successful without the support and participation of the business users. IT professionals do not have the domain knowledge to create and maintain high-quality master data. Any MDM project that does not include changes to the processes that create, maintain and validate master data is likely to fail.

The rest of this article will cover the details of the technology and processes for creating and maintaining master data.

How Do You Create a Master List?

Whether you buy an MDM tool or decide to build your own, there are two basic steps to creating master data:

  1. Cleaning and standardizing the data
  2. Matching data from all the sources to consolidate duplicates

Cleaning and Standardizing Master Data

Before you can start cleaning and normalizing your data, you must understand the data model for the master data. As part of the modeling process, you should have defined the contents of each attribute and defined a mapping from each source system to the master data model. Now, you can use this information to define the transformations necessary to clean your source data.

Cleaning the data and transforming it into the master data model is very similar to the Extract, Transform and Load (ETL) processes used to populate a data warehouse. If you already have ETL tools and transformations defined, it might be easier just to modify these as required for the master data instead of learning a new tool. Here are some typical data cleansing functions:

  • Normalize data formats: Make all the phone numbers look the same, transform addresses and so on to a common format.
  • Replace missing values: Insert defaults, look up ZIP codes from the address, look up the Dun & Bradstreet Number.
  • Standardize values: Convert all measurements to metric, convert prices to a common currency, change part numbers to an industry standard.
  • Map attributes: Parse the first name and last name out of a contact name field, move Part# and partno to the PartNumber field.

Most tools will cleanse the data that they can and put the rest into an error table for hand processing. Depending on how the matching tool works, the cleansed data will be put into a master table or a series of staging tables. As each source gets cleansed, you should examine the output to ensure the cleansing process is working correctly.
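
As a rough sketch of a few of the cleansing functions listed above (normalizing phone formats, parsing a contact name, mapping `Part#` and `partno` to `PartNumber`, and routing failures to an error list for hand processing), consider the following. The function names, the US-style 10-digit phone rule and the sample rows are assumptions made for illustration only.

```python
import re


def normalize_phone(raw):
    """Reduce a US-style phone number to 10 digits, or return None."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the country code
    return digits if len(digits) == 10 else None


def split_contact_name(contact):
    """Parse 'First Last' or 'Last, First' into (first, last)."""
    contact = " ".join(contact.split())  # collapse repeated whitespace
    if "," in contact:
        last, first = [part.strip() for part in contact.split(",", 1)]
    else:
        first, _, last = contact.partition(" ")
    return first, last


def cleanse(source_row):
    """Map one source row to the master model; raise if it cannot be cleansed."""
    part_number = source_row.get("Part#") or source_row.get("partno")
    phone = normalize_phone(source_row.get("phone"))
    if phone is None:
        raise ValueError("unparseable phone")
    first, last = split_contact_name(source_row["contact"])
    return {"FirstName": first, "LastName": last,
            "Phone": phone, "PartNumber": part_number}


def cleanse_all(rows):
    """Return (clean rows, error rows kept for hand processing)."""
    clean, errors = [], []
    for row in rows:
        try:
            clean.append(cleanse(row))
        except (ValueError, KeyError) as exc:
            errors.append({"row": row, "reason": str(exc)})
    return clean, errors


rows = [
    {"contact": "Smith, John", "phone": "(617) 555-0100", "Part#": "A-100"},
    {"contact": "Jane  Doe", "phone": "+1 617.555.0101", "partno": "B-200"},
    {"contact": "No Phone", "phone": "n/a"},
]
clean, errors = cleanse_all(rows)
print(clean)
print(errors)
```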

Matching Data to Eliminate Duplicates

Matching master data records to eliminate duplicates is both the hardest and most important step in creating master data. False matches can actually lose data (two Acme Corporations become one, for example) and missed matches reduce the value of maintaining a common list.

As a result, the matching accuracy of MDM tools is one of the most important purchase criteria.

Some matches are pretty trivial to do. If you have Social Security Numbers for all your customers or if all your products use a common numbering scheme, a database JOIN will find most of the matches. This hardly ever happens in the real world, however, so matching algorithms are normally very complex and sophisticated. Customers can be matched on name, maiden name, nickname, address, phone number, credit card number and so on, while products are matched on name, description, part number, specifications and price.

PRO TIP: The more attributes that match and the closer each match is, the higher the degree of confidence the MDM software has in the match.

This confidence factor is computed for each match, and if it surpasses a threshold, the records match. The threshold is normally adjusted depending on the consequences of a false match.

For example:

You might specify that if the confidence level is over 95 percent, the records are merged automatically, and if the confidence level is between 80 percent and 95 percent, a data steward should approve the match before they are merged.
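
The thresholding described above might look something like the sketch below. The 95 percent and 80 percent cut-offs come from the example in the text; everything else is an assumption for illustration, including the attribute weights, the use of Python's `difflib.SequenceMatcher` as the similarity measure and the decision labels. Commercial matching engines use considerably more sophisticated techniques.

```python
from difflib import SequenceMatcher

# Hypothetical attribute weights; a real tool would tune these.
WEIGHTS = {"name": 0.5, "phone": 0.3, "city": 0.2}
AUTO_MERGE_ABOVE = 0.95  # over 95 percent: merge automatically
REVIEW_FROM = 0.80       # 80 to 95 percent: data steward approves


def similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def confidence(rec_a, rec_b):
    """Weighted average of per-attribute similarities."""
    return sum(weight * similarity(rec_a[attr], rec_b[attr])
               for attr, weight in WEIGHTS.items())


def decide(score):
    """Turn a confidence score into a matching decision."""
    if score > AUTO_MERGE_ABOVE:
        return "auto_merge"
    if score >= REVIEW_FROM:
        return "steward_review"
    return "new_record"


a = {"name": "John Smith", "phone": "6175550100", "city": "Boston"}
b = {"name": "JOHN SMITH", "phone": "6175550100", "city": "boston"}
c = {"name": "Maria Garcia", "phone": "3034449876", "city": "Denver"}

print(decide(confidence(a, b)))  # identical apart from letter case
print(decide(confidence(a, c)))  # clearly a different customer
```

Raising `AUTO_MERGE_ABOVE` trades more steward work for fewer false matches, which is the adjustment the text describes.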

How Should You Merge Your Data?

Most merge tools merge one set of input into the master list, so the best procedure is to start the list with the data in which you have the most confidence and then merge the other sources in one at a time. If you have a lot of data and a lot of problems with it, this process can take a long time.

PRO TIP: You might want to start with the data from which you expect to get the most benefit once it’s consolidated and then run a pilot project with that data to ensure your processes work and that you are seeing the business benefits you expect.

From there, you can start adding other sources as time and resources permit. This approach means your project will take longer and possibly cost more, but the risk is lower. This approach also lets you start with a few organizations and add more as the project demonstrates success instead of trying to get everybody on board from the start.

Another factor to consider when merging your source data into the master list is privacy. When customers become part of the customer master, their information might be visible to any of the applications that have access to the customer master. If the customer data was obtained under a privacy policy that limited its use to a particular application, you might not be able to merge it into the customer master.

Because of implications around privacy, you might want to add a lawyer to your MDM planning team.

At this point, if your goal was to produce a list of master data, you are done. Print it out or save it to an external hard drive and move on. If you want your master data to stay current as data gets added and changed, you will have to develop infrastructure and processes to manage the master data over time.

The next section provides some options on how to do just that.

How Do You Maintain a Master List?

There are many different tools and techniques for managing and using master data. We will cover three of the more common scenarios here:

  1. Single copy: In this approach, there is only one master copy of the master data. All additions and changes are made directly to the master data. All applications that use master data are rewritten to use the new data instead of their current data. This approach guarantees consistency of the master data, but in most cases it’s not practical. That’s because modifying all your applications to use a new data source with a different schema and different data is, at least, very expensive. If some of your applications are purchased, it might even be impossible.
  2. Multiple copies, single maintenance: In this approach, master data is added or changed in the single master copy of the data, but changes are sent out to the source systems in which copies are stored locally. Each application can update the parts of the data that are not part of the master data, but they cannot change or add master data.

    For example:

    The inventory system might be able to change quantities and locations of parts, but new parts cannot be added and the attributes that are included in the product master cannot be changed. This reduces the number of application changes that will be required, but the applications will minimally have to disable functions that add or update master data. Users will have to learn new applications to add or modify master data and some of the things they normally do will not work anymore.

  3. Continuous merge: In this approach, applications are allowed to change their copy of the master data. Changes made to the source data are sent to the master, where they are merged into the master list. The changes to the master are then sent to the source systems and applied to the local copies. This approach requires few changes to the source systems. If necessary, the change propagation can be handled in the database so no application code is changed.

    On the surface, this seems like the ideal solution because application changes are minimized and no retraining is required. Everybody keeps doing what they are doing, but with higher quality, more complete data. However, this approach does have several issues:
    • Update conflicts are possible and difficult to reconcile: What happens if two of the source systems change a customer’s address to different values? There’s no way for the MDM software to decide which one to keep, so intervention by the data steward is required. In the meantime, the customer has two different addresses. This must be addressed by creating data governance rules and standard operating procedures to ensure that update conflicts are reduced or eliminated.
    • Additions must be remerged: When a customer is added, there is a chance that another system has already added the customer. To deal with this situation, all data additions must go through the matching process again to prevent new duplicates in the master.
    • Maintaining consistent values is more difficult: If the weight of a product is converted from pounds to kilograms and then back to pounds, rounding can change the original weight. This can be disconcerting to a user who enters a value and then sees it change a few seconds later.

In general, all these things can be planned for and dealt with, making the user’s life a little easier at the expense of a more complicated infrastructure to maintain and more work for the data stewards. This might be an acceptable trade-off, but it’s one that should be made consciously.
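
Two of the continuous-merge issues listed above, update conflicts and unit round-trips, can be illustrated with a short sketch. The function names, the record layout and the choice to store weights to two decimal places are assumptions for illustration; only the pound-to-kilogram factor is a fixed definition.

```python
KG_PER_LB = 0.45359237  # exact definition of the international pound


def find_conflicts(master, changes):
    """Apply agreed changes; set aside conflicting ones for a data steward.

    `changes` is a list of (source_system, attribute, new_value) tuples.
    """
    proposed = {}
    for source, attribute, value in changes:
        proposed.setdefault(attribute, {})[source] = value

    accepted, conflicts = dict(master), {}
    for attribute, by_source in proposed.items():
        if len(set(by_source.values())) == 1:
            accepted[attribute] = next(iter(by_source.values()))
        else:
            conflicts[attribute] = by_source  # sources disagree
    return accepted, conflicts


def round_trip_lb(weight_lb, places=2):
    """Convert pounds to stored kilograms and back, rounding each time."""
    stored_kg = round(weight_lb * KG_PER_LB, places)
    return round(stored_kg / KG_PER_LB, places)


master = {"address": "1 Main St", "phone": "6175550100"}
changes = [
    ("CRM", "address", "2 Oak Ave"),
    ("Billing", "address", "9 Elm Rd"),
    ("CRM", "phone", "6175550199"),
]
accepted, conflicts = find_conflicts(master, changes)
print(accepted)          # phone updated, address left unchanged
print(conflicts)         # address needs a steward's decision
print(round_trip_lb(1.00))  # no longer exactly 1.00 after the round trip
```

Storing the value in the unit it was entered in, or keeping more decimal places in the master than any source system displays, are two common ways to avoid the round-trip surprise.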

A Few Thoughts On Versioning and Auditing

No matter how you manage your master data, it’s important to be able to understand how the data got to the current state.

For example:

If a customer record was consolidated from two different merged records, you might need to know what the original records looked like in case a data steward determines that the records were merged by mistake and should really be two different customers. The version management should include a simple interface for displaying versions and reverting all or part of a change to a previous version.

The normal branching of versions and grouping of changes that source control systems use can also be very useful for maintaining different derivation changes and reverting groups of changes to a previous branch. Data stewardship and compliance requirements will often include a way to determine who made each change and when it was made.

To support these requirements, MDM software should include a facility for auditing changes to the master data. In addition to keeping an audit log, the software should include a simple way to find the particular change you are looking for. MDM software can audit thousands of changes a day, so search and reporting facilities for the audit log are important.
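
To make the versioning and auditing ideas above concrete, here is a minimal sketch of a master list that snapshots records before each merge so that a data steward can later "unmerge" them, and that records who made each change and when. The `VersionedMaster` class, its methods and the survivorship rule are hypothetical and not taken from any particular MDM product.

```python
from copy import deepcopy
from datetime import datetime, timezone


class VersionedMaster:
    """Hypothetical master list with an audit log of prior record states."""

    def __init__(self):
        self.records = {}
        self.audit_log = []

    def _audit(self, user, action, record_ids, before):
        self.audit_log.append({
            "when": datetime.now(timezone.utc),
            "who": user,
            "action": action,
            "record_ids": tuple(record_ids),
            "before": deepcopy(before),  # snapshot, not a live reference
        })

    def add(self, record_id, data, user):
        self._audit(user, "add", [record_id], {})
        self.records[record_id] = dict(data)

    def merge(self, keep_id, drop_id, user):
        before = {keep_id: self.records[keep_id],
                  drop_id: self.records[drop_id]}
        self._audit(user, "merge", [keep_id, drop_id], before)
        dropped = self.records.pop(drop_id)
        # Assumed survivorship rule: keep existing values, fill gaps only.
        for key, value in dropped.items():
            self.records[keep_id].setdefault(key, value)

    def unmerge(self, keep_id, drop_id, user):
        """Restore both records to their state just before the merge."""
        for entry in reversed(self.audit_log):
            if (entry["action"] == "merge"
                    and entry["record_ids"] == (keep_id, drop_id)):
                self._audit(user, "unmerge", [keep_id, drop_id],
                            {keep_id: self.records[keep_id]})
                self.records.update(deepcopy(entry["before"]))
                return
        raise LookupError("no matching merge found in the audit log")

    def search_audit(self, who=None, action=None):
        """Filter the audit log by user and/or action."""
        return [e for e in self.audit_log
                if (who is None or e["who"] == who)
                and (action is None or e["action"] == action)]


m = VersionedMaster()
m.add(1, {"name": "John Smith", "city": "Boston"}, "loader")
m.add(2, {"name": "John Smith", "city": "Boston", "phone": "6175550100"}, "loader")
m.merge(1, 2, "mdm-engine")
merged_snapshot = dict(m.records[1])
m.unmerge(1, 2, "steward")
print(m.records)
```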

A Few Thoughts On Hierarchy Management

In addition to the master data itself, the MDM software must maintain data hierarchies, such as bills of materials for products, sales territory structures, organization structures for customers and so forth. It’s important for the MDM software to capture these hierarchies, but it’s also useful for the software to be able to modify the hierarchies independently of the underlying systems.

For example:

When an employee moves to a different cost center, there might be impacts to the Travel and Expense system, payroll, time reporting, reporting structures and performance management. If the MDM software manages hierarchies, a change to the hierarchy in a single place can propagate the change to all the underlying systems.

There might also be reasons to maintain hierarchies in the MDM software that do not exist in the source systems.

For example:

Revenue and expenses might need to be rolled up into territory or organizational structures that do not exist in any single source system. Planning and forecasting might also require temporary hierarchies to calculate “what if” numbers for proposed organizational changes. Historical hierarchies are also required in many cases to roll up financial information into structures that existed in the past, but not in the current structure.

For these reasons, a powerful, flexible hierarchy management feature is an important part of MDM software.
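
As a simple illustration of rolling figures up through alternative hierarchies, the sketch below sums revenue into a current territory structure and into a proposed "what if" structure that exists in no source system. The `roll_up` function, the territory names and the revenue figures are all invented for the example, and the hierarchy is assumed to be a tree stored as a child-to-parent mapping.

```python
from collections import defaultdict


def roll_up(parent_of, amounts):
    """Add each node's amount to itself and to every ancestor."""
    totals = defaultdict(int)
    for node, amount in amounts.items():
        current, seen = node, set()
        while current is not None:
            if current in seen:
                raise ValueError(f"cycle in hierarchy at {current!r}")
            seen.add(current)
            totals[current] += amount
            current = parent_of.get(current)  # None once the root is passed
    return dict(totals)


revenue = {"Boston": 120, "New York": 200, "Chicago": 90}

# Current sales territory structure.
current_hierarchy = {
    "Boston": "East", "New York": "East", "Chicago": "Central",
    "East": "Company", "Central": "Company",
}

# Proposed reorganization: fold Chicago into the East territory.
what_if_hierarchy = {
    "Boston": "East", "New York": "East", "Chicago": "East",
    "East": "Company",
}

totals_current = roll_up(current_hierarchy, revenue)
totals_what_if = roll_up(what_if_hierarchy, revenue)
print(totals_current)
print(totals_what_if)
```

Because the hierarchy is just data separate from the revenue figures, historical and temporary structures can be kept side by side and applied to the same underlying numbers.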

Who Should Be Involved in Your MDM Program?

Now that you understand the what and why, let’s talk about the who. There are several different ways to think about who to involve in an MDM program. First, let’s take a high-level look at three core roles:

  1. Data Governance: Individuals who drive the definition, requirements and solution. These users help administrators know what to create and data stewards know what to manage and how to manage it. Data governance users dictate to data stewards how data should be managed, including the processes for doing so, and then hold the data stewards accountable for following those requirements. Data governance users also dictate to administrators what to create during the implementation of the MDM solution, especially from a data matching and quality perspective.

    Data governance users also need to maintain a feedback loop from the MDM software to ensure everything is working as expected. This feedback covers the measurement perspective of the MDM program and might include information like:
    • How long does it take to onboard a new customer?
    • Is that process getting faster or slower?
    • How is the company doing compared to its SLA?
    • If there are any areas that are slipping, why is that happening?
    • How well is the data matching working?
    • How many business rules are failing from a data quality perspective?
  2. Administrators: Individuals in IT who are responsible for setting up and configuring the solution.
  3. Data Stewards: Boots-on-the-ground individuals responsible for fixing, cleaning and managing the data directly within the solution. Ideally, data stewards come from departments across the business, such as finance and marketing. Typically, the activities that data stewards take on within the MDM program are defined by data governance users.

Other MDM roles vary by organization and project type, and can include:

| Role | Skills/Responsibilities | Level of Involvement |
| --- | --- | --- |
| Program Manager | Owns the data management strategy and platform. | Part time |
| Project Manager | Develops and manages project plans, ensures timely quality deliverables and reports project progress. Responsible for risk and issue management and escalation. | None |
| System Admin and DBA | Sys Admin: Systems administrators tend to manage things like domains, storage, virtualization, group policies, DNS and some networking. They tend to be generalists. DBA: Database administrators combine some system administration skills with some development skills and specialized knowledge of the database platforms used. | Occasional support |
| Developer | Implements custom SDK and/or workflow solutions to extend MDM platforms. This may include web services-based integrations, bespoke user interfaces, or custom applications or processes that leverage APIs or MDM data. A developer must have a working knowledge of C#.NET, Windows Communication Foundation (WCF) and ASP.NET. | Occasional support |
| ETL Developer | Performs batch data loading from source systems (ETL integration), with Profisee providing training and guidance on how to execute the implementation within the scope. | Occasional support |
| Business Analyst/SME | Is familiar with the data and the business processes related to an MDM solution. Provides deep knowledge of application functionality and requirements and participates in workshops, planning and execution of the review and testing activities. | Occasional support |
| Data Architect/Data Modeler | Oversees enterprise conceptual, logical and physical data models that conform to an organization’s standards and conventions. Provides leadership and guidance with enterprise data strategies, especially as they relate to MDM. Assists with organization governance practices and standards, and acts as a liaison between business and IT to clarify data requirements. | Occasional support |
| End Users/Data Stewards | Individuals who interact with the master data and/or business processes. These are the business users of the MDM system and act as stewards/maintainers of the data. | Up to full time |
| Governance Council | The Master Data Governance Council (MDGC) is the decision-making and policy-making authority for matters related to data. The MDGC oversees the implementation of data standards and quality assurance to ensure that the MDM team and Data Stewards are developing, maintaining and providing acceptable system data for the use of others. | Part time (regular meetings) |

Master Data Management Stakeholders:

Aside from the roles that execute and manage an MDM strategy, one of the keys to a successful MDM project is active commitment by the key stakeholders. The stakeholders for a typical MDM engagement include those representing both the business and IT. Active stakeholders usually include, but are not limited to, the following types of roles:

  • Business or IT Executive Sponsor
  • IT Project Lead
  • Subject-matter experts from the impacted line of business
  • Data Stewards
  • IT delivery team

As MDM stakeholders are identified throughout an organization, it is critical to secure their engagement and commitment to the organization’s MDM journey. Through multiple implementations, Profisee has identified several “health” indicators to help assess MDM stakeholder engagement:


Healthy Signs

  • Executive incentives tied to project results
  • Investments in change management and training
  • Subject matter experts dedicated full-time
  • The right sponsor is appropriately engaged and funded
  • Regular Steering Committee meetings are being held, decisions and actions are being taken in a timely fashion and are effective
  • All appropriate stakeholder groups are effectively represented and engaged

Unhealthy Signs

  • No executive sponsor visible
  • Resistance to new ideas
  • No “experts” available

Master Data Management Steering Committee

It’s recommended that management-level representation from the MDM stakeholders form a Steering Committee to facilitate cross-functional decision-making. Here are a few characteristics of an effective Steering Committee:

  • Be sized appropriately: big enough to represent the priority stakeholders, but small enough to quickly analyze key information and make decisions
  • Focus on fast decision-making
  • Be a vehicle for removing organizational barriers and not simply a regular meeting for listening to reports from the Project Team members
  • Not be a substitute for hands-on sponsorship

Once the stakeholders are identified, the MDM Project Charter should include formation of a Steering Committee. Based on running hundreds of MDM projects, Profisee recommends that the following roles participate in the Steering Committee. Note that there may be more than one team member per role, and some roles may not be applicable to a company’s organizational structure.

| Role | Description |
| --- | --- |
| Executive Sponsor(s) | Primary budget owner for the MDM initiative. This role typically comes from the line of business expected to benefit from the MDM solution. |
| Data Governance Lead | MDM is a component of a larger Data Governance strategy. If the organization has a Data Governance team in place, it should be an active participant in an MDM Steering Committee. |
| Data Steward or SME | The team responsible for day-to-day data management, including making decisions about how data is presented in operational or analytical systems, is typically part of the Steering Committee. |
| IT Sponsor(s) | MDM sponsorship sometimes resides within the IT organization, as MDM can be considered an IT-driven effort. Organizations also often have formal or informal Business and IT partnerships in which the IT Sponsor supports the business-led initiatives. In either case, the IT Sponsor plays a critical role in the MDM project’s success and should be part of the Steering Committee. |
| Organization Standards Bodies | In cases where organizations have cross-functional teams driving adoption of common standards across the enterprise, this role might be a good candidate for the MDM Steering Committee. Examples of such standards may include IT Architecture, IT Integration, Meta Data Management and more. |
| Data Domain Owner | When companies are organized around the key components of their business cycle, such as Customers, Products or Suppliers, there may be Data Domain Owners who will be part of Steering Committee decision-making. |
| MDM Champion | In some instances, an MDM champion oversees all business and IT aspects of an MDM implementation. In such cases, this role is part of the MDM Steering Committee. |
| MDM Partner | In order to drive optimal value from their MDM investment, companies are encouraged to include their MDM implementation and/or software partner in the Steering Committee. The MDM Partner offers best practice insight to support Steering Committee decision-making. |


While it’s easy to think of master data management as a technological issue, a purely technological solution without corresponding changes to business processes and controls will likely fail to produce satisfactory results.

This article has covered the reasons for adopting master data management, the process of developing a solution, several options for the technological implementation of the solution and who should be involved along the way to make sure the program runs smoothly.

This article is an update of the original article titled “The What, Why, and How of Master Data Management” by Kirk Haselden and Roger Wolter, originally published in 2006. Special thanks to Roger and Kirk for their contributions, and for allowing Profisee to republish their article with updates for today.


