Data Matching Software: Use Cases and Techniques

Master data management (MDM) begins with harnessing your data through a process called data matching. It’s perhaps one of the most crucial steps in any data transformation initiative, as it requires the organization to make decisions about what data gets saved, how that data is formatted and how the company will prevent duplicate records in the future.

In this article, we’ll explore data matching, discussing the different types of data matching, tools you can use for data matching, the benefits of data matching and how to perform data matching. Let’s get started!


What Is Data Matching?

Data matching is the process by which related records from across an organization are identified, standardized and merged. Because data can be formatted differently across source systems, data matching is critical to ensure that records are correct, up-to-date and standardized across all the different systems that work with that data.

According to the book Data Matching by Peter Christen, data matching is also known as:

  • Data linkage
  • Record linkage
  • Entity resolution
  • Object identification
  • Field matching

These terms are all similar in that they describe the process by which data from more than one source is identified, cleansed and de-duplicated for consistent use.

What Is Data Matching Software?

Data matching software automates the process of data matching, greatly reducing the labor involved in matching and cleaning data for use. Key features of data matching software include:

  • API connections or webhooks to integrate data from data sources
  • A matching engine, usually either powered by machine learning (ML) or graph-based matching technology, as in the case of Profisee
  • The ability to define survivorship rules to merge records for the greatest level of accuracy
  • Workflow automation settings
  • Data quality and enrichment capabilities
  • A data stewardship UI for business users to monitor data quality and review records flagged for manual review

Data matching software can be offered as a standalone solution but is usually built into master data management (MDM) software solutions instead. MDM tools help companies manage the essential and relatively unchanging data that runs their business — master data.

What Is Customer Data Matching?

Customer data matching is the process by which organizations pull customer data from every software tool that stores information about customers and use it to build golden records of customer data. Customer data matching ensures that the organization works from the most up-to-date and accurate master customer data across its data estate.

Example customer data that may get updated during the matching process includes:

  • Contact name
  • Customer address
  • Business name
  • Contact phone number
  • Contact email address

Because these fields are manually entered by employees or by customers themselves, companies might pull different entries for any of these data points for a single customer.

Data matching pulls this information from the company’s CRM, marketing automation tool and ERP, for example, into a central location. There, the matching software can correct typos, fill empty fields, standardize address formatting and produce a single, golden record for use across the company.

What Types of Data Matching Are There?

While all data matching processes share the same intended outcome, how they get there varies by the type(s) of data matching they use. Data matching processes can be split into two major types: deterministic and probabilistic.

Deterministic Matching

Also known as exact matching or deterministic linking, this is when the matching tool (or the person doing the matching) combines data sets based on fields that contain identical strings of characters.

Deterministic matching works when the data includes unique identifiers such as Social Security numbers or customer ID numbers. For example, if every software platform the company uses stores a standard, unique customer ID except the shipping software, customer records can be matched, cleansed and de-duplicated across every source system except the shipping software.
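To make exact matching concrete, here is a minimal Python sketch, assuming hypothetical records from a CRM, an ERP and a shipping system in which only the first two carry a shared `customer_id` field. It groups records whose IDs are identical strings and sets aside any record without an ID, mirroring the shipping example above. The record layout and function name are illustrative, not taken from any particular tool.

```python
from collections import defaultdict

# Hypothetical records; "customer_id" is the shared unique identifier
# present in the CRM and ERP but missing from the shipping system.
crm = [{"customer_id": "C001", "name": "Tony B."},
       {"customer_id": "C002", "name": "Cassandra Bologna"}]
erp = [{"customer_id": "C001", "name": "Anthony Bologna"}]
shipping = [{"name": "Tony Bologna"}]  # no ID, so it cannot be matched exactly

def deterministic_match(sources: dict):
    """Group records whose customer_id values are identical strings."""
    groups = defaultdict(list)
    unmatched = []
    for source_name, records in sources.items():
        for record in records:
            key = record.get("customer_id")
            if key:
                groups[key].append((source_name, record))
            else:
                unmatched.append((source_name, record))
    return dict(groups), unmatched

groups, unmatched = deterministic_match({"crm": crm, "erp": erp, "shipping": shipping})
print(sorted(groups))       # ['C001', 'C002']
print(len(groups["C001"]))  # 2
print(unmatched)            # [('shipping', {'name': 'Tony Bologna'})]
```

Because the comparison is character-for-character, "C001" and "c001" would not match; that strictness is what makes deterministic matching reliable when a true unique ID exists and brittle when it does not.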

Probabilistic Matching

Also known as fuzzy matching or probabilistic linking, this process links records whose fields do not match exactly but whose match score exceeds a predetermined threshold. Fuzzy data matching works best with data sets that contain several different data points, because each additional attribute gives the tool more evidence for calculating the probability of a match.

These tools may be set to compare key identifying attributes across several data points, to require a minimum percentage of matching characters in a field or to apply other custom parameters. The probability of a match is then expressed as a confidence score, typically a number between 0 and 100.
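The scoring idea can be sketched in a few lines of Python. This is a minimal illustration, not how any commercial matching engine works: it assumes hypothetical per-field weights, uses the standard library's `difflib.SequenceMatcher` as a stand-in character-similarity measure, combines the fields into one 0 to 100 score and compares it with a hypothetical threshold of 80.

```python
from difflib import SequenceMatcher

# Hypothetical field weights and threshold; real tools tune these per data set.
WEIGHTS = {"name": 0.4, "email": 0.4, "phone": 0.2}
THRESHOLD = 80  # on a 0-100 scale

def field_similarity(a: str, b: str) -> float:
    """Case-insensitive character similarity between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted similarity across several fields, expressed from 0 to 100."""
    total = sum(w * field_similarity(rec_a[f], rec_b[f]) for f, w in WEIGHTS.items())
    return round(100 * total / sum(WEIGHTS.values()), 1)

a = {"name": "Tony Bologna", "email": "tony.boloney@gmail.com", "phone": "5559875555"}
b = {"name": "Tony Boloney", "email": "Tony.boloney@gmail.com", "phone": "5559875555"}
c = {"name": "Cassandra Bologna", "email": "sandra.t.bologna@gmail.com", "phone": "5554445555"}

for other in (b, c):
    score = match_score(a, other)
    print(other["name"], score, "match" if score >= THRESHOLD else "no match")
```

Here the misspelled surname in `b` still clears the threshold because the email and phone agree, while `c` shares a surname but falls short. Relying on several fields is what lets fuzzy matching tolerate a typo in any one of them.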

Do You Need a Data Matching Tool?

Organizations can quickly acquire vast quantities of data within each source system, and those numbers increase with every lead, sale, appointment or shipment that is made. Importing, sorting and manually matching every record from each system and compiling that data into a single golden record would require thousands of hours, and the process would be prone to the same mistakes and typos as the original data.

Data matching tools automate the process of consolidating and cleansing data according to the company’s data needs. These tools can scan for matches across millions of entries from every software database at the company, match the records that meet requirements and flag exceptions for review. This allows data analysts to manage data by exception, rather than by individual record.

Data Matching Examples 

Depending on the industry you serve and the products you sell, your master data aspirations may differ. Location and customer data are two of the most common golden records that companies will keep.

Location Data Matching

Location data matching is used to standardize and complete records that describe locations. This can include locations of field offices, distributors, customers or sales prospects. While location data is slow to change, it can require quite a bit of standardization, especially if the data is pulled from spreadsheets or other manually entered sources.

In the example below, the Home and Primary locations are the same, although there are several differences in the ways the address and phone numbers are formatted. Data professionals will decide the best format for each of these fields when creating data governance policies, which will then be enforced during matching and standardization.

Name      | Address          | Phone          | Fax            | Zip Code
Home      | 123 123rd street | (555) 987-5555 | (555) 987-5556 | 00011
Primary   | 123 123rd St     | 555.987.5555   | 555.987.5556   | 00011
Second    | 56 Lincoln       | (555) 678.5555 | (555) 678.5556 | 00012
Riverside | 476 Riverside    | 555-444-5555   | 555-444-5556   | 01113

Another decision that the data professionals may want to consider is the naming conventions used for each of their locations. While “Home” and “Second” may work for a company with only two locations, naming the location by the street name may work better for an expanding enterprise. The name could be extracted from the address field and filled for all locations.
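A standardization pass over the example table might look like the minimal Python sketch below. It assumes a hypothetical governance policy of formatting 10-digit phone numbers as "(555) 987-5555" and abbreviating "street" to "St", and it derives a location name from the street name as suggested above. The suffix map and function names are illustrative only.

```python
import re

# Hypothetical suffix map; a real data governance policy would define the full list.
SUFFIXES = {"street": "St", "st": "St", "avenue": "Ave", "ave": "Ave"}

def normalize_phone(raw: str) -> str:
    """Keep digits only and render a 10-digit number as (555) 987-5555."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 10:
        return raw  # unexpected format: leave as-is for manual review
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"

def normalize_address(raw: str) -> str:
    """Drop periods, map street suffixes to one abbreviation, capitalize words."""
    out = []
    for word in raw.replace(".", "").split():
        if word.lower() in SUFFIXES:
            out.append(SUFFIXES[word.lower()])
        elif word[0].isdigit():
            out.append(word)  # house numbers and "123rd" stay unchanged
        else:
            out.append(word.capitalize())
    return " ".join(out)

def location_name(address: str) -> str:
    """Name a location after its street, e.g. '476 Riverside' -> 'Riverside'."""
    words = normalize_address(address).split()
    return " ".join(w for w in words[1:] if w not in SUFFIXES.values())

home = {"address": "123 123rd street", "phone": "(555) 987-5555"}
primary = {"address": "123 123rd St", "phone": "555.987.5555"}

same_address = normalize_address(home["address"]) == normalize_address(primary["address"])
same_phone = normalize_phone(home["phone"]) == normalize_phone(primary["phone"])
print(same_address, same_phone)           # True True
print(location_name("476 Riverside"))     # Riverside
print(normalize_phone("(555) 678.5555"))  # (555) 678-5555
```

After standardization, Home and Primary become identical and could be matched exactly. A naive suffix map like this one would mishandle cases such as "St" meaning "Saint", which is why real address verification usually relies on reference data rather than a short lookup table.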

Customer Data Matching

Customer data matching is useful for getting a unified view of your customers, but it can be complicated by the use of PO boxes, differences between billing and shipping addresses, multiple email addresses and differences between how customer names are entered. Finding a single field that stays consistent across customer records from different parts of an enterprise is unlikely.

Once the customer data is matched and standardized, however, a unique customer ID can be assigned to each customer, and the customer MDM software can surface potential matches for new entries to prevent duplicates caused by misspellings or abbreviations.

Name              | Billing                           | Shipping                          | Email
Tony B.           | 56 Lincoln St., Billings MT 00011 | 56 Lincoln St., Billings MT 00011 | Tony.boloney@gmail.com
Tony Bologna      | PO Box 8776, Nashville TN 37220   | 56 Lincoln St., Billings MT 00011 | Tony.boloney@gmail.com
Anthony B.        | PO Box 8776, Nashville TN 37220   | PO Box 8776, Nashville TN 37220   | Tony.boloney@gmail.com
Cassandra Bologna | 123 123rd St., Billings MT 00011  | 123 123rd St., Billings MT 00011  | Sandra.t.bologna@gmail.com
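Using the customer table above, the minimal Python sketch below assigns one customer ID per normalized email address. It is a deliberate simplification: it assumes email is a trustworthy key, which it often is not (households share addresses, people change them), and the `CUST-0001` ID format is hypothetical. It does illustrate why no single name field works: the three Tony records carry three different names but collapse to one customer, while Cassandra stays separate despite sharing a surname and city.

```python
# The example customer records from the table above (shipping omitted for brevity).
records = [
    {"name": "Tony B.", "billing": "56 Lincoln St., Billings MT 00011", "email": "Tony.boloney@gmail.com"},
    {"name": "Tony Bologna", "billing": "PO Box 8776, Nashville TN 37220", "email": "Tony.boloney@gmail.com"},
    {"name": "Anthony B.", "billing": "PO Box 8776, Nashville TN 37220", "email": "Tony.boloney@gmail.com"},
    {"name": "Cassandra Bologna", "billing": "123 123rd St., Billings MT 00011", "email": "Sandra.t.bologna@gmail.com"},
]

def assign_customer_ids(rows: list) -> list:
    """Give every row sharing a normalized email the same hypothetical customer ID."""
    key_to_id = {}
    result = []
    for row in rows:
        key = row["email"].strip().lower()
        if key not in key_to_id:
            key_to_id[key] = f"CUST-{len(key_to_id) + 1:04d}"
        result.append((row["name"], key_to_id[key]))
    return result

for name, customer_id in assign_customer_ids(records):
    print(customer_id, name)
```

A production matching engine would combine email with name, address and phone evidence rather than trusting any one field.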

Data Matching Benefits

As long as you can use data matching software to assist you in the process, data matching can pay dividends in reducing costs and improving the quality of data analysis across the enterprise. Here are just some of the benefits of data matching.

Golden Record Data

Data matching is a crucial step in producing golden records — master data that companies rely on to run their businesses efficiently. A golden record of customer, product, location and employee data reduces company mistakes, increases the accuracy of forecasts and provides a baseline for innovative data uses like generative AI.

Reduced Database Size

De-duplicating master data and eliminating incorrect, incomplete and inaccurate records can substantially reduce the volume of data to be maintained. A smaller database speeds up processing and saves you money in the long term.

Reduced TCO for Data Storage

Cloud data storage has been a true game-changer for many companies that previously did not have the funds or physical space to manage their own data center. While cloud data storage providers offer flexibility and scalability for database storage, the costs can get out of hand quickly. But data matching and cleansing will reduce your overall database size, meaning the amount you pay for storing that data will also go down.

Faster and More Accurate Data Analysis

Accurate data analysis relies on accurate databases, and fast data analysis relies upon low ingestion times. When golden record data is accurate and de-duplicated, the analysis teams and tools don’t have to filter out exceptions due to inaccurate data or second-guess the analysis provided by business intelligence and forecasting tools.

Regulatory Compliance

Companies that employ data matching solutions have an easier time complying with federal, regional and local regulations because they have fewer records to parse for each data request. And because the data footprint shrinks with proper data cleansing and policies to prevent re-duplication, protecting and accessing individual records takes less time.

Decreased Security and Fraud Risk

Smaller databases are easier to maintain and secure, while duplicate or incomplete customer records can be a point of entry for bad actors. When your company maintains clean and accurate records, those records are easier to monitor for inconsistencies or outright breaches. Data matching solutions can also give time back to analysts to follow up on exceptions or unexpected outcomes from the data.

Data Matching Challenges

Every digital initiative brings challenges, but the right tools and processes can reduce the effects of challenges like incomplete records, exceptions and ongoing data cleansing.

Incomplete Records

Data matching requires correct and complete information in at least some of the fields to meet fuzzy matching thresholds. If too many fields are incomplete or incorrect, the organization may not have enough information to match or update the record.

Data matching tools that also provide data enrichment services through publicly accessible databases can lower the number of incomplete fields across the dataset.
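The interplay between incomplete records and enrichment can be sketched in Python. This minimal example assumes a hypothetical rule that two records must share at least three populated fields before they are scored, and it uses a tiny ZIP-to-city dictionary as a hypothetical stand-in for a real enrichment or address verification service.

```python
# Hypothetical field list and minimum; real tools make these configurable.
FIELDS = ("name", "email", "phone", "city", "zip")
MIN_COMPARABLE = 3

# Hypothetical stand-in for a third-party enrichment service.
ZIP_TO_CITY = {"37220": "Nashville"}

def comparable_fields(rec_a: dict, rec_b: dict) -> list:
    """Fields populated in both records, so they can be compared at all."""
    return [f for f in FIELDS if rec_a.get(f) and rec_b.get(f)]

def can_score(rec_a: dict, rec_b: dict) -> bool:
    """Only score a pair when enough fields overlap to support a decision."""
    return len(comparable_fields(rec_a, rec_b)) >= MIN_COMPARABLE

def enrich(record: dict) -> dict:
    """Fill an empty city from the ZIP code when the lookup knows it."""
    enriched = dict(record)
    if not enriched.get("city") and enriched.get("zip") in ZIP_TO_CITY:
        enriched["city"] = ZIP_TO_CITY[enriched["zip"]]
    return enriched

a = {"name": "Tony Bologna", "zip": "37220"}
b = {"name": "Anthony B.", "city": "Nashville", "zip": "37220"}
print(comparable_fields(a, b), can_score(a, b))                  # ['name', 'zip'] False
print(comparable_fields(enrich(a), b), can_score(enrich(a), b))  # ['name', 'city', 'zip'] True
```

Enrichment adds no new judgment about whether the records match; it simply gives the matching step enough populated fields to make one.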

Exceptions

Exceptions occur in data matching when a field or group of fields does not fit into one of the predefined categories. Exceptions are common with exact data matching, as any misspelling, typo or incomplete field could be considered an exception.

In probabilistic data matching, any field that does not meet the criteria or correctness threshold would be marked as an exception, which must then be reviewed by a data analyst. Whether the analyst can infer what information is missing or incorrect determines whether the field is corrected or the entry is discarded.

Re-Duplication of Records in the Future

Companies must devise a process for matching data from across the enterprise going forward, so that duplicate records are not reintroduced. This process is often referred to as “lookup before create.” The original data cleansing and matching process takes time and effort to complete, but data entry protocols, combined with software that checks new entries against centralized, cleansed master data, reduce the need for additional periodic data matching rounds.
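"Lookup before create" can be illustrated with a minimal Python sketch. The `CustomerRegistry` class is hypothetical and, for brevity, looks up only an exact normalized email; a real MDM platform would run its full matching engine against the golden records before allowing a new one.

```python
class CustomerRegistry:
    """Hypothetical store of golden customer records with a lookup index."""

    def __init__(self):
        self.records = {}    # customer ID -> golden record
        self._by_email = {}  # normalized email -> customer ID

    def lookup_or_create(self, name: str, email: str):
        """Return (customer_id, created) and only create when no match exists."""
        key = email.strip().lower()
        if key in self._by_email:
            return self._by_email[key], False  # reuse the existing golden record
        new_id = f"CUST-{len(self.records) + 1:04d}"
        self.records[new_id] = {"name": name, "email": key}
        self._by_email[key] = new_id
        return new_id, True

registry = CustomerRegistry()
print(registry.lookup_or_create("Tony B.", "Tony.boloney@gmail.com"))       # ('CUST-0001', True)
print(registry.lookup_or_create("Anthony B.", " tony.boloney@gmail.com"))   # ('CUST-0001', False)
print(len(registry.records))                                                # 1
```

The key design point is that the check happens at entry time, so the duplicate never reaches the database and never has to be cleaned up later.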

Data Matching Use Cases

The power of data matching lies in the company’s ability to put previously chaotic data to use to improve productivity and outcomes for customers. These three companies used Profisee MDM to create golden records.

Healthcare

Mass General Brigham, a Massachusetts-based healthcare organization, wanted to build a provider search system that would help customers find in-network healthcare providers based on several factors including specialty, location and insurance coverage. Mass General Brigham needed to match provider data from across several source systems, including spreadsheets, and cleanse the data of misspellings and abbreviations.

Mass General Brigham uses Profisee to match data across its systems and create golden records that are used to feed their provider search system.

Retailers

Rheem Manufacturing needed to combine the company’s air (HVAC and air conditioning) and water (water heaters) divisions. The move would consolidate customer, sales and supplier data across different global lines of business and expand upon the traditional specializations of the contract installers, allowing for cross- and up-sell opportunities that were not previously available.

Rheem uses Profisee’s intelligent fuzzy-matching features to consolidate and combine customer data from distributors and installers.

Franchises and Multi-Location Corporations

The YMCA of the USA is a non-profit organization with 763 associations, 2,700 branch locations, 20 million members and over 30,000 staff members. With so many locations and unique records, the YMCA needed to standardize its staff records to improve staff management and analysis and, in turn, operational efficiency.

The YMCA chose Profisee’s cloud MDM tool to master its staff member data, matching data across 29 different source systems. With the lessons they learned in building these golden records, YMCA of the USA will tackle branch and member data next.

Data Matching Techniques and Steps

Data matching can be done manually by combining and updating spreadsheets. Modern spreadsheet software eases the burden a bit through formatting rules and formulas, but the best option for most organizations will be to use an MDM tool with data stewardship functionality to guide business users through these steps and perform the actions, alerting the data team when there are records requiring manual review.

1. Data Integration

During this phase, data is integrated from source systems via API, webhook or file upload. This step is essential for breaking down data silos, as it entails collecting data from disconnected data sources and consolidating it into a single repository — the MDM tool.
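At its simplest, integration means mapping each source's columns onto one shared schema and tagging every record with where it came from. The minimal Python sketch below assumes two hypothetical CSV exports with different column names; a real pipeline would receive this data via API, webhook or file upload rather than inline strings.

```python
import csv
import io

# Hypothetical exports from two disconnected systems.
CRM_EXPORT = "id,full_name,email\n17,Tony B.,Tony.boloney@gmail.com\n"
ERP_EXPORT = "cust_no,customer_name,email_address\nA-9,Anthony B.,tony.boloney@gmail.com\n"

# Hypothetical column mappings from each source to one shared schema.
MAPPINGS = {
    "crm": {"id": "source_id", "full_name": "name", "email": "email"},
    "erp": {"cust_no": "source_id", "customer_name": "name", "email_address": "email"},
}

def integrate(exports: dict) -> list:
    """Load every export into a single list of records in the shared schema."""
    repository = []
    for source, text in exports.items():
        mapping = MAPPINGS[source]
        for row in csv.DictReader(io.StringIO(text)):
            record = {target: row[column] for column, target in mapping.items()}
            record["source"] = source  # keep lineage for later publication
            repository.append(record)
    return repository

for record in integrate({"crm": CRM_EXPORT, "erp": ERP_EXPORT}):
    print(record)
```

Keeping the source system and source ID on every record matters later, when golden records are published back to the systems they came from.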

2. Matching

This step can be split into two parts: forming matching thresholds, then applying them so that each candidate pair of records is either matched or flagged for manual review by data analysts. The matching threshold should strike a delicate balance between eliminating as many duplicate entries as possible and avoiding merging entries that should remain separate. When in doubt, err on the side of leaving duplicates or sending entries for review.
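A common way to express this balance is with two thresholds rather than one, as in the minimal Python sketch below. The threshold values of 90 and 70 and the pair scores are made up for illustration; each organization would choose its own based on how costly a wrong merge is compared with a leftover duplicate.

```python
# Hypothetical thresholds on a 0-100 match score.
AUTO_MATCH = 90
REVIEW = 70

def route(score: float) -> str:
    """Decide what happens to a candidate pair based on its match score."""
    if score >= AUTO_MATCH:
        return "auto-match"
    if score >= REVIEW:
        return "manual review"  # uncertain: a data analyst decides
    return "no match"           # kept as separate records

# Made-up scores for illustration only.
for pair, score in [("Home/Primary", 96), ("Tony B./Anthony B.", 78), ("Tony/Cassandra", 41)]:
    print(pair, score, route(score))
```

Raising `AUTO_MATCH` sends more pairs to review and leaves more duplicates in place, which is the safer direction the guidance above recommends.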

3. De-Duplication, Standardization and Enrichment

Following the matching phase, you can begin the process of merging and de-duplicating records according to your organization’s survivorship rules. In Profisee, for instance, data stewards review potentially matching records and select “winners.” The records are then merged according to pre-defined survivorship rules that determine which values Profisee should retain when it encounters conflicting or duplicate information. De-duplicated records are then standardized according to data governance policies and optionally enriched with third-party data and address verification services, such as Melissa, Google Maps or Dun & Bradstreet.
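Survivorship rules can be thought of as one rule per field. The minimal Python sketch below assumes two hypothetical rules, "most recently updated non-empty value" for name and phone and "highest-priority source" for email, applied to three matched records. It is an illustration of the concept, not a description of how Profisee or any other product implements survivorship.

```python
from datetime import date

# Hypothetical matched records for one customer.
records = [
    {"source": "crm", "updated": date(2023, 1, 5), "name": "Tony B.", "phone": "", "email": "Tony.boloney@gmail.com"},
    {"source": "erp", "updated": date(2024, 3, 2), "name": "Anthony Bologna", "phone": "(555) 987-5555", "email": ""},
    {"source": "shipping", "updated": date(2022, 7, 9), "name": "Tony Bologna", "phone": "555.987.5555", "email": "tony.boloney@gmail.com"},
]

# Hypothetical trust order for sources, most trusted first.
SOURCE_PRIORITY = ["crm", "erp", "shipping"]

def most_recent(field: str, recs: list) -> str:
    """Keep the non-empty value from the most recently updated record."""
    candidates = [r for r in recs if r[field]]
    return max(candidates, key=lambda r: r["updated"])[field] if candidates else ""

def source_priority(field: str, recs: list) -> str:
    """Keep the non-empty value from the most trusted source."""
    for source in SOURCE_PRIORITY:
        for r in recs:
            if r["source"] == source and r[field]:
                return r[field]
    return ""

RULES = {"name": most_recent, "phone": most_recent, "email": source_priority}

def merge(recs: list) -> dict:
    """Build one surviving record by applying each field's rule."""
    return {field: rule(field, recs) for field, rule in RULES.items()}

print(merge(records))
```

Empty values never win under either rule, so the merge also fills gaps: the ERP record supplies the phone number the CRM record lacked.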

4. Golden Record Publication

Finally, with newly created golden records, data can be published from the MDM tool back to the source systems so that every system references the most accurate, complete and up-to-date version of each record. Depending on your organization’s data architecture, you may alternatively publish golden records to a data lake or warehouse, where they can then be accessed by downstream systems like business intelligence, HRIS or ERP.
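Publishing back to source systems depends on a cross-reference between each source record's ID and its golden record. The minimal Python sketch below assumes a hypothetical cross-reference kept during matching and builds one update batch per source system; how those batches are actually delivered (API calls, files, a data lake) would vary by architecture.

```python
from collections import defaultdict

# Hypothetical golden record produced by the earlier steps.
golden_records = {"CUST-0001": {"name": "Anthony Bologna", "phone": "(555) 987-5555"}}

# Hypothetical cross-reference kept during matching:
# (source system, ID in that system) -> golden record ID.
xref = {("crm", "17"): "CUST-0001", ("erp", "A-9"): "CUST-0001"}

def build_updates(golden: dict, cross_ref: dict) -> dict:
    """Group golden values by source system so each system can be updated."""
    updates = defaultdict(list)
    for (source, source_id), golden_id in sorted(cross_ref.items()):
        updates[source].append({"id": source_id, "golden_id": golden_id, **golden[golden_id]})
    return dict(updates)

for source, rows in build_updates(golden_records, xref).items():
    print(source, rows)
```

Storing the golden ID in each source system, not just the corrected values, lets future matching runs and downstream reports tie records back together without re-matching them.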

Make Data Matching Easy With Profisee MDM

Profisee’s MDM platform comes with out-of-the-box features for data matching, survivorship, de-duplication and standardization. Featuring one of the most advanced and flexible matching engines on the market, Profisee lets you automate data matching while still providing an intuitive, user-friendly interface for business users to manually review potentially matching records and monitor data quality.

If you’re interested in learning more about how Profisee’s in-memory, graph-based matching engine can help your organization simplify data matching and improve business outcomes, check out our datasheet below for a more detailed look at how Profisee handles data matching, survivorship and many other essential MDM functions.

Datasheet: Golden Record Management

Download our free datasheet to learn more about how Profisee's matching capabilities can help you create golden records for your core business data.