
What is Data Quality | Management, Tools, Dimensions [Updated 2023]

Updated: October 30, 2023

According to a recent Gartner article, 84 percent of customer service and service support leaders cited customer data and analytics as “very or extremely important” for achieving their organizational goals in 2023. As data is becoming a core part of every business operation, the quality of the data that is gathered, stored and consumed during business processes will determine the success achieved in doing business now and in the foreseeable future.



What is data quality?

Think of data as the bottom level of a hierarchy. On top of data you have information, which is data in context. Further up is knowledge, seen as actionable information, and at the top level sits wisdom, the applied knowledge.

If you have poor data quality, the credibility of your information suffers. With bad information quality, you lack actionable knowledge in business operations and are unable to apply that knowledge, or apply it incorrectly, resulting in risky business outcomes.

There are several definitions of data quality. The two predominant ones are:

  • Data is of high quality if it is fit for its intended use or purpose.
  • Data is of high quality if it correctly represents the real-world construct it describes.

These two definitions may contradict each other. If, for example, a customer master data record is complete and correct enough for issuing an invoice and receiving payment, it is fit for that purpose. But if that same customer master data record is incomplete or incorrect for customer service, because the data does not correctly describe the who, what and where of the real-world entity holding the customer role in that business operation, we have a business problem.

Master data must be suitable for multiple purposes. You can achieve that by ensuring real-world alignment. On the other hand, it might not be profitable and proportionate to strive for the perfect real-world alignment in order to have data fit for the intended purpose within the business objective where a data quality initiative is funded. In practice, it is about striking a balance between these two definitions.

One of the biggest contributing factors to data inaccuracy is simply human error. Avoiding, or eventually correcting, low-quality data caused by human errors requires a comprehensive effort with the right mix of remedies concerning people, processes and technology.

Other top reasons for data inaccuracies are a lack of communication between departments and inadequate data strategy. Solving such issues calls for passionate top-level management involvement.

The Importance of Data Quality

Usually, it is not hard to get everyone in a business, including the top-level management, to agree that having good data quality is good for business. In the current era of digital transformation, the support for focusing on data quality has improved.

However, when it comes to the essential questions about who is responsible for data quality, who must do something about it and who will fund the necessary activities, there remains a challenge.

Data quality resembles human health. Accurately testing how any one element of our diet and exercise may affect our health is fiendishly difficult. In the same way, accurately testing how any one element of our data may affect our business is difficult too.

Nevertheless, numerous experiences tell us that bad data quality is not very healthy for business.

Classic examples are:

  • In marketing you overspend, and annoy your prospects, by sending the same material more than once to the same person – with the name and address spelled a bit differently. The problem here is duplicates within the same database and across several internal and external sources.

  • In online sales, you cannot present sufficient product data to support a self-service buying decision. The issues here are the completeness of product data within your databases and how product data is syndicated between trading partners.
  • In the supply chain, you cannot automate processes based on reliable location information. The challenges here are using the same standards and having the necessary precision within the location data.
  • In financial reporting, you receive different answers for the same question. This is due to inconsistent data, varying data freshness and unclear data definitions.

On a corporate level, data quality issues have a drastic impact on meeting core business objectives. Examples include:

  • Inability to react promptly to new market opportunities, hindering profit and growth. Often this is due to existing data not being ready for repurposing because it was only fit for yesterday’s requirements.
  • Obstacles in implementing cost reduction programs, as the data that must support the ongoing business processes needs too much manual inspection and correction. Automation will only work on complete and consistent data.
  • Shortcomings in meeting increasing compliance requirements. These requirements range from privacy and data protection regulations such as GDPR, through health and safety requirements in various industries, to financial restrictions, requirements and guidelines. Better data quality is often a must in order to meet those compliance objectives.
  • Difficulties in exploiting predictive analysis on corporate data assets resulting in more risk than necessary when making both short-term and long-term decisions. These challenges stem from issues around the duplication of data, data incompleteness, data inconsistency and data inaccuracy.

How to Improve Data Quality

Improving data quality takes a balanced approach encompassing people, processes and technology, as well as a good portion of top-level management involvement.

Data Quality Dimensions

When improving data quality, the aim is to measure and improve a range of data quality dimensions.

Uniqueness

Uniqueness is the most addressed data quality dimension when it comes to customer master data. Customer master data is often marred by duplicates, meaning two or more database rows describing the same real-world entity. There are several remedies to cure that pain, ranging from intercepting duplicates at the onboarding point to bulk deduplication of records already stored in one or several databases.

Completeness

With product master data, uniqueness is a less frequent issue. However, completeness is often a big pain. One reason is that completeness means different requirements for different product categories.
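
The point that completeness means different requirements per product category can be made concrete in a few lines of code. Below is a minimal sketch, assuming hypothetical categories and required attributes; real product data solutions let you configure such rules per category.

```python
# A minimal sketch of category-specific completeness checks for product
# master data. The categories and required attributes are hypothetical.

REQUIRED_ATTRIBUTES = {
    "apparel": {"name", "brand", "size", "color", "material"},
    "electronics": {"name", "brand", "voltage", "warranty", "weight"},
}

def completeness(record: dict, category: str) -> float:
    """Return the share of required attributes holding a non-empty value."""
    required = REQUIRED_ATTRIBUTES[category]
    filled = {attr for attr in required if record.get(attr) not in (None, "")}
    return len(filled) / len(required)

# An apparel record missing 'material' fills 4 of 5 required attributes.
print(completeness(
    {"name": "T-shirt", "brand": "Acme", "size": "M", "color": "blue"},
    "apparel",
))  # 0.8
```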

Consistency

When working with location master data, consistency is a challenge. Addressing, so to speak, the different postal address formats around the world is certainly not a walk in the park.

Precision

In the intersection between the location domain and the customer domain, the precision data quality dimension can be hard to manage, as different use cases require different precision for a location, whether it is a postal address, a geographic position or both.

Relevance

What is relevant to know about your customers and what is relevant to tell about your products are essential questions in the intersection of the customer and product master data domains.

Conformity

Conformity of product data is related to locations. Take units of measurement: in the United States, the length of a small object is given in inches, while in most other parts of the world it will be in centimeters. In the UK, you will never know.
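
One common conformity remedy is to normalize measurements to a single canonical unit on the way into the product database. Here is a minimal sketch, assuming an illustrative conversion table and centimeters as the canonical unit:

```python
# A minimal sketch of enforcing conformity for units of measurement by
# normalizing all lengths to a canonical unit (centimeters). The
# conversion table is an illustrative assumption.

CM_PER_UNIT = {"cm": 1.0, "in": 2.54, "mm": 0.1}

def to_centimeters(value: float, unit: str) -> float:
    """Convert a length to centimeters, rejecting unknown units."""
    try:
        return value * CM_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f"Unknown length unit: {unit!r}")

print(to_centimeters(3.5, "in"))  # 8.89
```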

Timeliness

Timeliness, meaning whether the data is available at the time it is needed, is an everlasting data quality dimension across all data domains.

Accuracy, Validity & Integrity

Other data quality dimensions to measure and improve are data accuracy (alignment with the real world or with a verifiable source), data validity (whether data is within the specified business requirements) and data integrity (whether the relations between entities and attributes are technically consistent).

Data Quality Management

In data quality management, the goal is to exploit a balanced set of remedies in order to prevent future data quality issues and to cleanse (or ultimately purge) data that does not meet the data quality Key Performance Indicators (KPIs) needed to achieve the established business objectives.

The data quality KPIs will typically be measured on the core business data assets within data quality dimensions such as data uniqueness, completeness, consistency, conformity, precision, relevance, timeliness, accuracy, validity and integrity.

The data quality KPIs must relate to the KPIs used to measure the business performance in general.
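
To make the KPI idea concrete, here is a minimal sketch measuring two such dimensions (completeness and uniqueness) over a handful of customer records; the field names, sample data and thresholds are hypothetical assumptions:

```python
# A minimal sketch of measuring two data quality KPIs (completeness and
# uniqueness) over a list of customer records. Field names and the
# sample data are illustrative assumptions.

records = [
    {"id": 1, "email": "ann@example.com", "phone": "555-0100"},
    {"id": 2, "email": "ann@example.com", "phone": None},
    {"id": 3, "email": "bob@example.com", "phone": "555-0101"},
]

def completeness_kpi(rows, field):
    """Share of rows where the field holds a non-empty value."""
    return sum(1 for r in rows if r.get(field)) / len(rows)

def uniqueness_kpi(rows, field):
    """Share of rows whose value for the field is not duplicated."""
    values = [r[field] for r in rows]
    return sum(1 for v in values if values.count(v) == 1) / len(values)

print(f"phone completeness: {completeness_kpi(records, 'phone'):.0%}")  # 67%
print(f"email uniqueness:   {uniqueness_kpi(records, 'email'):.0%}")    # 33%
```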

The remedies used to prevent data quality issues and to perform eventual data cleansing include these disciplines:

  • Data Governance
  • Data Profiling
  • Data Matching
  • Data Quality Reporting
  • Master Data Management (MDM)
  • Customer Data Integration (CDI)
  • Product Information Management (PIM)
  • Digital Asset Management (DAM)

Data Governance

A data governance framework must lay out the data policies and data standards that set the bar for what data quality KPIs are needed and which data elements should be addressed. This includes what business rules must be adhered to and underpinned by data quality measures.

Furthermore, the data governance framework must encompass the organizational structures needed to achieve the required level of data quality. This includes a data governance committee or similar body, and roles such as data owners, data stewards and data custodians, in balance with what makes sense in a given organization.

A business glossary is another valuable outcome of data governance used in data quality management. The business glossary is a primer to establish the metadata used to achieve common data definitions within an organization and eventually in the business ecosystem where the organization operates.

Data Profiling

It is essential that the people who are responsible for data quality and those who are tasked with preventing data quality issues and data cleansing have a deep understanding of the information at hand.

Data profiling is a method, often supported by dedicated technology, used to understand the data assets involved in data quality management. These data assets are often populated by different people operating under varying business rules and gathered for bespoke business objectives.

In data profiling, the frequency and distribution of data values are counted on the relevant structural levels. Data profiling can also be used to discover the keys that relate data entities across different databases, to the degree that this is not already documented within the individual databases.

Data profiling can directly measure data integrity and be used as input to set up the measurement of other data quality dimensions.
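
As a minimal sketch of the counting part, the snippet below profiles value frequencies per column over a few illustrative rows; dedicated profiling tools do this at scale and across structural levels:

```python
# A minimal sketch of data profiling: counting the frequency and
# distribution of values per column. The sample data is illustrative.

from collections import Counter

rows = [
    {"country": "US", "status": "active"},
    {"country": "US", "status": "active"},
    {"country": "DK", "status": None},
]

def profile(rows, column):
    """Return value frequencies for a column, including missing values."""
    return Counter(r.get(column) for r in rows)

for column in ("country", "status"):
    print(column, dict(profile(rows, column)))
# country {'US': 2, 'DK': 1}
# status {'active': 2, None: 1}
```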

Data Matching

When it comes to real-world alignment, using exact keys in databases is not enough.

The classic example is how we spell the name of a person differently due to misunderstandings, typos, use of nicknames and more. With company names, the issues just pile up with funny mnemonics and the inclusion of legal forms. When we place these persons and organizations at locations using a postal address, the ways of writing that address have numerous variations too.

Data matching is a technology based on match codes (for example, Soundex), fuzzy logic and, increasingly, machine learning, used to determine whether two or more data records describe the same real-world entity (typically a person, a household or an organization).

This method can be used in deduplicating a single database and finding matching entities across several data sources.

Oftentimes, data matching is based on data parsing, where names, addresses and other data elements are split into discrete data elements. For example, an envelope-style address is split into building name, unit, house number, street, postal code, city, state/province and country. This may be supplemented by data standardization, for example using the same value for Street, Str and St.
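
To illustrate the match-code idea, here is a minimal sketch combining a simplified Soundex implementation with a fuzzy similarity score from Python's standard library; production matching engines are considerably more sophisticated, and the 0.8 threshold is an arbitrary assumption:

```python
# A minimal sketch of match-code generation (simplified American Soundex)
# combined with a fuzzy similarity score. Real data matching engines are
# far more elaborate; this only illustrates the idea.

from difflib import SequenceMatcher

def soundex(name: str) -> str:
    """Compute a simplified American Soundex code for a name."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    name = name.lower()
    result, prev = name[0].upper(), codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "hw":          # h and w do not reset the previous code
            prev = code
    return (result + "000")[:4]

def is_probable_match(a: str, b: str) -> bool:
    """Candidate pair if match codes agree and the strings are similar."""
    return soundex(a) == soundex(b) and SequenceMatcher(None, a, b).ratio() > 0.8

print(soundex("Robert"), soundex("Rupert"))            # R163 R163
print(is_probable_match("Jon Smith", "John Smith"))    # True
```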

Data Quality Reporting

The findings from data profiling can be used as input to measure data quality KPIs based on the data quality dimensions relevant to a given organization. The findings from data matching are especially useful for measuring data uniqueness.

In addition to that, it is helpful to operate a data quality issue log, where known data quality issues are documented, and the preventive and data cleansing activities are followed up.

Organizations focusing on data quality find it useful to operate a data quality dashboard highlighting the data quality KPIs and the trend in their measurements as well as the trend in issues going through the data quality issue log.
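
As an illustration of what an entry in the data quality issue log mentioned above might capture, here is a minimal sketch; the field names and the default status value are assumptions, not a standard schema:

```python
# A minimal sketch of a data quality issue log entry mirroring the prose
# above. Field names and status values are illustrative assumptions.

from dataclasses import dataclass, field
from datetime import date

@dataclass
class DataQualityIssue:
    description: str
    data_owner: str                 # accountable for the affected data asset
    data_stewards: list[str]        # involved in resolving the issue
    impact: str                     # fact-based impact assessment
    resolution: str = "open"        # root cause fix or cleansing activity
    raised: date = field(default_factory=date.today)
    due: date | None = None         # timing of the necessary proceedings

issue = DataQualityIssue(
    description="Duplicate customer records across CRM and ERP",
    data_owner="Head of Sales Operations",
    data_stewards=["CRM steward", "ERP steward"],
    impact="Double mailings; unreliable customer counts in reporting",
)
print(issue)
```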

Master Data Management (MDM)

The most difficult data quality issues are related to master data, such as party master data (customer roles, supplier roles, employee roles and more), product master data and location master data.

Preventing data quality issues in a sustainable way and not being forced to launch data cleansing activities over and again will, for most organizations, mean that a master data management (MDM) framework must be in place.

MDM and Data Quality Management (DQM) are tightly coupled disciplines. MDM and DQM will be a part of the same data governance framework and share the same roles as data owners, data stewards and data custodians. Data profiling activities will most often occur with master data assets. When doing data matching, the results must be kept in master data assets controlling the merged and purged records and the survivorship of data attributes relating to those records.
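
Survivorship can be illustrated with a small sketch that merges matched records into a golden record, letting the most trusted source win per attribute; the source-trust ranking and record layout are hypothetical assumptions:

```python
# A minimal sketch of attribute survivorship when merging matched records
# into a golden master data record. The source-trust ranking and record
# layout are hypothetical assumptions.

SOURCE_TRUST = {"crm": 3, "erp": 2, "webshop": 1}   # higher wins

def merge(records: list[dict]) -> dict:
    """Build a golden record, field by field, from the most trusted source."""
    golden = {}
    for record in sorted(records, key=lambda r: SOURCE_TRUST[r["source"]]):
        for key, value in record.items():
            if key != "source" and value not in (None, ""):
                golden[key] = value      # later (more trusted) values win
    return golden

matched = [
    {"source": "webshop", "name": "J. Smith", "email": "j@example.com"},
    {"source": "crm", "name": "John Smith", "email": None},
]
print(merge(matched))
# {'name': 'John Smith', 'email': 'j@example.com'}
```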

Customer Data Integration (CDI)

In many organizations, customer master data is sourced from a range of applications: self-service registration sites, Customer Relationship Management (CRM) applications, ERP applications, customer service applications and many more.

Besides setting up the technical platform for compiling the customer master data from these sources into one source of truth, there is a huge effort in ensuring the data quality of that source of truth. This involves data matching and a sustainable way of ensuring the right data completeness, the best data consistency and adequate data accuracy.

Product Information Management (PIM)

As a product manufacturer, you need to align your internal data quality KPIs with those of your distributors and merchants in order to allow your products to be chosen by end customers wherever they have a touchpoint in the supply chain. This must be done by ensuring the data completeness and other data quality dimensions within the product data syndication processes.

As a merchant, you will collect product information from many suppliers, each having their own data quality KPIs – or not having any yet. Merchants must work closely with their suppliers and strive to maintain a uniform way of receiving product data of the best quality according to the data quality KPIs on the merchant side.

Digital Asset Management (DAM)

Digital assets are images, text documents, videos and other files often used in conjunction with product data. Through the data quality lens, the challenges for this kind of data are around correct and relevant tagging (metadata) as well as the quality of the assets themselves – for example, whether a product image clearly shows the product and not much else.

Data Quality Best Practices

In the following, based on the reasoning provided above in this post, we list a collection of 10 highly important data quality best practices.

These are:

  1. Ensuring top-level management involvement.

    Quite a lot of data quality issues are only solved by having a cross-departmental view.

  2. Managing data quality activities as a part of a data governance framework.

    This framework should set the data policies, data standards, the roles needed and provide a business glossary.

  3. Filling data owner and data steward roles from the business side of the organization, and data custodian roles from business or IT where it makes the most sense.

  4. Using a business glossary as the foundation for metadata management.

    Metadata is data about data, and metadata management must be used to establish common data definitions and link them to current and future business applications.

  5. Operating a data quality issue log with an entry for each issue with information about the assigned data owner and the involved data steward(s), the impact of the issue, the resolution and the timing of the necessary proceedings.

  6. Starting with a root cause analysis for each data quality issue raised.

    The data quality problems will only go away if the solution addresses the root cause.

  7. When finding solutions, striving to implement processes and technology that prevent the issues from occurring as close to the data onboarding point as possible rather than relying on downstream data cleansing.

  8. Defining data quality KPIs that are linked to the general KPIs for business performance.

    Data quality KPIs, sometimes also called Data Quality Indicators (DQIs), can be related to data quality dimensions for example data uniqueness, data completeness and data consistency.

  9. Using anecdotes about data quality train wrecks to get awareness around the importance of data quality.

    However, also using fact-based impact and risk analysis to justify the solutions and the needed funding.

  10. Avoiding manual data input where possible, as a lot of data is already digitalized.

    Instead, try to find cost-effective solutions for data onboarding that utilize third-party sources for publicly available data – for example, locations in general, and names, addresses and IDs for companies and, in some cases, individual persons. For product data, utilize second-party data from trading partners where possible.

Data Quality Tools

Data quality tools play a crucial role in today’s data-driven world, helping organizations ensure that their data is accurate, reliable, and consistent. These tools help businesses address common data quality issues, such as duplicate records, missing information, and inconsistent formatting, which can significantly impact decision-making and operational efficiency.

One of the primary reasons data quality tools are vital is that they enable organizations to trust their data. Accurate and reliable data is fundamental for making informed decisions, reducing errors, and avoiding costly mistakes. Data quality tools help organizations identify and rectify data inaccuracies, ensuring that their data remains a valuable asset rather than a liability. Data quality tools also aid in complying with data regulations, such as GDPR and CCPA, which require organizations to manage and protect personal information responsibly.

The importance of data quality tools, such as those offered by Profisee, cannot be overstated in today’s data-centric business landscape. These tools ensure that organizations can rely on their data for critical decisions, maintain operational efficiency, and stay compliant with data regulations. With the ability to automate data cleaning and ensure data consistency, data quality tools provide a strong foundation for organizations to harness the true power of their data, unlocking new opportunities and maintaining a competitive edge in the market.

Types of Data Quality Tools

Data quality tools encompass a variety of functionalities and capabilities to ensure that data is accurate, consistent, reliable, and compliant with relevant standards and regulations. Broadly, these tools can be categorized into several types or categories, including:

Data Profiling Tools

These tools analyze and assess the quality of data by examining its structure, content, and relationships. They help identify anomalies, outliers, and patterns within the data, giving organizations insights into the overall health of their data.

Data Cleansing Tools

Data cleansing tools focus on correcting or removing errors and inconsistencies within the data. They can automatically standardize formats, eliminate duplicate records, and address missing or inaccurate data values, improving data accuracy and completeness.
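
A single cleansing step can be as simple as mapping known variants to a canonical form. Below is a minimal sketch that standardizes street suffixes; the mapping table is an illustrative assumption, and real cleansing tools ship with far richer reference data:

```python
# A minimal sketch of one data cleansing step: standardizing street
# suffixes to a single canonical form. The mapping is an illustrative
# assumption.

import re

SUFFIXES = {"street": "St", "str": "St", "st": "St",
            "avenue": "Ave", "av": "Ave", "ave": "Ave"}

def standardize_address(address: str) -> str:
    """Replace known street-suffix variants with their canonical form."""
    pattern = r"\b(" + "|".join(SUFFIXES) + r")\b\.?"
    return re.sub(pattern,
                  lambda m: SUFFIXES[m.group(1).lower()],
                  address, flags=re.IGNORECASE)

print(standardize_address("12 Main Street"))   # 12 Main St
print(standardize_address("7 Fifth avenue"))   # 7 Fifth Ave
```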

Data Matching and Deduplication Tools

These tools are specifically designed to identify and eliminate duplicate or redundant records within datasets. They use various algorithms and techniques to detect similar or identical data entries, helping maintain data integrity.

Data Validation Tools

Data validation tools check data against predefined rules and criteria to ensure it meets specific standards and validation requirements. These tools help organizations maintain data quality by flagging data that does not conform to set guidelines.
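
As a minimal sketch of such rule-based validation, the snippet below checks records against two predefined rules; the rules and field names are illustrative assumptions, not any particular product's API:

```python
# A minimal sketch of rule-based data validation. The rules and field
# names are illustrative assumptions.

import re

RULES = {
    "email": lambda v: bool(re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v or "")),
    "age":   lambda v: isinstance(v, int) and 0 <= v <= 120,
}

def validate(record: dict) -> list[str]:
    """Return the names of fields that fail their validation rule."""
    return [f for f, rule in RULES.items() if not rule(record.get(f))]

print(validate({"email": "ann@example.com", "age": 34}))   # []
print(validate({"email": "not-an-email", "age": 200}))     # ['email', 'age']
```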

Data Enrichment Tools

Data enrichment tools augment existing data with additional information from external sources, such as public databases or third-party providers. This process can enhance the completeness and accuracy of data, making it more valuable for analysis and decision-making.

Data Governance and Data Quality Management Tools

These tools provide a framework and set of processes to establish data governance policies, data quality rules, and data stewardship practices within an organization. They help define roles and responsibilities for managing data quality and compliance.

Data Monitoring and Profiling Tools

Data monitoring tools continuously track and analyze data as it enters and exits the organization. They can trigger alerts and notifications when data quality issues are detected, allowing for real-time data quality management.

Master Data Management (MDM) Tools

MDM tools focus on creating and managing a single, authoritative source of master data across an organization. They help maintain consistency and accuracy in critical data elements like customer information, product data, and employee records.

Data Quality Reporting and Dashboards

These tools offer visualization and reporting capabilities to help organizations monitor and communicate the state of data quality. They provide insights into data quality metrics, trends, and areas that require attention.

Data Quality Assessment and Profiling Tools

These tools are designed to assess data quality on a periodic or ad-hoc basis, helping organizations evaluate their data quality, identify issues, and prioritize data improvement efforts.

Each of these types of data quality tools plays a crucial role in ensuring that data is fit for its intended purpose, maintaining data integrity, and supporting informed decision-making within an organization. The choice of which tools to use depends on the specific data quality challenges and objectives of the organization.

Data Quality Resources

There are many data quality resources available for learning more, framing a data quality strategy and addressing specific data quality issues.
