What Is a Medallion Architecture for MDM?

A medallion architecture (or a medallion lakehouse architecture) is a way to build and think about the underlying data infrastructure that enables data products within an organization. Though the term was popularized by the data analytics and AI company Databricks, you don’t have to use Databricks to implement a medallion architecture.

In this article, we’ll explore the concept of a medallion architecture, explaining what it is, how it compares to a data mesh architecture, why you might consider using it and how it’s related to data products.

Medallion Architecture Explained

Medallion architecture is a way of organizing and arranging different components of data infrastructure for various use cases, including creating data products. By no means is it a new approach to getting accurate, consistent, high-quality data for creating data products — rather, it’s a helpful framework for understanding such an approach without getting too bogged down in the technical details.

Figure: The medallion architecture, showing how data progresses in value and accessibility across IT, the Chief Data Officer (CDO) and the line of business (LoB). The vertical axis represents the value of data; the horizontal axis represents data accessibility and governance level. Data moves from "Raw Data in Source Systems" (bronze medallion, under IT), to "Raw Data in Analytics Layer" and "MDM 'Aligned'" (silver medallion, under the CDO), to a "Consumable Master Data Product" (gold medallion, under the LoB). Gold medallion data is fully packaged, includes transactional data and is self-describing. The diagram also marks the transition from data that requires technical expertise to knowledge that anyone can use.

You can think of a medallion architecture as broken down into three main stages or layers.

1. Bronze Medallion Data

Raw data (what we’d call bronze medallion data) is integrated into a tool, such as a master data management (MDM) tool, so that it can be cleansed. Raw data comes from a variety of sources, most often ERP and CRM systems, legacy applications, electronic health record (EHR) systems or even spreadsheets. These are known as source systems.

At this stage, data from these sources is often inaccurate, incomplete, inconsistent, out of date or duplicated. This is a problem because it means you get conflicting data about the same real-world entities the data describes, whether that’s a product, a person or a place. Left untouched, bronze medallion data is not fit for creating data products because any resulting analytics or operational processes would be rife with duplicate, incorrect or inconsistent values.

For example, suppose you have a customer named Alana Bosh. You might have data about Alana in three different ERP or CRM systems. In Salesforce, her name might be misspelled as “Alana Bosch.” Her name might be spelled correctly in other systems, but her contact information is alanabosh@gmail.com in Microsoft Dynamics and alana.bosh@thriftybikes.com in SAP.

You happen to personally know this customer, so you know all three records describe the same Alana Bosh. However, members of your analytics or operations team probably don’t know this. Alana is also just one person — you can easily imagine how quickly even seemingly small inconsistencies like this can become a big problem when you start talking about hundreds, even thousands, of duplicated records across different systems.
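To make the problem concrete, here is a minimal Python sketch of the three Alana Bosh records as they might land in a bronze layer. The field names (`source`, `name`, `email`) are assumptions for illustration, not the real schemas of Salesforce, Microsoft Dynamics or SAP, and the Salesforce email is left empty because the example above doesn’t specify one. Grouping on an exact name or an exact email splits one customer into several.

```python
from collections import defaultdict

# Hypothetical bronze records; field names are illustrative, not real system schemas.
bronze_records = [
    {"source": "Salesforce", "name": "Alana Bosch", "email": None},
    {"source": "Microsoft Dynamics", "name": "Alana Bosh", "email": "alanabosh@gmail.com"},
    {"source": "SAP", "name": "Alana Bosh", "email": "alana.bosh@thriftybikes.com"},
]


def group_by_exact(records, field):
    """Group source systems by the exact value of one field, skipping missing values."""
    groups = defaultdict(list)
    for record in records:
        value = record[field]
        if value is not None:
            groups[value].append(record["source"])
    return dict(groups)


by_name = group_by_exact(bronze_records, "name")
by_email = group_by_exact(bronze_records, "email")

# Neither key yields a single group, so exact matching sees more than one "customer".
print(len(by_name), "distinct names;", len(by_email), "distinct emails")
```

Neither attribute on its own is enough to link the three records, which is why the next stage relies on matching rules rather than exact equality.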

2. Silver Medallion Data

Once raw (or bronze medallion) data has been integrated from the disparate sources where it’s stored, it’s ready to be cleaned, standardized and, optionally, enriched. Integrated data that has undergone these processes is called silver medallion data.

Data cleaning and standardization can look different across organizations depending on their goals, but if you’re using an MDM tool like Profisee, the process goes like this:

  • Matching: Similar records are identified, and matches are made.
  • Deduplication and Survivorship: According to your organization’s data governance policies, the most relevant, accurate and up-to-date attributes are selected to “survive” and live in the updated record. Duplicate records are then eliminated while the surviving data attributes remain in the updated record. When survivorship rules have been applied and two matching records become one, the records are said to have been merged. Merging and deduplication happen simultaneously.
  • Standardization: Referring again to your data governance policies, updated records are formatted for consistency. For example, if you’re working with a customer data record, you would make sure that names are all stored the same way and that addresses and other data attributes follow the same conventions. You don’t want one customer name to be formatted as “Last Name, First Name” and another as “First Initial, Last Name.”
  • Enrichment (optional): Some organizations choose to enrich their data with third-party data services like Dun & Bradstreet or Melissa. Address verification is a common use case here, but some organizations also use third-party data enrichment for email addresses and phone numbers in addition to demographic information like age.
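The first three steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Profisee or any other MDM tool implements them: matching uses the standard library’s `difflib.SequenceMatcher` with an arbitrary 0.9 threshold, the survivorship rule ("most recently updated non-empty value wins") is just one possible policy, and the `updated` dates and field names are invented. Enrichment is left out because it depends on a third-party service.

```python
from datetime import date
from difflib import SequenceMatcher

# Hypothetical integrated records; the `updated` dates are invented for illustration.
records = [
    {"source": "Salesforce", "name": "alana bosch", "email": None,
     "updated": date(2022, 1, 10)},
    {"source": "Microsoft Dynamics", "name": "Alana Bosh", "email": "AlanaBosh@gmail.com",
     "updated": date(2023, 6, 1)},
    {"source": "SAP", "name": "ALANA BOSH", "email": "alana.bosh@thriftybikes.com",
     "updated": date(2024, 3, 15)},
]


def is_match(a, b, threshold=0.9):
    """Matching: treat two records as one entity if their names are similar enough."""
    ratio = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return ratio >= threshold


def merge(matched):
    """Survivorship: per attribute, keep the most recently updated non-empty value."""
    newest_first = sorted(matched, key=lambda r: r["updated"], reverse=True)
    golden = {}
    for field in ("name", "email"):
        golden[field] = next((r[field] for r in newest_first if r[field]), None)
    golden["sources"] = [r["source"] for r in matched]  # remember where the data came from
    return golden


def standardize(record):
    """Standardization: apply one name and email convention to the merged record."""
    out = dict(record)
    out["name"] = " ".join(part.capitalize() for part in record["name"].split())
    if record["email"]:
        out["email"] = record["email"].lower()
    return out


# Compare every record against the first one, then merge and standardize the matches.
anchor = records[0]
matched = [r for r in records if is_match(anchor, r)]
silver = standardize(merge(matched))
print(silver)
```

With this sample data, all three records match and collapse into a single record named "Alana Bosh". A real deployment would compare more attributes than the name and would take its survivorship rules from the organization’s governance policies.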

You may also have seen silver medallion data referred to as a “mastered” list or database. It is the trusted, complete and centralized version of an organization’s most critical master data. That makes it suitable for downstream analytics or operational use cases, where it can then be stitched together with transactional data to form the next stage in the medallion architecture process.

3. Gold Medallion Data

At this stage, data is published from the tool you used to clean, standardize and enrich it to the downstream systems that need access to it, where it becomes gold medallion data. Gold medallion data is data that is ready for consumption by downstream systems, such as enterprise analytics, BI, AI or custom applications.

Key attributes of gold medallion data are:

  • Trustworthy: The data lineage is preserved in the record’s metadata, letting data stewards trace the data record back to its source and see which operations it has undergone.
  • Accessible: Gold medallion data is available to the people and systems who need it.
  • Governed: Gold medallion data has been cleaned, standardized and enriched and is described in a way that’s useful and relevant to the organization, often using metadata in a data catalog.
  • Secure: Gold medallion data can only be accessed by the people and systems who need access to it.
  • Valuable: It’s good to have high-quality data, but data that is not or cannot be used to drive business outcomes isn’t very valuable.
  • Understandable: In short, gold medallion data is a data product. It’s packaged so that end users can put it to work for their specific use cases.
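A few of these attributes, namely lineage, access control and packaging, can be illustrated with a small Python sketch. The `GoldRecord` class, its fields and the role names are hypothetical and exist only for this example. In practice, lineage and permissions usually live in a data catalog or the platform’s own security model rather than on the record itself.

```python
from dataclasses import dataclass, field


@dataclass
class GoldRecord:
    """Hypothetical gold record: mastered attributes plus lineage and access metadata."""
    attributes: dict
    lineage: list = field(default_factory=list)       # ordered (operation, detail) steps
    allowed_roles: set = field(default_factory=set)   # roles permitted to read the record

    def read(self, role):
        """Return a copy of the attributes, but only for roles that were granted access."""
        if role not in self.allowed_roles:
            raise PermissionError(f"role {role!r} may not read this record")
        return dict(self.attributes)


gold_customer = GoldRecord(
    attributes={"name": "Alana Bosh", "email": "alana.bosh@thriftybikes.com"},
    lineage=[
        ("integrated", "Salesforce, Microsoft Dynamics, SAP"),
        ("matched and merged", "3 source records into 1"),
        ("standardized", "name and email conventions applied"),
    ],
    allowed_roles={"analyst", "data_steward"},
)

# A data steward can trace which operations the record has undergone, in order.
operations = [operation for operation, _ in gold_customer.lineage]
print(operations)
```

Calling `gold_customer.read("analyst")` returns the mastered attributes, while a role that has not been granted access raises `PermissionError`.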

For a full explanation of gold medallion data, see our article on consumable data.

Medallion Architecture vs. Data Mesh Architecture

Spend any time around data products and you’re likely to hear about another common architecture: a data mesh. Like medallion architecture, data mesh architecture is used to enable the creation of data products, but it takes a different approach.

Approach:
  • Medallion architecture: Raw data is integrated into a central repository and cleaned, standardized and enriched by one team.
  • Data mesh architecture: Raw data is cleansed, standardized and enriched by federated teams overseeing one data domain at a time.

Pros:
  • Medallion architecture: Close watch over data governance and highly structured; a good option for organizations with less complex governance requirements.
  • Data mesh architecture: Flexible and scalable while still maintaining high data quality; avoids creating bottlenecks and accommodates more complex governance requirements.

Cons:
  • Medallion architecture: Can lead to bottlenecks and doesn’t scale as well as other architecture models.
  • Data mesh architecture: Less centralized oversight; may not be suitable or feasible for smaller organizations.

In medallion architecture, one team is usually responsible for carrying out the tasks needed to create data products, centralizing data in one place to create a single source of truth for specific business use cases. However, in data mesh architecture, these responsibilities are distributed and decentralized, with multiple teams preparing data for use as data products.

Raw data still gets cleaned, standardized and enriched before it’s made available to the people and systems who need it, but it’s not integrated and consolidated into a central location as with medallion architecture. Instead, data is prepared by domain or within a line of business. For example, one team will handle customer data while another handles product or location data, and so on.
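As a rough sketch of that difference, the Python snippet below routes each raw record to a preparation step owned by its domain team instead of sending everything through one central pipeline. The team functions, the `sku` field and the sample values are hypothetical and stand in for whatever cleaning each domain actually performs.

```python
def prepare_customer(record):
    """Preparation step owned by the (hypothetical) customer domain team."""
    return {**record, "name": record["name"].title()}


def prepare_product(record):
    """Preparation step owned by the (hypothetical) product domain team."""
    return {**record, "sku": record["sku"].upper()}


# Data mesh: each domain registers and maintains its own preparation step.
domain_pipelines = {
    "customer": prepare_customer,
    "product": prepare_product,
}


def prepare(domain, record):
    """Route a raw record to the pipeline owned by its domain team."""
    return domain_pipelines[domain](record)


mesh_customer = prepare("customer", {"name": "alana bosh"})
mesh_product = prepare("product", {"sku": "tb-1001"})
print(mesh_customer, mesh_product)
```

In a medallion setup, the same two functions would typically sit in one pipeline run by one team against one central repository.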

Data products are the end destination with either approach — it’s just a different way of getting there. Larger, more complex organizations sometimes prefer data mesh architecture over medallion architecture, but other factors also influence this decision, such as an organization’s culture, goals, structure and resources.

Should You Use Medallion Architecture?

Medallion architecture is a popular way to enable data products, but that doesn’t mean it’s right for every organization. Organizations that stand to benefit the most from medallion architecture tend to place a high value on having a single source of truth for all master data domains, centralizing data management and governance with one team, powering analytics-driven use cases and maintaining hierarchical data quality control.

When considering whether medallion architecture makes sense for your organization, ask yourself:

  • What specific business use cases do our data products need to enable? Are they more analytics-driven or specific to certain data domains?
  • Can we reasonably manage the volume of data with one team?
  • Is it more important that all data is consistent and high-quality or that we can be flexible to adapt to changing business priorities?

Take Your Data from Bronze to Gold with Profisee

The Profisee MDM platform makes it easy to build powerful data products using medallion architecture or any other approach. One of the first MDM tools to be natively integrated with Microsoft Fabric, Profisee takes a multidomain approach to mastering your most critical business data. You can integrate raw data from any source, then quickly and confidently match, merge, deduplicate, standardize and enrich it before making it accessible in Azure OneLake, Azure Databricks, Azure Synapse, Azure OpenAI or other systems that depend on it.

Learn more about how Profisee can help you make data products a reality by exploring our native integration with Microsoft Fabric for seamless, embedded master data management, right in Fabric.
