Episode Overview:
In this episode of CDO Matters, Malcolm is a guest on the Data Engineering Podcast with Tobias Macey. This episode is a great fit for any non-technical data leader who is looking to gain a deeper understanding of some of the technical dependencies and concepts required for successful master data management (MDM) and data governance — but without getting too deep into jargon or software engineering concepts.
If you’re a business-centric CDO with limited technical experience or background, this podcast will help you build your data literacy, have deeper and more compelling conversations with your technical staff, and make more informed technology-centric decisions.
Malcolm and Tobias cover some of the technical concepts involved in MDM and governance programs — in both relatable and understandable terms — including:
- MDM systems architecture and typical implementation patterns
- The connection between MDM, data engineering and systems architecture
- Data modeling for MDM and data governance
- The processes used in MDM platforms to support data quality requirements
- Entity resolution, i.e., matching or deduplication
- MDM team dynamics and roles, including the role of data stewardship
After listening to this podcast, data leaders who are new to MDM or data governance will understand why their organizations need these foundational elements and how they can be used to drive business benefit.
Key Moments:
- 3:14 – Identifying ‘who is a customer’ to model and govern data
- 7:11 – What is MDM and how does it add value?
- 10:27 – Who needs MDM and how does new technology solve for data quality?
- 15:11 – Limitations and considerations when searching for a “single source of truth”
- 18:15 – Who is responsible for MDM within an organization, and who makes up the MDM team?
- 22:16 – What are the differences between analytical and operational MDM?
- 29:15 – The top 4 reasons so many MDM implementations fail
- 32:45 – Using a business perspective to identify the right outcomes
- 37:40 – How MDM is evolving to use graph functionality in addition to relational databases
- 42:32 – Why Customer Data Platforms (CDPs) fall short for enterprise-level data management
- 43:36 – Insights on novel MDM use cases: data sharing, graph databases, data fabrics
- 50:08 – 3 ‘Watch-outs’ learned from years in the data management space
- 54:36 – How small companies can implement MDM principles
- 57:38 – The gap between data software and real business outcomes
Key Takeaways:
When is MDM relevant for an organization? (10:22-11:34)
“The bigger and more complex you are and the more decentralized you are…where organizations are struggling to have a single view of the customer…the larger the company, the more they tend to have a need for MDM.” – Malcolm Hawker
Cloud-native data warehouses vs. MDM software (15:11-16:33)
“There are many cloud-based data warehouse technologies that are saying we can enable a single version of the truth, and they absolutely can…but does it have all the flexibility and reconfigurability to allow for all the things that MDM software can do? Typically, they don’t.” – Malcolm Hawker
What are the differences between analytical and operational MDM? (23:51-26:22)
“An analytical style of MDM is where the flow [of data] is one-way…[operational MDM] can actually turn around and syndicate that data back down into consuming systems.” – Malcolm Hawker
4 MDM pitfalls to avoid during your implementation (29:15-31:01)
“If you’ve got a need for MDM and if you have been given a mandate by your management to come up with a single version of the truth…avoid the key pitfalls that often send so many MDM programs sideways.” – Malcolm Hawker
Companies of all sizes can benefit from MDM principles (57:05-57:26)
“I would argue that most companies need MDM as a discipline…But chances are, you still have some use cases that need that consistent approach to the data management side…” – Malcolm Hawker
About the Guest
Tobias Macey is a dedicated engineer with experience spanning many years and even more domains. He currently manages and leads the Technical Operations team at MIT Open Learning where he designs and builds cloud infrastructure to power online access to education for the global MIT community. He also owns and operates Boundless Notions, LLC where he offers design, review, and implementation advice on data infrastructure and cloud automation.
In addition to the Data Engineering Podcast, he hosts Podcast.__init__, where he explores the many ways the Python language is being used. Drawing on his experience building and scaling data infrastructure and processing workflows, he helps his audience explore and understand the challenges inherent to data management.