
The CDO Matters Podcast Episode 08

Be More Social & Less Technical with Dr. Juan Sequeda


Episode Overview:

When it comes to leading a successful business, it is crucial to be data-driven. But being overly technical in your approach can often take away from the social needs of your enterprise.

In this episode, Malcolm and Dr. Juan Sequeda focus primarily on four key topics: data as a product, the data mesh phenomenon, why data leaders are incorrectly focused on technology and how taking a more ‘social’ approach — as advocated by the data mesh — will deliver superior results.

Dr. Sequeda breaks down data-related technologies into three core principles that he argues have changed little over the last several decades. CDOs with more of a business or non-technical background will appreciate how Dr. Sequeda is able to distill the complexities of the modern data estate into a simplified model — and warns how various data management vendors continue to complicate matters by focusing too much on software tools and features.

While exploring ways for data leaders to extricate themselves from technology-first approaches, the two explore the growing trend towards data as a product and how CDOs can benefit from it. Dr. Sequeda shares his ‘ABC’ framework for approaching data as a product that CDOs from all backgrounds can quickly use within their data organizations.

Dr. Sequeda both challenges and acknowledges the benefits of data centralization during a discussion focused on how master data management (MDM) is still needed by all organizations despite the decentralized approach advocated by the data mesh. Ultimately, it should be no surprise that a noted scholar on knowledge graphs believes that context and semantics should drive more modern approaches to governance and MDM — where the context or use case of data ultimately determines what rules/policies should be defined rather than the data itself.  

This episode of CDO Matters will help less technical CDOs understand the underlying data semantics and why the data mesh — most especially the ‘data as a product’ phenomenon — is worthy of consideration. Prioritizing efforts to integrate product management disciplines in data management — at both centralized and decentralized levels — will ultimately help data leaders to drive superior results by being more socially driven.

Key Moments

  • [4:24] Bridging Tech and Business
  • [6:06] Defining Data Mesh for Your Organization
  • [8:20] A Social-first Approach to the Data Mesh
  • [10:52] What Comes After Data Decentralization?
  • [15:10] The 3 Principles of the Data Stack
  • [16:01] Modern Data Developments and How Data Software Categories Drive the Conversation
  • [17:05] Social vs. Cultural Business Approaches
  • [20:15] Metadata Serving as the Glue Behind Data
  • [23:12] Operational Focus of the Data Mesh
  • [25:20] The Relevance of Master Data Management (MDM) Today
  • [28:30] Powering a Data Fabric with a Semantic Layer
  • [33:20] Data Centralization through Governance

Key Takeaways

Bridging Technology and Business for CDOs [5:05 — 6:03]

“I would say you need to have people on your team who can be those bridges…who will be able to fill that gap [between technology and business]. As a leader, you want to understand the overview of things, but you also want to feel empowered by having the best people around you.” — Dr. Juan Sequeda

Is Data Mesh a Software Category? [7:16 — 8:14]

“Data mesh is a social-technical paradigm shift, it is not something you buy… if somebody is selling you a data mesh, please run far away as fast as you can from that vendor because they are selling you B.S.” — Dr. Juan Sequeda

The 3 Principles of the Data Stack [15:06 — 16:49]

“We talk about the modern data stack…look at the principles…here is this box and it has inputs and outputs. It is the three main boxes. One is the box that moves data. Data comes in, data comes out. Then you have another box where data comes in, questions come in and answers come out. That is your storage and compute…then you have another box where different questions come out. That is your analytics.” — Dr. Juan Sequeda

The Problem with Being Overly Tech-Focused [17:05 — 17:42]

“The issue here is that we have been defining success from a technical perspective, which is ‘my data is now in one place,’ but that was not the goal…define success from the social perspective about the needs of the business.” — Dr. Juan Sequeda

About the Guest

Dr. Juan Sequeda is the Principal Scientist at data.world and the co-host of the Catalogs & Cocktails podcast. Juan holds a PhD in Computer Science from the University of Texas at Austin and is a noted scholar and researcher in the fields of semantic technologies, including knowledge graphs. He is a frequent public speaker at data and analytics conferences across the globe and is passionate about helping data leaders implement more modern and innovative approaches to both data strategy and data management.  

Episode Links & Resources:

Good morning, evening, or afternoon, everybody. This is Malcolm Hawker, and I’m here with another episode of the CDO Matters podcast. I’m joined today by Juan Sequeda, who is the principal scientist at data.world and the host of the Catalogs & Cocktails podcast, which you should certainly check out if you haven’t already.

Today we’re gonna talk about a few different concepts, a few different hot topics in the data management space. The first one, which I’m really looking forward to, is the data mesh.

I met Juan for the first time, at least in person, about a month and a half or maybe two months ago at the CDOIQ conference on the MIT campus, which I thought was just a fantastic conference. One of the best that I’ve been to.

I had a great conversation with him there. He actually challenged me on some of my MDM-centric views of the world, which is a really good thing. I always find that makes for great conversations.

Juan is a resident of Austin, Texas, and I lived in Austin for almost twenty years before moving here to the Florida coast. So we certainly have some things in common there. I’ll stop introducing you, Juan. Maybe you wanna say just a few words about yourself. What would you want our listeners to know about who you are and what makes you tick?

Yeah. Well, first of all, Malcolm, thank you so much for inviting me. We met and had a quick conversation over cocktails at the MIT CDO conference. Yeah.

And it was just great how we really connected, and we had some good friction, and I really love that. So thanks again. I’m really excited to get more into that. So just quickly, I’m the principal scientist at data.world.

Together with my colleague, Tim Gasper, I host Catalogs & Cocktails, a podcast that we’ve been doing now for over two years.

I come from an academic background. I did my undergrad and my PhD at the University of Texas at Austin, working on data integration, semantics, and knowledge graphs before there were knowledge graphs. And I continue to be in academia. I have PhD students and stuff, but I’ve loved taking the research and bringing it into the real world.

So I’ve done a couple of companies. The latest one was about semantic data virtualization, and I sold that company to data.world a couple of years ago. That’s why I’m here. I love to be the bridge between the tech side and the business side: understanding what the pain points are, bringing them back into the engineering world, sometimes into the scientific world, and trying to close those loops.

We do that from a development perspective. And in the last decade, I’ve realized we gotta do this for the data side too. Right? So that’s one of my interests.

So, yeah, let’s just kick it off. What do you wanna talk about? There’s so much.

Well, one thing I wanna ask you based on what you just said.

We’ll get into our core topics, but something that interests me in what you just said is this notion of being a bridge, right, between technical and business.

If I was a business-centric CDO, if I came up from the business side, you know, CFO, maybe the product side, maybe the sales side of the house, what’s the one thing that I really maybe don’t understand about engineering types?

Right? Like, what’s the one thing that I should really know about engineering? You know, I’m managing a large team now. I’m a CDO. I’ve got data scientists. I’ve got all these folks. What are the one or two things you would advise really getting to understand about those folks that would drive competitive advantage for a CDO?

Well, I would say that you actually need to have people on your team who can be those bridges.

I mean, you’re not going to be able to understand everybody and their mindset and stuff. They’re gonna be very rigid. I mean, as a computer scientist, for example, I look at things as inputs and outputs.

Right? I look at things as abstraction layers, and I feel comfortable at one level of abstraction and not at others. Other people may feel comfortable down at the compiler level. I can move between different abstraction layers.

Right? So that’s the mindset that we have as computer scientists, as engineers. Give me the requirements.

This is what you want. This is what I’m gonna give you.

But I think that’s part of the gap that we have, and what you really need is to have people on your team who are able to fill that gap. And, you know, as a leader, I think you wanna understand the overview of things, but you also wanna be empowered by having the best people around you. Right. So that’s what I would suggest.

People like you. Got it. People who can sit in the middle and have a conversation with a business person or with a data scientist.

Okay. Let’s dive into it. Right.

Without further ado, topic number one: the data mesh. This was, I would say, probably the number one topic of conversation at the industry event that we went to a couple of months ago. It was all over the agendas.

Most certainly, a lot of the vendors were talking about it. I was blown away by how many, I’ll just say loosely, catalog-centric vendors, for lack of a better phrase, were there pitching their tent around the idea of the data mesh. I’ll be honest. I’m a little bit of a mesh contrarian.

I believe in it, by the way. I love the notion, at a high level, of data as a product. I love the notion of decentralized data management. So there are some things about the data mesh that I really like, but I’m really concerned that we have a situation of architecture driving business.

Mhmm.

What would you say to what I just said?

No. So first of all, I think we’re on the same page. Let’s see if we are; I think we are. But let’s see where we start diverging here. First of all, data mesh is a social-technical paradigm shift.

It is not something you buy, and I repeat this over and over again. If somebody is selling you a data mesh, please run far away, as fast as you can, from that vendor, because they are selling you BS. Right? So that’s number one.

So it’s something you can’t buy, and it’s a social-technical paradigm shift. The thing is that we’ve always been focusing on the technical side. As an industry, as a community, as technologists, we don’t like to talk to people. We don’t focus on the social side.

So that’s a big shift that has been occurring.

And then for me, there are two most important things when it comes to data mesh. Yeah, there are all the principles, and I agree with all of them; go read the book that Zhamak has written. And I will publicly say this.

Zhamak, I think, will go down in data management history as that inflection point: the person who shook up the world to say we need to start thinking more about the social side. And everybody will have opinions about this, and that’s great. Two important things come out of this, from my perspective.

One, bringing product thinking into data. We do product thinking in a lot of things. We do product thinking in software. We do product thinking in how to design this really nice YETI water bottle.

There was product thinking that went into that. How about we do that for data? I don’t think anybody will disagree. And, you know, actually, if you disagree, I don’t wanna work with you, because I think you’re on the wrong side.

That’s number one. The second is understanding the balance between what needs to be centralized and what needs to be decentralized. There is a balance, and that balance depends on the culture of your organization, on the size of your organization, on the industry that you’re in. And it’s going to evolve.

You’re not gonna switch over one day. So I think those are the two things, and for everything I just said, you don’t need technology. Right? That’s the social side of it.

Now let me clarify what I mean when I say you don’t need technology. You do need technology to get started. Technology is definitely an enabler for all these things. And I think this is one of the things we struggle with as technologists: we always think about technology first.

And I think that’s the problem if you’re going into data mesh with a technology-first mindset. My point is to go social first, and you’re gonna get uncomfortable, because as technologists, we don’t like to talk to people.

So when you say social first, what form would that take? Does that mean you need to understand at a very intimate level the cultural and political and maybe even HR drivers of an organization? Is that really what you mean? Yeah. Yes. Okay.

So when I talk to our customers, when I talk to prospects, when I just talk to people in the hallways at conferences, I ask them: what’s the culture in your organization around centralization and decentralization?

And I get three types of responses. One: oh, we’re centralized.

Two: oh, we’re very decentralized.

Or the third one: I’ve never thought about it. If you are already very decentralized, there’s gonna be less friction around that. If you’re very centralized, you’re gonna have a lot of friction. And if you don’t know, well, you should probably figure that out. And how do you do that? Go talk to people.

Well, my concern there, and I’m not sure concern is the right word, is that in many ways the way organizations behave may not be fully aligned to their strategy.

When I was a Gartner analyst, what I heard all the time was a lot of decentralization, but decentralization kind of by default. Right? Because of a lack of focus on governance, a lack of, I’ll say it, leadership, a lack of overall business strategy. What ended up happening is organizations defaulted to decentralization because they had to sell products, they had to deliver goods, and they had to wow their customers.

And those are all very good things. But what I would see is technology saying, okay, we’re decentralized, so that means we need a data mesh. Meanwhile, they’ve got senior leaders saying something slightly different at higher levels. Yeah.

No, that’s a great point. At the end of the day, this data stuff, treating data as a product, centralization, decentralization, is not the goal. It’s a means to an end, and I think that’s something we need to understand.

I publicly say this. I think a lot of data teams are on this high horse. They think that they’re the latest, greatest thing. And I’m like, you know what? Actually, you’re not. Get off it. You have a big ego right now.

You need to understand how the organization is making money and saving money, and you need to be able to connect your work as directly as possible to that. If you are not able to express that, you’re not being successful. Period. I think that’s the shift that we need to have. And to do that, you just need to understand how the business works. I’ve been talking about this lately. We always say data literacy.

How about business literacy? How about making sure that your data teams, your technical teams, understand how the business works, how the data flows, how the money flows, and how that connects to the people who are asking you for whatever reports or whatever analysis? I think that’s the difference. I did this analysis a while ago with my buddy Mohammed Asher. We looked at one of the largest companies in the US on LinkedIn.

We looked at the number of employees. They had something like twenty thousand employees just in the US. How many of them had data-type titles, like data analyst?

Three percent. Oh.

So we have this three percent living in a bubble, thinking they’re the greatest thing.

How are they actually helping the other ninety-seven percent?

What an interesting stat. Oh, thank you. Mohammed, by the way, is a super sharp guy. Really looking forward to having a conversation with him here on CDO Matters.

On your perspective, I completely agree. Right? I completely agree with asking what problem you are solving and what value you are driving for the organization.

Totally, totally aligned.

On this notion of data as a product, here’s another thing I’m seeing, and it’s a joke I made on another podcast I was on. Do you remember ever participating in these team-building drills, right, where they’ll put four or five or six people in a team, give them a toilet paper roll, some duct tape, and some pasta, and say, go build a bridge?

Right? And that’s kinda how I see what’s happening in the data-as-a-product world. Meaning, the raw materials are just fixed. Right?

If I’m the VP of product management, which I actually have been, the first thing you do is figure out a problem to solve. Right? Then you work your way all the way down to the materials you need to build the product, to solve the problem, to drive the revenue, and on and on. But what we data people are so prone to do is start at the bottom, with this set list of raw materials.

And it’s like, okay, how do I duct tape some of this data together to productize it?

So you’re nodding. Obviously, you agree. What are you thinking as I’m speaking?

Let’s look at the entire data landscape from a historical perspective.

Let’s look at it from the principles, understanding the inputs and the outputs, and they haven’t really changed. Right? The first principle is moving data. Data comes in, data comes out. That’s your ETL type of thing. The second one is I need storage and compute.

Data comes in, questions come in, answers go out. That’s your data lake, your data warehouse, your data lakehouse, blah blah, all the databases.

Then the third one is I need to use data. Data comes in, other types of questions come in, and answers come out. That’s your analytics, your BI reporting, your ML, AI, all that stuff. And this is historically what we’ve been seeing throughout the last twenty-plus years.
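To make the three boxes concrete, here is a minimal Python sketch that reads each box as an input/output signature. It is an illustration under stated assumptions, not any vendor’s API: the names `move`, `StorageCompute`, `total_revenue`, and `clean`, the in-memory list standing in for a warehouse, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


# Box 1 (hypothetical): moving data. Data comes in, data comes out.
def move(rows: List[Row], transform: Callable[[Row], Row]) -> List[Row]:
    """ETL-style step: apply a transformation to every incoming row."""
    return [transform(row) for row in rows]


# Box 2 (hypothetical): storage and compute.
# Data comes in, questions come in, answers go out.
@dataclass
class StorageCompute:
    rows: List[Row] = field(default_factory=list)

    def load(self, rows: List[Row]) -> None:
        self.rows.extend(rows)

    def query(self, predicate: Callable[[Row], bool]) -> List[Row]:
        return [row for row in self.rows if predicate(row)]


# Box 3 (hypothetical): using data. A business question goes in, an answer comes out.
def total_revenue(store: StorageCompute, region: str) -> float:
    return sum(float(row["amount"]) for row in store.query(lambda row: row["region"] == region))


def clean(row: Row) -> Row:
    """Hypothetical cleanup rule: normalize region codes and parse amounts."""
    return {"region": str(row["region"]).strip().upper(), "amount": float(str(row["amount"]))}


if __name__ == "__main__":
    raw = [
        {"region": "tx ", "amount": "120.5"},
        {"region": "FL", "amount": "80"},
        {"region": "TX", "amount": "30"},
    ]
    warehouse = StorageCompute()
    warehouse.load(move(raw, clean))
    print(total_revenue(warehouse, "TX"))  # 150.5
```

The point of the sketch is that the signatures stay the same whether the box is an on-premises job or a cloud service; only the implementation behind each box changes.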

Now the modern data stack is taking that to the cloud, which means there’s less work that needs to get done. Click, click, click. You can get all these little boxes up and running. But what we’re seeing in the last, I would say, five years, definitely less than ten, is a lot more of these little boxes coming around. These little features are turning into categories. And some of them are bigger and have more history; metadata management, for instance, has always been around. Right?

Then you have MDM, which has been one of those things you’d say can go inside the moving-data, transforming-data box. And now there’s the data catalog. Right? Data observability, data quality, DataOps, data blah, data A, B, C, D. Right? The thing is that we’re overwhelmed with all these things, and as technologists, this is cool. I get to play around with more tools.

But that’s not what the business needs. We’ve been exposed to all these cool new tools out there, and it’s much easier to build tools now.

Right? SaaS makes it so much easier to try out tools that we just focus on the technology. And I think the issue here is that we’ve been defining success from a technical perspective, which is, oh, my data is now in one place. Well, that was not the goal.

The goal is answering a business problem. You’ve technically solved a problem, but how do we know it’s solving the business problem? So we really need to start shifting. This is the paradigm shift to the social-technical side: define success from the social perspective, from the business perspective.

If you spend all your time and money, you build your beautiful data lake, it all works with CI/CD, and the quality is perfect, but people can’t answer a question that helps them make money or save money, then I’m sorry. You’ve probably not been successful.

You pretty much just encapsulated most of my experiences with Hadoop.

Right? At a previous company I was with, we spent a lot of money implementing HDFS clusters and building out a big data infrastructure.

And to me, being more on the business side at the time, the story I tell is that all of those investments provided a whole bunch of answers to questions that nobody was asking.

Right? Like, all of these really interesting correlations we never even knew about. Right? And all of a sudden, okay.

Wait a minute. But nobody’s asking about that. Right? Is this money well spent?

So this notion of the social, cultural, business- and outcome-driven perspective here is key, and I couldn’t agree more.

And to the Hadoop point: why did people jump on the Hadoop bandwagon? Well, because Google was doing it. And why did people jump on the NoSQL thing? Because Amazon built Dynamo. Google did the work behind Hadoop, and Bigtable itself. It’s like, guess what?

You’re not Google. You’re not Amazon. I know you wanna be. I know you think you wanna be.

Right? And, you know, the problem is that you also hire a lot of people who come from those companies, and they bring in that mindset, that culture, but that’s not who you are. I’m sure you’ve worked with billion-dollar companies whose data can still fit on my freaking phone.

Right. It’s not big data.

Right. Right. So this compartmentalization, right, this hyperfocus on features and functions and making new things in the data world. You listed a whole bunch of them. Right? Data observability, a new phenomenon in data; DataOps; on and on. So do you think that’s doing more harm than good?

Is that what I’m hearing you say? No. No. I… no.

They exist for a reason. There’s a technical reason for their existence.

Now, are they gonna become the next gigantic category, equivalent to storage and compute?

Probably not. I mean, I think a lot of these things are features.

Right? If you look at the boxes, I would argue that there should be these three main boxes, and there should be at least a fourth box. And that fourth box, I’m gonna call it the glue for now.

That glue. Yeah.

Yeah. I love the glue.

Because it connects to the first box of moving data, to the second box of storage and compute, and to the third box of using the data. There are things you wanna be able to understand about how all that is going. So that’s where metadata comes in. Right?

So it’s a metadata box around all that. Right? Because those first three boxes are about data being moved, and then you wanna have another box for the metadata. And that metadata is connected to all these things, because I wanna know what’s in the first box, what’s in the second box, what’s in the third box.

I wanna know the quality of the first box, of the second. So there are all these questions you have. So, effectively, these things are features.
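The “glue” box can be sketched as a small metadata registry that describes what lives in the other three boxes and how they connect. This is a minimal, hypothetical Python illustration: `AssetMetadata`, `MetadataGlue`, the `"move"`/`"store"`/`"use"` box labels, and the 0 to 1 quality score are assumptions made for the example, not features of any particular catalog product.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AssetMetadata:
    name: str
    box: str  # hypothetical labels for the three boxes: "move", "store", or "use"
    upstream: List[str] = field(default_factory=list)  # assets this one is derived from
    quality: Optional[float] = None  # hypothetical quality score between 0 and 1


class MetadataGlue:
    """A toy 'fourth box' that describes what lives in the other three."""

    def __init__(self) -> None:
        self.assets: Dict[str, AssetMetadata] = {}

    def register(self, asset: AssetMetadata) -> None:
        self.assets[asset.name] = asset

    def in_box(self, box: str) -> List[str]:
        """What's in the first (or second, or third) box?"""
        return sorted(a.name for a in self.assets.values() if a.box == box)

    def lineage(self, name: str) -> List[str]:
        """Every upstream asset of `name`, found by a depth-first walk."""
        seen: List[str] = []
        stack = list(self.assets[name].upstream)
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.append(current)
            if current in self.assets:
                stack.extend(self.assets[current].upstream)
        return seen

    def weakest_upstream(self, name: str) -> Optional[str]:
        """The upstream asset with the lowest recorded quality score, if any."""
        scores = {
            n: self.assets[n].quality
            for n in self.lineage(name)
            if n in self.assets and self.assets[n].quality is not None
        }
        return min(scores, key=scores.__getitem__) if scores else None
```

For example, after registering two extracts with quality 0.9 and 0.6, a table built from both, and a dashboard on that table, `weakest_upstream` on the dashboard points back to the 0.6 extract. Each question the registry answers (what is in a box, where did it come from, how good is it) is today often sold as a separate tool, which is the “they’re features” point.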

So right now, there’s not that fourth box. There are, like, fifteen different boxes.

So what we need is a lot of interoperability around this stuff.

But even if I have interoperability, do I really wanna start connecting ten different tools together? So I think we will eventually start seeing more consolidation. I mean, it’s bound to happen.

But my recommendation is to look at the principles. Understand what the inputs are and what the outputs are, and don’t get focused on the features, because, and I will publicly say this, I am very annoyed when I talk to customers and prospects who go off talking about features.

I’m like, do you know what your business strategy is? What is it you’re trying to do? I always tell them, here’s your magic wand. That feature you’re requesting? It’s solved.

So what? What are you doing tomorrow?

And if they can’t answer that question, then why are they so focused on that feature? Right. And this is the shift that I talk about. We need to go from this data-first, tech-first world to what I call the knowledge-first world. People, context, relationships first.

You need to understand who the people are who are actually asking these questions and why they’re asking them, and push more. Ask why, why, why? And this is something we don’t do, on both sides.

That’s a great segue.

This notion of the relationship-driven world, the context-driven world, feeds right into your background and your PhD.

I’d love to talk about that a little bit more. Just to tie off the data mesh, what I think I’m hearing you say is that you’re a believer, but you’re a believer because of the operating model focus. I would say the political, cultural, and organizational things that you mentioned really speak to the overall operating model of an organization.

And I think what you’re saying is that the increased focus the data mesh is bringing to data leaders on the importance of those things is, in total, a good thing.

And I don’t wanna put words in your mouth, but I think that’s what you’re saying.

I’d add to that: if you bring in that focus from a social, people perspective, yep, then that’s what we need. If you ignore the social, people perspective, all you’re doing is the same thing over and over again and expecting different results. That’s Einstein’s definition of insanity.

And if you’re just gonna, quote unquote, do data mesh, and you’re just gonna bring a bunch of technologists into the room, and you’re not gonna understand who the consumers are, where the problems are, how this is gonna evolve, and what the road map is, and bring product thinking to that stuff, then you’re just gonna build the next data swamp. And those don’t work for people.

They don’t work, to be frank.

I love what you just said. It actually ties very well to a conversation I had on this podcast a couple of weeks ago, so I’d encourage listeners to check it out. It’s with a fellow PhD named Cheryl Flink. Cheryl speaks a lot about this notion of human-centered design, taking a human-first approach, and you’re using the word social. Social, to me, is just a group of humans. Yep.

And she talks about how important that is to a CDO or any other senior leader who’s looking to solve some of these difficult business problems. So, love it. Let’s twist a little bit. You know, when we were speaking in Boston, one of the things I appreciated most about our interaction was, you know, I walked up, we said hello, and I basically planted my MDM flag.

Right? I said, hey, I’m a big MDM believer. And, again, I don’t wanna put words in your mouth, but what I heard you say was, man, that’s kind of old school. Are you sure?

Well, so, one of the early episodes of Catalogs & Cocktails. Actually, the first thirty or so episodes of Catalogs & Cocktails were just Tim and me talking. And one of the episodes we did was about MDM: is MDM dead?

So who were you talking to? Oh, your co-host. Just Tim.

It was just… Okay.

In the early episodes, we had no guests.

It was just the two of us. Okay. And I asked him, so, Tim, what is MDM? And he said it’s fancy data integration.

I’ll never forget that. Fancy data integration. So I think that we need data integration. For sure, we’re always integrating data.

So MDM is really not dead. I just think the name itself has some old-school baggage. We just need to change it. I know other vendors call it mastering or data mastering and stuff.

At the end of the day, we do need to be able to interconnect data and have ways of defining rules, and you wanna have machine learning approaches to be able to say, hey, are these two records the same? Do they mean the same thing?
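The “are these two records the same?” question can be sketched with a couple of explicit matching rules. The Python below is a minimal, rule-based illustration, assuming hypothetical customer records with a name, email, and postal code; the two rules and the 0.85 similarity threshold are made-up examples. The fuzzy name comparison uses `difflib.SequenceMatcher` from the standard library, standing in for the machine-learned scoring Juan mentions, which this sketch does not attempt.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class CustomerRecord:
    source: str
    name: str
    email: str
    postal_code: str


def normalize(text: str) -> str:
    """Lowercase, trim, and collapse internal whitespace."""
    return " ".join(text.lower().split())


def name_similarity(a: str, b: str) -> float:
    """Fuzzy similarity in [0, 1] from the standard library's SequenceMatcher."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_same_customer(a: CustomerRecord, b: CustomerRecord, threshold: float = 0.85) -> bool:
    # Rule 1 (hypothetical): a shared, non-empty email is treated as a match on its own.
    email_a, email_b = normalize(a.email), normalize(b.email)
    if email_a and email_a == email_b:
        return True
    # Rule 2 (hypothetical): same postal code plus a sufficiently similar name.
    same_postal = normalize(a.postal_code) == normalize(b.postal_code)
    return same_postal and name_similarity(a.name, b.name) >= threshold
```

Keeping the rules as plain functions makes them easy to review with the business, which matters more than the matching technique itself; a production matcher would add blocking, survivorship, and many more rules than this.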

We definitely need to do that. But I think part of the shift is that we’ve come from the previous world, let’s say, of the single version of the truth. And I think that is what’s going to start shifting. There may be a need to have a single version of the truth, and if there is, it needs centralized governance.

Now I think we need to be open to understand that the world is very complex and that there be multiple versions of the truth, and we need to accept that, and we need to embrace that complexity. And I think part of the business literacy that I talk about is when somebody says, I need informations about customers.

Oh, but what do you mean by customer? There’s so many versions of customers. It’s like, yeah. Okay.

What’s your version? Oh, my customer is when this is oh, well, I’m glad I’m not using your data because my version of customer is something else. And, and that would have been the wrong thing to use your data. Having that conversation and being actually proactive to say, no.

What version of customer do you mean? I mean, there’s other versions. Okay. Great. Or so I can’t use your version of quick for that for that context that I have.

That’s what business literacy is, is that you understand what this means. And and and then, yeah, we still need to have the the master data, the the the the reference data for what customers are and understand that there’s different versions of customers. So I think that’s what we need to go to. So kind of to wrap up here, when I say, oh, MDM is kind of that old school thing.

It’s like, I think, traditionally, we connect MDM with, like, technology of the two nineties, two thousands of single versions of truth. Yep. And I think that’s not that shouldn’t be the case anymore. I think it’s more about we need to be agile about how we’re defining things.

We need to make sure that it is going to be something that’s gonna get involved, that there that there’s a there there’s different context around that. We need to embrace that. And but, yeah, we do need to have reference data for sure. I mean.

Yeah. Well, couldn't agree more. And you just used reference data as a replacement for single version of the truth. Right? So there could be a need for a single version of the truth, particularly for reference data.

I love your partner's use of fancy integration. Actually, I agree.

It's fancy, really complex integration, where those business rules need a specific place to be managed.

I also fully agree with this notion of context-centric. Semantic?

Am I using the word right? Okay.

Semantic MDM, or some sort of... well, I would argue MDM is a semantic layer because it only... Yeah.

I will.

Definition management.

But what I heard you say is that there should be room for multiple versions of the truth even if there's a single source for those different versions.

I would say that there is still an architectural need for a single source, one place to go get these things. Because when we're talking about master data, we're not talking about a lot of data. Right? We're not talking about terabytes and terabytes.

Persisting that data, replicating that data, is not a ton of overhead for the average organization, even though I still talk to a lot of data leaders who have this antibody response to replicating data of any form. It's like, oh, no replicating, we don't want to do it. With MDM, it's a pretty light footprint. But I do love the notion of context centricity,

meaning one group should be able to see one thing and another group should be able to see something else. And that's where graph would come in, what Gartner would call augmented MDM: more semantic expressions of different use cases within the MDM realm. Do you agree?

Oh, let's break this down a little bit, because I think we're starting to throw around a lot of the buzzwords.

Buzzwords.

Yeah. I'm falling prey to my own. Yeah.

Great. So we have the facts, the observations that exist. Say you have an order management system. This is literally where somebody makes a click and purchases something.

This is a fact. This occurred. Right? This is an observation of the real world that happened.

Right? So that's the source, and that data is sourced right there. Now you want to make an interpretation about that.

And for those interpretations, we need to understand what the concepts are. Oh, there is a customer.

The customer places an order. The order can have order lines. Each order line is about a product. The order is shipped to an address.

That customer has a shipping address and a billing address. Right? The order has a status. So that definition there, that schema, that's the semantic layer. That's the ontology that you want to build.
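As an illustration only, the concepts and relationships listed above (customer, order, order line, product, shipping and billing address, status) can be written down as plain Python types. The class and field names are hypothetical, and in practice an ontology like this would more often be expressed in a modeling language such as OWL or a graph schema than in application code; the point is just that the "bubbles and lines" become explicit.

```python
from dataclasses import dataclass, field
from enum import Enum


class OrderStatus(Enum):
    PLACED = "placed"
    SHIPPED = "shipped"


@dataclass
class Address:
    street: str
    city: str


@dataclass
class Product:
    sku: str
    name: str


@dataclass
class OrderLine:
    product: Product  # each order line is about a product
    quantity: int


@dataclass
class Order:
    lines: list[OrderLine]  # an order can have order lines
    ship_to: Address  # the order is shipped to an address
    status: OrderStatus = OrderStatus.PLACED  # the order has a status


@dataclass
class Customer:
    shipping_address: Address
    billing_address: Address
    orders: list[Order] = field(default_factory=list)  # a customer places orders


home = Address("1 Main St", "Austin")
widget = Product("SKU-1", "Widget")
order = Order(lines=[OrderLine(widget, 2)], ship_to=home)
alice = Customer(shipping_address=home, billing_address=home, orders=[order])
print(alice.orders[0].status)  # OrderStatus.PLACED
```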

Yeah. And then what you want is to say, well, I want to be able to define, I want to understand what that model is. And then I want to connect that to where the original source of the data comes in. That could be your ERP system, your CRM, because those are the facts.

And when you make those connections, those connections are the mappings. They're the transforms. That's where you would have these rules, where MDM would live. You're like, well, I could make this transform, this mapping, this way, this other way, this one way.

I think the expectation is that there could only be one way of defining this mapping from the source to the target.

My point is that for some things, there may be only one way, or there should be one way, because of how the business is defined or because of regulations. But for other purposes, depending on the context, marketing sees users differently than sales does, and the customer success team can see users differently again, so those mappings can be different. We just need to be very explicit about that. And, again, this isn't new.
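Here is a minimal sketch of what "being explicit" about context-dependent mappings could look like: the same source rows, with one named mapping per context instead of a single implicit definition of customer. The source fields and each team's definition are hypothetical examples, not definitions taken from the conversation.

```python
from typing import Callable

# Hypothetical source rows (e.g. extracted from a CRM); field names are illustrative.
SOURCE_USERS = [
    {"id": 1, "signed_up": True, "purchases": 0, "open_tickets": 0},
    {"id": 2, "signed_up": True, "purchases": 3, "open_tickets": 1},
    {"id": 3, "signed_up": False, "purchases": 0, "open_tickets": 2},
]

Mapping = Callable[[dict], bool]

# One explicit, named mapping from source to "customer" per context.
CUSTOMER_MAPPINGS: dict[str, Mapping] = {
    "marketing": lambda row: row["signed_up"],
    "sales": lambda row: row["purchases"] > 0,
    "customer_success": lambda row: row["purchases"] > 0 or row["open_tickets"] > 0,
}


def customers_for(context: str, rows: list[dict]) -> list[int]:
    """Apply the mapping registered for a context; an unknown context raises KeyError."""
    mapping = CUSTOMER_MAPPINGS[context]
    return [row["id"] for row in rows if mapping(row)]


print(customers_for("marketing", SOURCE_USERS))  # [1, 2]
print(customers_for("sales", SOURCE_USERS))  # [2]
print(customers_for("customer_success", SOURCE_USERS))  # [2, 3]
```

Keeping the mappings in one registry, rather than scattered through each team's queries, is one way to make the different versions of "customer" visible and governable.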

I mean, we have a source schema. We have a target schema. We have mappings. And for me, thinking about the source schema, the target schema, and the mappings, that's where semantics comes into play.

The mappings, the rules, are what actually ground the semantics in the data. You draw something on the whiteboard, those bubbles and lines, that model: that's semantics right there. Go ask Juan how our business works, and I'm going to draw bubbles and lines.

Go ask Malcolm how he thinks the business works: bubbles and lines. Now compare them, and we can see where we overlap and where we don't. That's our different context. And what we want to do is take that knowledge we just drew on the whiteboard and connect it to the actual data, which is the observations, the facts that occurred. Those mappings, those rules, those transforms: that's how we ground the semantics in the data.

I think we do this a lot implicitly, and we need to make it explicit right now. This is the shift that we need to start thinking about.

So I'm paraphrasing you here.

But basically, what I heard was: data mesh, fantastic, awesome. But to do it effectively, you're going to need this semantic layer. Right? The same thing is true with the data fabric. I would argue it's even more true with the data fabric, which we really didn't talk about today; maybe that's a separate topic for a separate conversation.

But whether it's data quality, whether it's MDM, whether it's some of these older-school, arguably legacy (I hate to use the word legacy), or, you know, foundational concepts, they're still there. Right?

Yeah. They're still there. They're not going away. We just need to be open to different interpretations, different use cases, different value props.

Couldn't agree more. The challenge there, though, and again, maybe this is our next conversation, is the governance aspect, because what you just described is a whole bunch of conditional rules, a whole bunch of if-thens. Right? If it's this, then it's this.

If it's this user or this use case, then it's this mapping. Right? There are a lot of conditions there that would need to be defined and managed as part of a governance program, because to me, those are policies.

Right? That's governance, straight up.

But I fully agree, and I think this is the mentality we need to start figuring out: balancing centralization and decentralization. We need to decentralize a lot of these things, but there are some rules and definitions that need to be centralized.

I always give these examples. On our podcast, we had the chief product officer of Matillion, Kieran. And he said that Matillion defines a customer very explicitly and centrally: a customer at Matillion is a user who has been on their platform for thirty-six days.

Not thirty-five, not thirty-seven, not thirty, not one: thirty-six days. They have their reasons behind it. That's the rule. Everybody in the company uses that rule.

If you don't use that rule, you're wasting people's time.
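One way to picture a centralized rule like this is a single shared definition that every domain imports rather than re-implements. In this sketch the 36-day threshold comes from the conversation; the function name, the date inputs, and the reading of "on the platform for thirty-six days" as "at least 36 days since first activity" are assumptions, since the conversation does not say how the days are counted.

```python
from datetime import date

# Centrally owned threshold (from the conversation); everything else here is assumed.
CUSTOMER_MIN_DAYS_ON_PLATFORM = 36


def is_customer(first_active: date, as_of: date) -> bool:
    """Assumed reading: a user is a customer once at least 36 days
    have elapsed since their first day on the platform."""
    return (as_of - first_active).days >= CUSTOMER_MIN_DAYS_ON_PLATFORM


print(is_customer(date(2023, 1, 1), date(2023, 2, 6)))  # True (36 days)
print(is_customer(date(2023, 1, 1), date(2023, 2, 5)))  # False (35 days)
```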

So that’s one thing.

You may have to define what PII means, because that's for regulatory purposes. You have to have a mandate saying PII means sensitive data such as names and Social Security numbers, whatever, because PII is different for many organizations. You can define what the global definition is for the organization, and then all departments, all domains need to reuse that, but they can extend it, saying, hey, we've got even more sensitive data, so we're going to extend our definition of PII.
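The "global definition that domains can extend but not weaken" pattern can be sketched with sets: the central team owns a baseline, and a domain's definition is always a superset of it. The attribute names and the health domain below are hypothetical examples.

```python
# Organization-wide baseline, owned centrally for regulatory purposes.
# The attribute names are illustrative only.
GLOBAL_PII: frozenset[str] = frozenset({"full_name", "ssn", "email", "phone_number"})


def domain_pii(extra: set[str]) -> frozenset[str]:
    """A domain may add attributes to the global definition but never remove any."""
    return GLOBAL_PII | frozenset(extra)


# Hypothetical domain that treats additional fields as sensitive.
HEALTH_PII = domain_pii({"diagnosis_code", "insurance_id"})


def is_pii(attribute: str, definition: frozenset[str] = GLOBAL_PII) -> bool:
    """Check an attribute against a PII definition (the global one by default)."""
    return attribute in definition


print(is_pii("diagnosis_code"))  # False
print(is_pii("diagnosis_code", HEALTH_PII))  # True
```

Because extension is done by union, every domain definition automatically includes everything the central definition covers.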

What is a phone number? We need to have a very specific definition of a phone number, because that's how we're going to do customer service. We're super customer-service focused, so we need to make sure we have the perfect phone numbers, and here's your definition of a phone number with the permission that the phone number needs to have.

You need to use that. Now other domains can say, that's not centralized. I need something, so I'm going to create it.

That's fine. I'll be explicit about it. And when you think about the data product, something that we've been talking about is this data product ABC framework: accountability, boundaries, contracts and expectations, downstream consumers, and explicit knowledge.

C for contracts.

I'm representing this data product. I'm going to tell you what the contracts around it are and what you should expect.

I'll make this stuff explicit. Now when you're looking and searching for data products, you know what to expect. It's very explicit to you. So if you're expecting other things, like, go read the freaking manual.
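To make "C for contracts" slightly more concrete, here is a minimal sketch in which a producer publishes explicit expectations and a consumer checks a delivery against them. The `DataContract` fields, the thresholds, and `check_contract` are hypothetical; the conversation does not prescribe any contract format or tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    """Hypothetical published expectations for a data product."""
    product: str
    required_columns: frozenset[str]
    max_staleness_hours: int
    max_null_fraction: float


def check_contract(contract: DataContract, columns: set[str],
                   staleness_hours: float, null_fraction: float) -> list[str]:
    """Return the violated expectations; an empty list means the delivery complies."""
    violations = []
    missing = contract.required_columns - columns
    if missing:
        violations.append(f"missing columns: {sorted(missing)}")
    if staleness_hours > contract.max_staleness_hours:
        violations.append(f"stale by {staleness_hours - contract.max_staleness_hours:g}h")
    if null_fraction > contract.max_null_fraction:
        violations.append(f"null fraction {null_fraction:.2f} exceeds {contract.max_null_fraction:.2f}")
    return violations


orders = DataContract("orders", frozenset({"order_id", "customer_id", "status"}), 24, 0.01)
print(check_contract(orders, {"order_id", "customer_id", "status"}, 6, 0.0))  # []
print(check_contract(orders, {"order_id", "status"}, 30, 0.0))
# ["missing columns: ['customer_id']", 'stale by 6h']
```

If a delivery passes every published check and still disappoints a consumer, the gap is in the consumer's unstated expectations, which is the balance described in the conversation.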

So I think that's the balance you have to strike. At the end of the day, if you consume the data and it doesn't satisfy your expectations, but the expectations were specific and clear, then it's the consumer's fault. That's the balance we have to strike. And you brought up something else: people say, oh, I don't want to replicate data.

I'm like, it's true, but guess what? Do you think there's only one water bottle on Amazon that you can buy? No. There are thousands of them.

It doesn't matter, because the best get ranked first, and those are the ones that survive. So this is embracing complexity.

I like the framework, the A, B, C, D, E, and I like the contractual aspect there. It's a different kind of twist on what a contract would be. You could use the word policy, I think, but it wouldn't fit your ABCDE.

If P were right in the middle there... but I love it. And I think there's maybe even room in the future to talk about contracts being managed in alternative technologies, maybe even a blockchain. Separate conversation. Maybe we can talk about that next week when we're both at Gartner, because we're both going to be there.

On that note, Juan and I need to go start getting our beauty sleep, eating some protein bars, and making sure we have good nutrition in advance of the three-day grind next week in Orlando: talking with folks, having data discussions, and seeing amazing presentations from amazing presenters. So, Juan, thank you so much for your time. I really, really appreciate it. Fantastic insights around the data mesh and around data products. Look forward to seeing you next week.

Same here, Malcolm. Thank you so much, and I look forward to keeping this conversation going.

Sounds good. Thanks.

ABOUT THE SHOW

How can today's Chief Data Officers help their organizations become more data-driven? Join former Gartner analyst Malcolm Hawker as he interviews thought leaders on all things data management, ranging from data fabrics to blockchain and more, and learns why they matter to today's CDOs. If you want to dig deep into the CDO Matters that are top-of-mind for today's modern data leaders, this show is for you.
Malcolm Hawker, Gartner analyst and co-author of the most recent MQ.

Malcolm Hawker

Malcolm Hawker is an experienced thought leader in data management and governance and has consulted on thousands of software implementations in his years as a Gartner analyst, architect at Dun & Bradstreet and more. Now as an evangelist for helping companies become truly data-driven, he’s here to help CDOs understand how data can be a competitive advantage.
