The CDO Matters Podcast Episode 103

The Metadata Problem Nobody Talks About: Context, Catalogs, and the CDO’s Next Land Grab with Ole Olesen-Bagneux

Episode Overview:

Every organization is told they need context for AI to work. Almost none of them know where to start. The answer has been sitting in their metadata all along — but most CDOs haven’t connected those dots yet.

📌 In this episode:

  1. Why your data catalog is already a source of context — and how to expose it to agentic workflows today

  2. The semantic layer problem: definitions living in BI tools instead of governed catalogs, and why that creates risk

  3. Why engineers and library scientists think about language completely differently — and why that gap matters for AI

  4. Malcolm’s argument that CDOs are in a unique position to make a “land grab” on unstructured data before someone else does

💬 The takeaway: “CDOs are in a unique position to make a land grab: the ontologists, the taxonomists, the library scientists are all over the organization without one single leader. That’s your moment.” — Malcolm Hawker

About the host + guest: Malcolm Hawker is a former Gartner analyst, Chief Data Officer at Profisee, Editor-in-Chief of CDO Matters, and host of the CDO Matters Podcast. Ole Olesen-Bagneux is Chief Evangelist at Actian and O’Reilly author of Fundamentals of Metadata Management (2025) and The Enterprise Data Catalog (2023) — the practitioner’s reference for anyone building or governing a data catalog.

Subscribe to CDO Matters Monthly https://profisee.com/cdo-matters-community/#join-community

Episode Links & Resources:

 

Good morning. Good afternoon. Good evening. Good whatever time it is, wherever you are on our amazing planet Earth.

 

 

I am thrilled to be joined today by Ole Olesen-Bagneux. Ole is a friend of mine. We met a few years ago in person in Austin. Ole is an expert on all things metadata and data catalogs.

 

 

And we’re gonna go deep on data catalogs and metadata. We’re gonna talk a lot about context, which seems to be the thing that everybody is talking about.

 

 

Before we do any of that, let me remind our listeners: if you get an opportunity to go to a conference that Ole is speaking at, or to interact with him somehow, some way, I would absolutely recommend it, because he’s one of the smartest people I’ve ever known or spoken to in the world of data. You will be enriched through a conversation with him.

 

 

But I would invite you to get his name correct. It almost bugs me, my friend, that it doesn’t bug you that people mispronounce your first name, like, constantly.

 

 

And if people were mispronouncing my first name constantly, I don’t know, it would drive me bonkers. People misspell my name all the time. So, Ole, with that, welcome to the CDO Matters podcast.

 

 

Thank you. Thank you, Malcolm. It’s a pleasure. Yeah.

 

 

Thank you. And, you know, I just know you’re probably not even gonna talk about the issue with your first name, but I would invite everybody to say it correctly.

 

 

I mean, for me, I’ve been used to it all my life. I don’t really know what to compare it with, but if you had a weird car, for example, and everyone you met said, oh, that’s a weird car, you’d say, yeah, but I’ve had it for, like, forty two years, so it’s just my car. That’s just how I feel about my name. Obviously, it’s difficult to pronounce, and everyone feels a little ashamed, but, you know, there are bigger problems in the world.

 

 

Oh, okay. Okay. We’ll let it go, but I’m not letting it go. I’m evangelizing.

 

 

This is what we do. Ole and I are kindred spirits because we actually have the same job, and it’s not a very common job. There aren’t many of us in our world who have this job, but Ole and I are both, in essence, evangelists for our companies. Ole works for the software company Actian.

 

 

Of course, I work for Profisee, and we get these amazing jobs of going to speak at conferences. Our companies let us write books. We’re gonna talk about your books in a second here.

 

 

They let us go and interact with the data community writ large. So I don’t know. I think I’ve got the best job in the world. What do you think about your job?

 

 

I have to agree. I mean, I think it is a pretty fantastic job.

 

 

So it’s a privilege and an honor for me to represent the company.

 

 

Basically, I don’t think I could do it if I didn’t believe very strongly in the technology and the vision, the road map we have for our technology. I think that is an absolute key element in such a job. Right? If you were to evangelize something that you didn’t believe in, I think it would show very easily, at least for me, and then I just wouldn’t be very good at it.

 

 

So for me, that is very closely connected to what I do in this particular job, because prior to this job, I’ve been an architect and a director for data management and information management departments. I guess for you it’s the same. Right? No one really graduates with the intention of getting this kind of job.

 

 

It’s really rare. Right?

 

 

Yeah. It’s extremely rare, and I appreciate what you just said about a belief in the company and a belief in what we’re doing. I have the same thing. I believe in my industry. And, yeah, I believe in it.

 

 

Always have. Will continue to. But I think one of the things that draws me to your insights, your content, and your books is the fact that you are an expert. Right?

 

 

You’re not an influencer. I kind of wrestle a little bit when people say, oh, well, there’s Malcolm with twenty-something thousand followers on LinkedIn or whatever it is, and, you know, you’re an influencer. And I’m like, that makes me, like, no.

 

 

No. I’m not. I’m a data person. Born and raised in this world, have done the job, and just here to share what I know. And I know you’re in the same position.

 

 

You wouldn’t consider yourself any form of influencer, would you?

 

 

No.

 

 

There are people who have encouraged me to embrace that term or that title, but I really can’t. I think the best argument would be that you have influence, and therefore you are an influencer, whether you like it or not.

 

 

The way I see it, a true influencer, as you see them on social media in general and not on serious social media like LinkedIn, is someone who’s willing to do whatever it takes for reach, for visibility, and obviously for money. Politicians have a lot of followers too, professors have a lot of followers, and so do authors, like fiction authors. Many professions have a lot of followers, so I don’t see having a lot of followers as something that makes you an influencer.

 

 

I see the dynamic quite differently. But on the other hand, I also feel that you just have to embrace that you live in a world where you can communicate on social media, and you have to do that. If you don’t like it, you will face difficulties in disseminating your ideas. So I have a set of principles that guide my behavior on social media.

 

 

I don’t know if you wanna dig into that or, like...

 

 

Yeah. That’s cool. And the reason why is that if you’re maybe a little younger, maybe you’re in your thirties or even in your forties, and you’re worried about AI taking your job, then my recommendation, at the risk of sounding a little egocentric, is to become like me or become like Ole. Become an expert.

 

 

Find a way to have influence in your industry, because if you can do that, AI won’t take it. AI won’t take that job. It’s not gonna take my job.

 

 

It’s not gonna take Ole’s job. Right? Because we’re real humans with real ideas and some degree of influence. So, yeah.

 

 

I’d love to hear your principles.

 

 

Yeah.

 

 

I agree on that. I think domain expertise is something that will never go away, and I definitely encourage everyone to study as hard as they possibly can.

 

 

I think, look, okay. For my behavior on social media, I follow a set of rules that are aligned both with the metrics of the algorithms and with my personal ethics.

 

 

And I certainly don’t want to preach my personal ethics, but it’s just what gives me peace and lets me sleep at night, because I wouldn’t want to be a person who interacts aggressively, or even passive-aggressively, with my followers, readers, and friends on social media. I very strongly believe in not doing that. I believe in only learning, expressing curiosity, and also expressing what you don’t know: asking very open questions, not dumb questions obviously, but open questions that also expose a lack of knowledge of something. I use the social media that I am on, primarily LinkedIn and to an increasing degree also Substack, just to explore people and ideas, learn, and exchange views. I’ve written my principles down because people ask me, and it’s no secret. I certainly don’t want to keep it to myself, so I can share it. It’s a long list, but it really boils down to the fact that I behave very politely, I only send out positive energy, and I step away from discussions that go in directions that are condescending, aggressive, or negative in any way. I just step away from them.

 

 

I don’t want to win those discussions. I let other people have the last word, I salute them, I give them a compliment if that is what they want, and I just step away. And that has made me very peaceful on social media for a period of time that I think is about ten years now.

 

 

So I always sleep at night. I’m never kept up by these things, like really toxic trolling, all of that. I’m just not in it, and I’m not for it. And I don’t think it helps anyone with anything.

 

 

Well, it doesn’t. It doesn’t help anybody with anything.

 

 

Something that I wanna just follow up on and reiterate and stress: you can be extremely knowledgeable. Right? You can be an expert.

 

 

You can know everything you think there is to know. But if you lack curiosity, if you lack the growth mindset, if you lack what Ole just described as a desire to learn, then you come across as a know-it-all, and you’re not open to any new ideas. And spoiler alert, you don’t know it all. There’s no way anybody can know it all in this world.

 

 

So I can be pretty opinionated. I know I can, and I can be pretty forceful, but at the same time, I’m here to learn. And if you follow that, let’s just go learn. And there are people that you can learn from.

 

 

I learn from Ole. I hope you learn from me. I learn from many people online. And you know what?

 

 

It’s free.

 

 

So that’s... Yeah. That’s pretty cool. So let’s transition into this world of context. I was at the Gartner Data and Analytics Summit in Orlando... and that was a pretty hard shift, I know, going from principles and ethics on social media into context.

 

 

Maybe I should have had more context in my query about context, but I was at the Gartner Data and Analytics Summit in Orlando, and the word context was everywhere. It was just everywhere, overnight, it seems like, and I know it’s not overnight. It just felt that way because it was so abrupt. How would you describe this phenomenon? If I’m the CDO and I’ve been doing data management and supporting my organization, doing the dashboards, the BI, managing pipelines, data quality, integration, all those things, and all of a sudden everybody’s talking about context, how would you explain that to that CDO?

 

 

Yeah, so the reason why this change of topic made me laugh a little bit is that I guess this would be one of the topics where I would have difficulty, like, staying calm. Right?

 

 

Always calm. What do you mean?

 

 

Because this word is, as you’re saying, everywhere. My view on that is quite simply that there are two readings of this word at this time in the data and AI community.

 

 

If we pitch context as something that is new, that is like a secret sauce to make AI work, then I think we are wrong.

 

 

I think context is a very old word, and there’s nothing wrong with the word context in itself.

 

 

And I also embrace that we focus more on context these years.

 

 

What I wouldn’t encourage is that people think of context as a new discipline, a new concept, a new way of working.

 

 

The semantics that fuel what we call context these years have been a struggle for companies for all of our lifetimes, and I think also for people who are younger, or older, sorry. So I think the right way to approach a word like context would be to say very pragmatically: okay, what is it we have that we can surface, in our technology stack, in our methodologies, in our ways of working in general, in our educational background, that can fuel a context for AI? I think that is a fair way of discussing it.

 

 

But under the hood, context in itself is a very old term. Right?

 

 

And it comes with nothing new in terms of what that term means.

 

 

Well, yeah, this is something that I kind of struggle with a little bit.

 

 

You know, we’ve always had context. Right? Whether it was a join between tables or whether it was just attributes, slash metadata. Right? The metadata has always allowed you to infer, I would say, a reasonable amount of context.

 

 

So then my mind goes to, okay, if it’s something more than that, and I think it slightly is more than that, I think it’s not the inference of context but the declaration of context, which exists potentially in things like knowledge graphs and other technologies like that.

 

 

Do you see what I’m getting at? I mean, context has always kind of been there. You’re an expert in metadata.

 

 

How would you answer somebody who said, okay, well, that’s not enough, I need more? If that’s true, what would you need and why?

 

 

Well, I think that would actually be a very fair response, if they need more, because context is basically also expressing a situational setting. Right? You walk into a room, you say something, and it can be out of context.

 

 

It can be out of context because you’re unaware that the people in the room just had a debate about something. You walk in, you say something, and it’s not very positively received; it’s out of context. So the tacit knowledge that an organization has is context. And so I think expressing context as just something that resides between the data and the metadata layer is a very data-centric way of approaching context, as I see it. I don’t think that is the only context, but there is certainly context in there also.

 

 

But a context discussion should focus on many of the elements that we find in organizations at large. Right?

 

 

And that is also the case if we’re looking into what an agent is these days. After the generative AI boom, new agentic architectures emerged, and an agent obviously has an understanding of, and a need for, context.

 

 

And that context is obviously extremely important to capture and to expose to agentic architectures so that they can behave better and understand the context. And so I think that basically opens the need for viewing the totality of data as something more than just the basic structured data of a database. It’s a little harsh, but I actually think that may be the least important data in your company now.

 

 

Maybe. Yeah. I would say maybe not, because I would describe these in two different ways.

 

 

I’ve got a presentation I’ve been giving this year in which I describe things related to context, knowledge graphs, and ontologies as things that describe the world of meaning. Right? They provide meaning.

 

 

And the stuff in the databases provides measurement.

 

 

Like... Yeah. We need measurement. We are always going to need measurement, and I think they’re co-partners in all of this. Meaning and measurement.

 

 

Forever and ever, all we focused on was measurement because that’s what we had to do. We built dashboards. But now we’re being asked to provide meaning. What does this actually mean?

 

 

You know, what does this data mean in context? You had talked about surfacing context in agentic workflows.

 

 

How are you seeing people actually do that? Is it in knowledge graphs? Is it in ontologies? How exactly are agents consuming this context?

 

 

Yeah. I mean, there’s a variety of architectures that cater for that, and the technical evolution suggests that some of them will not live as long as we might have anticipated a year or two ago. Right?

 

 

But, basically, that’s a broad question. It’s not that I can’t answer it; it’s just that there are many answers to it. Right? I would say that any kind of metadata structure holds value in this situation. So obviously, I would highly recommend discovering whether your organization has some kind of ontology or knowledge graph, or most likely multiple knowledge graphs and ontologies floating around in various departments.

 

 

If some of those exist, they are highly usable. You can also... just a sec here. I just need to charge my laptop.

 

 

Well, this is easy.

 

 

That’s it. So basically, every kind of metadata structure that your company holds would be of value.

 

 

The data sources themselves are also of great value. Where in the very recent past we focused only on structured data, tabular data, we now have very big potential in what I guess the data and AI community calls unstructured data. Pictures, text, sound: all of that now holds tremendous value to train or set context for agentic architectures.
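As a rough illustration of the point that existing metadata can be surfaced as context, the sketch below renders a few catalog entries as plain text that could be placed in front of an agent's prompt. The `CatalogEntry` fields, the sample entries, and the `render_context` helper are all hypothetical and invented for this example; they do not represent any particular catalog product or the speakers' own tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """A hypothetical, minimal catalog record for one business term."""
    name: str
    definition: str
    owner: str
    source: str


def render_context(entries: list[CatalogEntry]) -> str:
    """Turn catalog entries into a plain-text block for an agent prompt."""
    lines = ["Business definitions from the data catalog:"]
    # Sort by name so the rendered context is stable between runs.
    for entry in sorted(entries, key=lambda e: e.name):
        lines.append(
            f"- {entry.name}: {entry.definition} "
            f"(owner: {entry.owner}; source: {entry.source})"
        )
    return "\n".join(lines)


# Illustrative entries only; real definitions would come from the catalog.
entries = [
    CatalogEntry("churned_customer", "No purchase in the last 12 months",
                 "Sales Ops", "crm.accounts"),
    CatalogEntry("active_customer", "At least one purchase in the last 12 months",
                 "Sales Ops", "crm.accounts"),
]

context = render_context(entries)
print(context)
```

The design choice here is deliberately modest: nothing new is modeled, and definitions that already exist in a governed place are simply exposed in a form an agent can read.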

 

 

So there’s no one answer, I think, is what you’re suggesting. The reason I ask is that when I was at Gartner, I enjoyed sitting in the large lunch and breakfast areas and just asking people a lot of questions.

 

 

And I did that over three days. I always do. And I would ask questions like, what’s your key takeaway? And across the board, the number one takeaway was, I guess I need to go do context. I just I’m not sure what that means.

 

 

That’s what I heard over and over again. Like, what does that actually even mean? I guess I have to go figure that out and do that, but I don’t even know where to start digging. Right? Like, what do I start doing?

 

 

Well, I think we’ll both be... maybe this will come out after, but I think we will meet at Gartner.

We will.

 

 

This may actually come out after, so there’s a bit of time traveling going on. We will have already been there, but, yeah, I will see you at Gartner London in about a month.

 

 

I look forward to it. I think that increasingly, over the last couple of years, Gartner has been more and more precise in its predictions and in the way it describes things, the technical evolution. Even though it may be difficult for attendees to truly get what context is about, and I completely understand that, I think it’s correct that we need to focus on this, and I’m actually very thrilled that I’m focusing on it. That’s not to say it’s easy. Obviously, every single word you use and put a focus on has the risk of becoming a buzzword, and I guess context has already become a buzzword. Yep.

 

 

And maybe it will also fade fast. I mean, that has a tendency to happen with the words we focus on in tech, right?

 

 

But it’s only in our little bubble that it’ll fade, because in general context is a word that is hundreds of years old, and it’ll be used by everyone who has studied language or history or sociology ten, twenty, or a hundred years from now. So it’s only right now that you’ll see a peak and then perhaps a decrease. But I embrace the focus on context, on semantics.

 

 

I just don’t think that we should have the illusion that there is a silver bullet to this problem. I don’t think there is one and only one technology to solve this problem, or one methodology to achieve the ultimate context. Those kinds of approaches strike me as a little naive, and also as a typical way of expressing your views on language if you are an engineer, if I may be so direct. Yeah. Many engineers, many data engineers, many people in tech come from an engineering or computer science background, and they have not been trained in thinking about language as something with multiple meanings, or in the real challenge of syntax and semantics, which, when you leave the realm of programming languages and turn to real human languages, is something way, way more complicated. And that is where the context is. That’s why we’re seeing this clash and also the need to focus on it.

 

 

It’s interesting you should say that because I’ve been trying to do some research in this space, in the world broadly known as knowledge management. Right? I had the opportunity to give a keynote at the Knowledge Management World summit in DC, at the Taxonomy Boot Camp, last year.

 

 

I’m getting increasingly exposed to that world.

 

 

And one of the ironies is that we have completely different lexicons.

 

 

Right? The world of knowledge management, they use words to describe things. You’re talking about semantics. You’re talking about kind of languages and intent.

 

 

That world is describing the exact same thing that data people are describing, and we use different words to describe the same thing across these worlds. It just blows my mind. Like an attribute versus a facet. Right? Or a controlled vocabulary versus a glossary. And we can have interesting academic arguments about whether they are really the same or slightly different.

 

 

But there’s something there that has been really reinforced for me by using this thing called OpenClaw and getting familiar with the use of markdown files to control the behavior of my agents.

 

 

Getting back to that whole world: creating, in essence, a file structure, and creating, in essence, an ontology for my OpenClaw bot, has been very different in my brain than it was to model data and build some schemas. Right? And to join some tables. Two very different ways of structuring data and thinking about data, which you just touched on.

 

 

And I think that having more people in our space with library science backgrounds, with knowledge management backgrounds, is gonna be an important thing over the next few years. Do you agree? Do you see that as well?

 

 

Yes. I mean, I do. And I think some disciplines need to merge somehow. I think we’ve also been discussing that on LinkedIn and obviously also in real life. Disciplines need to somehow merge. How that will happen, and who will come out as, I’m reluctant to say, the winners...

 

 

Yeah. What I mean by this is that up until generative AI, basically OpenAI and ChatGPT three point five, the data community received an enormous amount of funding from VCs for whatever little component you would have in a modern data stack, right?

 

 

We’re talking no more than two, three, four years ago.

 

 

So however small that capability would be, you could separate it out from a bigger platform, claim a technology category of your own, and say: this is an element of data engineering, we need funding, this is what it will do, and it’ll increase your capability to be data driven as a company. And then it would receive VC funding, it would receive a lot of attention, and books would be written about these things. And then OpenAI happened, ChatGPT three point five happened, and all of that ecosystem, combined with a lot of geopolitical things and financial prioritizations across the planet, kind of just imploded. There was nothing left of it. And what we’re seeing now is that all of that, combined with COVID, wars, and geopolitical change overall, is diverting all of the strategic funding towards other categories in technology.

 

 

And that makes me think that I’m not sure that what used to be the data community, and is now the data and AI community, will come out on top of all this.

 

 

I’m not sure about that, because... I can’t resist.

 

 

Okay. Yeah. Go ahead. Because?

 

 

Because I think that the authority that this community used to have is not as authoritative anymore.

 

 

And so when you’re seeing knowledge management, data management, and information management getting closer together because they have realized that they need to work together to deliver that infamous context for AI, then I’m not sure whose methodologies and whose technologies will actually dominate that conversation.

 

 

Okay.

 

 

I can’t resist. I was gonna go deeper on the how-do-I-build-context question.

 

 

But what you just said, I can’t resist, because what you’re talking about is something I think a little more fundamental.

 

 

And to me, what you just said describes the, for lack of a better word, evaporation?

 

 

Maybe a bad word. I can’t find the right word, speaking of semantics. But it is the lessening of the divide between the world of data and the world of business applications.

 

 

Right? Forever and ever, data was this modeled interpretation of business processes that were captured in systems of record, like CRM systems, ERP systems, or even transactional systems like an ecommerce platform. We in the data world tried to model those worlds as best we could, but we had a very specific outcome in mind. That outcome was measuring things correctly.

 

 

Right? It was never necessarily about process optimization per se. It was never explicitly, I’m modeling this data specifically to optimize this process. It was, I’m modeling this data so I can measure it consistently over time.

 

 

Right? What you have suggested, in my brain, is that as we move more towards using our insights, like the big data and knowledge people, as we move more and more towards process optimization, we will inevitably cede, we will inevitably give up ground or give up authority, to the worlds that are closer to process management and not data management. Am I paraphrasing you correctly? Do you see this? And am I making any sense here, maybe? Sure.

 

 

I think, to the extent that process management also contains unstructured data, I can find common ground on that. I mean, what I’m seeing is, and Bill Inmon said it himself. Right?

 

 

Ninety percent of all data in a company is unstructured. Companies run on documents. Yeah. They run on files.

 

 

They don’t run on data. I know, they don’t run on structured data per se, that’s what I’m saying. They run on data. Obviously, they run on data.

 

 

But they run on unstructured data, and many of the people working in these disciplines wouldn’t call it data, they would call it text, they would call it images.

 

 

If you work in a lab, for example, and you take a look at the history, they would call it something other than data.

 

 

But technically speaking, of course it’s data. What I’m saying is that the disciplines around all those systems and all those processes may just have the upper hand here.

 

 

There’s a thought exercise that I’ve been thinking about for years now.

 

 

And we may have talked about this when we were having coffee and COVID and if I’m repeating myself, that’s because I’m getting old.

 

 

There’s this thought exercise that I’m attracted to, which is: what if we could capture all business interactions, all business transactions, as stories instead of data? Just imagine that my interaction going to a website and buying something was captured as a narrative. Right? Malcolm landed on Amazon dot com. And it was this long story instead of this highly atomized, modeled reality that loses all context, that loses all intent. The BI dashboards don’t know how happy I was or how long it took me to type a search term or not type a search term. They don’t know any of those things. They don’t know a lot of things about my interaction that you could capture in a story.

 

 

Right? And if we could capture those stories, this is a thought exercise. I’m not suggesting we do it.

 

 

Would that change the nature of our businesses? Would that change how I treated you as a customer? I think maybe the answer is yes. And if we did that, would we even need a lot of these systems of record, like CRM systems and ERP systems?

 

 

This starts to get way academic, and I know that it is well beyond the scope, and anybody listening to this is like, what is that guy talking about? But I do see a fundamental shift towards integrating the unstructured data that you’re talking about. Right? All the things that Inmon was talking about.

 

 

Gartner says the same thing. Eighty to ninety percent of data is unstructured. Right? Our companies are literally running on it.

 

 

If we start focusing more on that, would it change the nature of how we manage a data function? I think the answer is necessarily that it would; it would need to. That’s certainly something to consider.

 

 

Sure.

 

 

I mean, I think you could boil it down to companies becoming unstructured-data driven instead of data driven. Right?

 

 

And basically, I’m not saying it would happen, but we could imagine a company running on unstructured data and not structured data. And I think that none of the transactions that you have mentioned so far are impossible to produce in text today.

 

 

You just need the right input. Yeah. And you don’t write that text yourself; that will be generated for you.

 

 

So yes, I think that is certainly possible. And this may be a little academic, but if we’re to boil it down, or reduce it, to some of the things that you said before: is it the data community that will win the discussion in terms of what we call things? Will it be called an attribute or a facet? I’m not sure that what was the data community, now the data and AI community, will necessarily win all these discussions.

 

 

I don’t know. Nobody knows at this point, but there is a possibility that it’s not the data community.

 

 

I agree.

 

 

You know, I go back to my point about the importance of measurement, and I don’t think that that’s ever going away. We will need to continue to measure, and we will have auditors, we will have Wall Street, and we will have generally accepted accounting principles and all the things that require precision. Because that’s what we’ve been talking about around unstructured data. I’m seeing this interesting trade-off, and it’s always been there, but I’m seeing it evolve when you’re using an LLM for insight to make a decision. Right? I want an LLM to tell me something so I can make a decision.

 

 

There is a trade-off between precision and cost. You can make that LLM be pretty precise, but the number of tokens you need to cram down its throat in the process is gonna cost you a lot of money. It’s gonna cost you a lot of compute to get to the precision that you want.

 

 

So the answer to your question, or to your potential assertion that data may not win, for lack of a better word: I think it’s a yin and a yang. And I think that data will always be there because we will still need precision at low cost. Reasonable precision at reasonably low cost will come through tables, joins, schemas, that world.

 

 

But when we don’t need precision, when we need creativity potentially, or when we are not as concerned with cost, then I think the AI could win. So I think it’s probably gonna be both, and these worlds need to come together.

 

 

And I think the CDOs are in a unique position to make a land grab. I’ll just use that phrase, make a land grab. What I mean is, you know, these people with your background: the library scientists, the ontologists, the taxonomists. They’re all over the organization. Right?

 

 

Right? Yeah. Like, without one single leader, generally.

 

 

Yeah. Sure. Agreed. They are all over. They will be found in, like, archives or records management departments. They will be found in, like, legal departments performing information management.

 

 

They will be found in information security management, perhaps having specialized in cybersecurity and stuff like that. They will be found all over.

 

 

And I think we will always need precision, we will always need structured data. I think in the past, perhaps, we used structured data and precision as a compromise because we simply did not have the technological means to store data in any other way. In the future, I think we will use precision and structured data where it really counts.

 

 

No pun intended.

 

 

But anyway, maybe this is getting a little too academic?

 

 

I mean, I don’t know.

 

 

I should’ve known that if you and I were given more than fifteen or twenty minutes to talk about something, we would veer towards these very, very big-picture things, academic discussions. But for CDOs, I think what you should be taking away from this is that there’s this whole other world out there of unstructured data, and you need to go start figuring it out. Right? Now whether you call it a dimension or an asset or a facet or an attribute doesn’t really matter. But you need to start going and figuring out what unstructured data is, how to get your arms around it, how to govern it.

 

 

Like that one. Man, getting back to the context thing, I wanna ask you about something that I’m seeing, and I find it interesting.

 

 

You wrote a book. You wrote literally the book about data catalogs. I should have said this at the beginning of our discussion: Ole is the author of two O’Reilly books.

 

 

The Enterprise Data Catalog and Fundamentals of Metadata Management, the latter of which was released last year, and The Enterprise Data Catalog was twenty twenty three. Right, Ole?

 

 

Yeah. Exactly. It was twenty twenty three. I’m writing the second edition right now.

 

 

That means that people are buying it; a second edition is a very good sign. Yeah. But getting back to the idea of context and the data catalog: if you ask me, as a CDO, if I’m context curious, right, if I’m trying to figure out how to do context, one of the first things that I would probably suggest is, if you don’t already have a data catalog, you’re gonna need one.

 

 

But let’s assume you have a data catalog. Do you have all of your core terms defined in your catalog? To me, that’s a wonderful source of context. Right?

 

 

It’s to get your definitions out of your data catalog and expose them to an agentic workflow. It could be as simple as dumping things into markdown where all of your definitions can be consumed by an agent. That seems like a fairly digestible thing to do. But it’s interesting, Ole.
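As a rough illustration of the "dump your definitions into markdown" idea mentioned above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `GlossaryTerm` fields, the `glossary_to_markdown` helper, and the sample entries are hypothetical, and a real export would start from whatever export or API your own catalog provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryTerm:
    """One business term as it might be exported from a catalog (hypothetical shape)."""
    name: str
    definition: str
    owner: str  # hypothetical steward/owner field


def glossary_to_markdown(terms, title="Business Glossary"):
    """Render glossary terms as a single markdown document, sorted by term name."""
    lines = [f"# {title}", ""]
    for term in sorted(terms, key=lambda t: t.name.lower()):
        lines.append(f"## {term.name}")
        lines.append("")
        lines.append(term.definition.strip())
        lines.append("")
        lines.append(f"Owner: {term.owner}")
        lines.append("")
    # End the document with exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical sample entries, not taken from any real catalog.
SAMPLE_TERMS = [
    GlossaryTerm("Customer", "A party that has purchased at least one product.", "Sales Operations"),
    GlossaryTerm("Annual Revenue", "Total recognized revenue for one fiscal year.", "Finance"),
]

if __name__ == "__main__":
    print(glossary_to_markdown(SAMPLE_TERMS))
```

The resulting text could be saved as a file and placed wherever an agent reads its context from; one heading per term keeps each definition easy to locate and cite.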

 

 

My question here is that I’m increasingly seeing analytics people, the reporting people, using the semantics that exist in analytics layers, that have always existed in analytics layers. Like, you know, what’s annual revenue? What does that mean? The definitions that exist in, like, a Power BI or a Qlik or a Tableau.

 

 

Rather than taking definitions from a catalog, it seems like a lot of people are just deferring to existing semantic layers that are sitting in our analytics layers. Do you see the same thing? Have you seen that?

 

 

And that’s kind of the reality I describe in my second book, the Fundamentals of Metadata Management book.

 

 

So the first book came out because I basically had something to say about data catalogs.

 

 

To be very honest, I wasn’t very impressed with the majority of vendors in the space.

 

 

I thought that, from a software point of view, they had built their catalogs in not the most optimized way or setup, and I wanted to provide an implementation guide that went beyond traditional data management and actually explained what a catalog was about so that it could be functional for the end users.

 

 

And so I also wanted to bridge my background in library and information science, and the fact that I’ve worked in tech, with computer science, data engineering, these fields that use data catalogs. In that, I think, there is also obviously both a data dictionary and a business glossary that you can put into a data catalog. Now the second book that I wrote was both more complicated and also more rooted in reality, because what you will find in a company is typically not one data catalog or one BI tool or one other data or information management tool. You will find a lot, and you’ll find a lot of them containing dictionaries, explanations, abbreviations, and so forth.

 

 

And so that’s the enterprise reality of the context that we’re discussing here. It does not reside in one single place. It resides in many places, and we need to coordinate that. But first and foremost, we need to acknowledge it. I think many companies, for better or worse, work in silos, and so for CDOs that listen to this podcast, I think one very important part is to acknowledge that this is the reality in companies, and they need to look beyond more than one silo in order to really grasp the context of a company, if I may be so blunt. So yes, I mean, yes, I’ve seen that.

 

 

I am not against it. It’s human behavior.

 

 

Well, so you introduced this concept. You called it the Meta Grid. At least, you introduced it to me in Austin, Texas in January twenty twenty five. That, I think, was the genesis for you completing your book on the fundamentals of metadata management.

 

 

And it’s impossible to argue because it’s accurate. Right? Metadata repositories exist everywhere, and wherever you’ve got a metadata repository slash data catalog, in one form or another, you’re gonna have definitions, whether they’re explicit or whether they’re implicit.

 

 

Wherever this metadata is, there will be definitions for it. So I think the question is... I already know where you stand on this.

 

 

And I’m open to being wrong about this, but I still think that there’s a role for centralized management of those definitions. Do you agree? Disagree?

 

 

No. Ultimately, I agree. I mean, I’m trained in centralizing it, and I am always in favor of that. I think it’s a result of many things, but, obviously, every enterprise is a living organism. Right? It has many different parts that are all alive, and they are not all as well connected as we would wish. And technology tends to follow the law known as Conway’s law, right, so that it mimics the communication structure of the company.

 

 

And that is why, semantically, we find ourselves in a situation where we have multiple glossaries and multiple definitions.

 

 

I think for a data discussion, there’s very little wiggle room in terms of, for example, master data management, which we have discussed multiple times. If you want operational data to function across your value chain, there is just not a lot of wiggle room. You have to accept shared definitions, master data, in order for a product to reach a customer. It is simply not up for academic discussion what an address should be, what a country is, what your product name is, where the customer lives, and what the name of the customer is or the number of the customer. Those things just need to work for your products to reach your consumers.

 

 

But above that level, I think many things are up in the air and that is not by intention.

 

 

And thank you for mentioning the Meta Grid concept.

 

 

I mean, I have been quiet about the Meta Grid, and that is simply because I have my own idiosyncrasies, and one of them is that I would not like to be a new data mesh guy. I don’t want to be a person that invented a term that got hyped and then got associated with a term that is no longer hyped. So the term really began to take off, and when that happened, it kind of ticked out of my own hype. I don’t want to be hyped.

 

 

Alright. I can’t think of a better way to end the discussion than I don’t want to be hyped. Alright.

 

 

Thank you, my friend. I could do this for hours. I know we could. It’s a shame, we needed to spend more time in London a few weeks ago when we were at the IRON conference, but maybe we could have another coffee in London in another month.

 

 

Let’s do that.

 

 

Thank you for having me. Thank you.

 

 

It’s been my pleasure. Really, really enjoyed talking with you.

 

 

To our listeners, to our subscribers, check out Ole’s books. I have them both. I’ve read them both. They are extremely insightful and well worth your investment.

 

 

Check out Ole’s work at Actian. You have a podcast with your CTO, Emma McGrath.

 

 

And listen in to your podcast again.

 

 

Enterprise wide search. Enterprise wide search.

 

 

So check out Ole’s podcast and books. If you’re not following him on LinkedIn, you must follow him on LinkedIn. One of the smartest people in data.

 

 

 

With that, I will leave you, my friends. Take a moment to like. Take a moment to subscribe and support the creation of this content. I will keep doing it. I think we’re on episode a hundred and five now, which is just crazy. Thanks to all of our subscribers, our listeners, and I will see you on another episode of the CDO Matters Podcast very soon. Bye for now.

ABOUT THE SHOW

How can today’s Chief Data Officers help their organizations become more data-driven? Join former Gartner analyst Malcolm Hawker as he interviews thought leaders on all things data management – ranging from data fabrics to blockchain and more — and learns why they matter to today’s CDOs. If you want to dig deep into the CDO Matters that are top-of-mind for today’s modern data leaders, this show is for you.

Malcolm Hawker

Malcolm Hawker is an experienced thought leader in data management and governance and has consulted on thousands of software implementations in his years as a Gartner analyst, architect at Dun & Bradstreet and more. Now as an evangelist for helping companies become truly data-driven, he’s here to help CDOs understand how data can be a competitive advantage.