
The CDO Matters Podcast Episode 87

Beyond Data Catalogs: Building Context for AI and Data Products with Kash Mehdi


Episode Overview:

Malcolm Hawker and DataGalaxy’s Kash Mehdi dive into the evolution of metadata, data products, and AI governance – exploring why traditional data catalogs are no longer enough. They discuss how context, usability, and human insight are reshaping what it means to manage and trust data in an AI-driven world.


Good morning. Good evening. Good afternoon. Good whatever time it is wherever you are in this amazing world of ours. I’m Malcolm Hawker, and I’m your host of the CDO Matters podcast. Hey. Thanks for joining us today.

Maybe you’re on Spotify. Maybe you’re watching on YouTube. Maybe it’s not even twenty twenty five. Maybe you’re consuming this sometime in the future, which is kinda cool.

I am thrilled to be joined today by Kash Mehdi, who is the VP of growth at DataGalaxy. Kash and I are data pals.

We met through... do you remember where we met, Kash? Was it through Dr. Talburt?

 

Maybe?

 

Yeah. I think so. Several conferences, running into each other, as well as Dr. Talburt, who I’ve known for several years, more than a decade probably by now.

 

Yeah. So Dr. John Talburt, you may or may not have heard of him, runs a CDO-centric graduate class out of the University of Arkansas at Little Rock, UALR.

 

Great course, great curriculum. I think you are an alumnus. Yes, Kash?

 

Correct. Okay. Yeah.

 

Well, it’s a small world in the data world. And however I met you, I’m sure it was through the conferences or through, I don’t know. But we share a few things in common. We both live in Florida.

 

We’re both evangelist types. We’re out there sharing the good word of data, data management, data governance. And I thought it was only appropriate that we get together and chat. We’ve been talking about recording a podcast for a long time now.

 

So I’m grateful you could join. We’re gonna talk about a few different things today. We’re gonna talk a little bit about data catalogs.

 

Data catalogs are still kind of the new black when it comes to data management technologies out there. I haven’t seen any of that change over the last couple of years. A lot of people are talking about data catalogs. We’ll talk a little bit about data products, I think. We’ll talk a little bit about data product management.

 

I don’t know how we have a conversation on a podcast these days without talking a little bit about AI and how AI is going to be supported, I think, by some of the things like data products and data catalogs. But with that, let’s dive into it, Kash. I’m interested to hear your high-level perspectives on the state of the data catalog union, as it were. Where do you see things right now?

 

What are you hearing out there when you’re talking to your clients? Where might things be going? What’s the big thing around data catalogs these days?

 

Yeah. First of all, Malcolm, it’s a great pleasure to talk to you. And the catalog is a front-and-center topic in the line of work that I do.

 

For a lot of the strategic data management leaders that I’m working with, the catalog is sort of the centralized source for all of their data. It’s almost like, you know, you go to a new city, and you need GPS to be able to navigate to your favorite restaurant or go visit a friend. A catalog is essentially like your Google Maps to understand what data exists in the organization.

 

So, today, I think a lot of companies are putting in effort to build data catalogs.

 

Also, a lot of the technological landscape is converging. Like, almost every company or technology that you see today in data management offers a data catalog. Now, the biggest challenge that I see, reported by many chief data officers and strategic data management leaders, is the usability and accessibility of these data catalogs. And this is where we’re just beginning to talk about some of the challenges in data governance, where adoption is a huge topic for, you know, data leaders driving change management, but also creating value for data creators and data consumers. Because if you’re creating data and it’s not being used, it’s going to sit in silos. So the role of a data catalog is very critical to bring these insights to people who might be creating reports.

 

You know, building machine learning models to predict the future, all the good stuff.

 

But also, what I’m finding is data catalogs are not the only things that leaders need to be successful, because much of this has now pivoted towards data product management thinking where, okay, we know what data is out there, but how do we get people to use this data? And this is where I see huge effort going on around building data products, which are designed with a target audience in mind for a specific problem or a pain, and then also all the other logistical details around where can I get it, in what format, you know, who is currently using it, what domain it’s under, all the good stuff.

 

So this is where I see catalogs evolving towards more of data products where, of course, leaders need to put in some effort to do the discovery, which is really what a lot of product management thinking is about, doing the discovery about what you’re building, but also making sure these are usable, accessible, and that they bring real-world impact for the organization.

 

Again, I can talk about more things, especially how AI is changing this landscape, but I’m restraining myself from talking about AI or spraying AI all over this talk track. But, hopefully, I gave you a bit of perspective in terms of what I’m seeing there.

 

No. That is certainly helpful. And I want to dive deeper on the data products topic. To me, it makes complete sense. Right? If you view catalogs as basically an inventory of everything, maybe a metaphor here would be like the menu at a restaurant.

 

It just makes complete sense that that inventory or that menu would include all the things that potentially you are selling or at the very least productizing. Right? You don’t necessarily need to sell or monetize the data here. These can be these products can be for internal consumers, and consumers need a storefront.

 

Right? That’s maybe another metaphor to think about this. Like, kind of a marketplace or a storefront where, hey. This is the place that I go in order to consume products.

 

So having catalogs play that role, to me, makes perfect sense. So I’d love to press on the data product thing a little bit more, but let’s circle off on a couple of other things that you had mentioned, a few other trends, and I’m seeing them as well.

 

The one that you had mentioned is a kind of a broader convergence.

 

Five years ago, six years ago, seven years ago, you know, a lot of people were talking about data quality tools.

 

And I don’t hear anybody really talking much at all about freestanding data quality tools because I think there’s a convergence happening there on the data quality front and the data catalog front. If you’re bringing metadata into that catalog, it seems to make sense, particularly if you’re following this data product strategy, that you would want to apply some quality rules to it on its way in. Are you seeing the same thing? Are you guys building data quality capabilities into your platform?

 

Yeah, so today we’re taking the approach of combining best of breed, but of course, from an industry perspective, we see convergence around metadata in general. I mean, forget about data catalogs. Right? So metadata, data quality, and then also data observability.

 

You need all these three to get your data ready for AI or other initiatives.

 

And to support this change, what we’re doing is we’re connecting with best of breed. Of course, as a company, we’re very focused on enabling organizations to find a user-friendly environment where they can discover all these data products. So we are more focused on delivering data and AI product governance. So imagine, Malcolm, you’re a data engineer, and you’re building your machine learning model in Databricks. But to build this model, you need some data.

 

So, we’re meeting you wherever you’re spending time, with things like plug-ins, and we’re even introducing things like natural language search.

 

Blink is one of the capabilities we’ve added in our platform where you could ask, like, hey, what data do I have? I’m building a campaign to predict student success rate at a university. So it will list out all of the things that I need to embed in my model. But there’s also a critical decision factor for not just engineers, like heavily technical users, but also business users. Right?

 

They need quality indicators where can I trust this data? You know, what’s the profiling on this particular dataset?

 

Observability alerts if something goes wrong in real time. I need to be able to understand and be notified.

 

And this is where I think there are a lot of really great tools. Like, you know, we work closely with Bigeye and Monte Carlo. They’re already doing great work in the space.

Companies like Soda as well.

 

So bringing in best of breed, but at the same time, one of the herculean tasks that we have is to retain the user experience for users. Because if you think about any data user, and I think it’s true for myself as well, on average, I’m using ten to fifteen applications for multiple things to do my job. So imagine people finding one single source of truth to see what data is out there, what it means, where it comes from, and, if there are any questions, who to reach out to. So I see a convergence where metadata is very critical, of course. Data observability, data quality, this all brings a unique experience for the end users. And this is where I think also some of the work you’re doing around master data management, which is something that we don’t do but we complement, is highly critical, because if you don’t know what you need to govern and you don’t have a single version of the truth, there are bigger problems with the agility you need to meet your business outcomes, I think.

 

That’s helpful. And I appreciate the best-of-breed approach because at Profisee, that’s what we’re doing as well. We’re not trying to be a Swiss army knife. We’re not trying to be all these things.

 

All we wanna do is MDM. It sounds like what you want to do is metadata management, data catalog, and focus on that, and that’s great. But at the same time, I think, you know, for your average CDO out there, particularly a business-centric CDO who may have come from the business side and not the technology side, this landscape is confusing. And I can’t blame a CDO who may be a little bit confused.

 

You know, if I turn to, like, a Gartner, and I look at Gartner, it’s like, okay. Well, there used to be this thing called the metadata management magic quadrant, and now it’s a data and analytics governance platform quadrant, and there’s data observability, there’s data quality, there’s all of these things.

 

Well, for one, I’d recommend CDO Matters episode eighty with Barr Moses, who’s the CEO of Monte Carlo. Barr does a great job in helping define what observability is. And the word that I love, that kind of sticks for me, is forensics. Helping understand why things happened the way that they did.

 

That’s how Barr kind of describes data observability.

 

You also mentioned profiling and understanding what’s out there in the landscape. Is that something that you see as key to metadata management and data catalogs, this profiling aspect? And is that something that more and more of your clients are using your platform for?

 

Yeah. Because traditionally, I think some of the clients that we have seen are already used to looking at profiling statistics, like dashboards measuring completeness, duplication, accuracy, and other things. So, it does give you a different perspective or a different dimension of your data. But observability is more real time, monitoring your pipeline. If something breaks, I need to be notified right away, which is very different from data quality. But it depends on the user personas that we’re interacting with, because today we serve everyone from data analysts and data scientists to data architects and data engineers. They all come in with different needs.

 

Profiling definitely gives them some insights. Like, I think one of the partners that we’re working with is DQ Labs. I don’t know if you’ve heard about DQ Labs. I think they got rated quite high not too long ago.

 

It’s, you know, surfacing profiling statistics for users as well as applying some AI to, you know, drive the automation around data quality. And so that’s one aspect, which I think is a bit more the end result of the effort business stakeholders are putting in. The other big piece is, even before you go out there to deploy all of these quality agents or tools, you need to be able to define your data quality standards and business rules in plain English text. And this is something where a CDO probably is working with a quality lead or different business line stakeholders to capture their expectations of data.

 

Because, essentially, the role of a chief data officer is to deliver clean and trusted data to the organization faster. Not just like, hey, I’m gonna go ahead and write this rule for weeks. I’m gonna do approval review, but quality is still bad. Somebody needs to fix this.

 

So this is what I’m finding with working with a lot of leaders, customers on our side.

 

And quality is just one of the components, as important as metadata and other aspects.
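Editor's illustration, not from the episode: the profiling statistics mentioned above, such as completeness and duplication, can be sketched in a few lines of Python. The function name `profile_column` and the sample values are hypothetical, and treating `None` and empty strings as missing is an assumption made only for this sketch.

```python
from collections import Counter


def profile_column(values):
    """Return simple profiling statistics for one column of values.

    Assumption for this sketch: None and empty strings count as missing.
    """
    total = len(values)
    non_null = [v for v in values if v is not None and v != ""]
    counts = Counter(non_null)
    # Every occurrence of a value beyond its first counts as a duplicate.
    duplicates = sum(c - 1 for c in counts.values() if c > 1)
    return {
        "row_count": total,
        "completeness": len(non_null) / total if total else 0.0,
        "distinct_count": len(counts),
        "duplicate_count": duplicates,
    }


# Hypothetical sample column with one repeated value and two missing values.
emails = ["a@example.com", "b@example.com", "a@example.com", None, ""]
stats = profile_column(emails)
print(stats)
```

A catalog or data quality tool would compute far richer statistics than this; the sketch only shows the kind of indicator a business user might look at before trusting a dataset.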

 

Alright. So you’ve got a platform, a data catalog slash metadata management tool. And by the way, historically, those were largely synonymous, right? Whether you said it’s a metadata management tool or it’s a data catalog. Some people even call those governance tools. So just for those of you who may be new to this space and trying to decipher what’s what, often you will hear maybe one of three names used to describe largely the same thing. Do you agree, Kash, with everything I just said?

 

Oh, yeah. Absolutely. In fact, I will tell you this, and it’s a bit provocative: we at DataGalaxy believe data catalogs are dead. I mean, we do data cataloging.

 

Okay. We are more, essentially, about data and AI product governance. So, again, for the leaders, anybody listening out here, one thing I would tell you is, you know, you’ve done a lot of work building a data catalog. Right?

 

It has given you a centralized view into your data.

 

This is a very first step towards building a data product because to build a data product, you need to be able to understand what data constitutes this product, for what audience, for what problem. This is just one of the very first steps. And that’s why you see, I think, it is true, it is used synonymously by the industry, like catalog, governance. People, you know, put governance everywhere. Just like machine learning automation is now called agentic AI or automation, as AI has gone mainstream.

But it’s nothing new. Under the covers, if you go and see, it’s just machine learning automation, which we have been doing for many years.

 

So in the data governance space, I think what’s really interesting is there are certain things about governance, and I think we need to carefully define this because sometimes I think there are negative connotations around this, where any new CDO, data leader, you know, a lot of companies, are still trying to get governance right.

 

There’s a huge perception. It’s hard. It’s boring stuff. Nobody wants to play nice sometimes.

 

And leaders are also tired of this because they have been communicating and there’s no change management, no results.

 

So to me, governance is very simple.

 

It’s really about coming in, more of a value function for your organization. So if you’re a CDO, if you’re just coming in and writing policies and standards in silo without active collaboration from your stakeholders, you’re not empowering your leaders, they are the ones that are going to be enforcing these policies and standards.

 

But if you don’t include them, this seems more like a command-and-control approach, which typically doesn’t work with humans. You need to democratize the decision-making process. But at the same time, make sure you have an operating model in place, like the way you could work and gather their input and then also support them. I use this analogy, Malcolm, where imagine you’re driving a car, and I’m sure you have taught other people to drive cars in your family, maybe your kids and whatnot.

What do you do? You know, you show them all of the rules, but when they’re in the driver’s seat, now they’re in real danger because if they don’t follow rules, if they don’t follow speed limits, they get a ticket, all these things.

 

And that’s exactly the role of a CDO. It’s almost like teaching somebody how to drive is what a CDO is doing with data. Actually, teaching you how to utilize this data more effectively.

 

Sorry. I’m going on a tangent here. I feel very passionate when you say data catalogs and governance. Oh my gosh. I’ve come across this so many times.

 

There’s so many anecdotes to put out there. But anyway, I digress.

 

No. Well, so when you work in data long enough, you start to get really good at metaphors.

Because you have to try to explain what you do to nontechnical people. My parents are in their eighties, and I’m still trying to find the right metaphor. It’s like, you know, previously when I was on the product side of the house, I was like, well, if you imagine we’re building cars, I’d be the person that would define whether the car had two doors or four doors. And it’s, oh, you’re building cars.

 

No. It’s I’m not building cars.

 

So, yes, love love the metaphor.

 

To tie off on the whole governance issue, here’s some more advice. Again, if you’re perhaps a nontechnical data leader or you’re new to the space or you’re listening to the podcast because you’re trying to learn about data and analytics management, there is no such thing as a single data governance platform. It doesn’t exist, and it will never exist. If you look at a standard governance framework, there are a lot of things we need to do.

 

There’s security and access. Yes. There’s ethics. There’s MDM. There’s reference data management. There’s data quality.

 

There’s data catalog, metadata management.

 

All of these things, including enforcement of policies, which arguably should happen within operational systems like CRM systems and ERP systems. So if anybody tries to sell you a single data governance tool to rule them all, run for the hills because it doesn’t exist. Just a word to the wise: a single data governance tool does not exist. There are multiple tools that will be involved in helping enable any enterprise-class data governance program. There’s no one single tool, but a robust metadata management slash data catalog tool is certainly, I would argue, foundational because you gotta know what is out there.

 

Let me ask you a couple of other questions in terms of where I see things evolving here.

And one of them is around the idea of information management and knowledge management. Let’s take the world of knowledge management, which, I would argue, largely starts to get into the world of unstructured data.

Right? Large blocks of text, SharePoint servers, PDFs floating around on marketing shared drives, even recordings of customer service calls. Like, the list of potentially unstructured data is very long.

 

That lives and breathes often in a world called knowledge management, which historically has been kind of separate from data management.

 

Where do you see these pieces fitting together, and what are you doing to try to potentially stitch and weave this world of structured data and unstructured data together?

 

Yeah. So this is all about context for us. So, you know, with unstructured data itself, there’s huge potential in terms of helping organizations govern it, because there are a lot of insights. I think that’s where I see we are evolving, especially with our product, as well as where the industry is going. We have the technology today. The AI, you know... actually, Malcolm, I was talking to the head of engineering at a security company, like, they build cameras and all that, and they have many IoT devices that send data, structured and unstructured. And as you can imagine, in this line of business, you have a lot of video footage and all the different things. Oh, yeah.

One of the things that they were talking about was, look, Kash... and they’re an existing DataGalaxy user. I cannot expose the name.

 

Apologize for that. Your security guard.

 

I’m sure everybody understands that. Now in terms of unstructured data, this leader told me, like, AI is really good at classifying data for us, tagging data so we know exactly what they know, how we can structure our unstructured data, like by tagging, classification, and other things.

 

And this is where what a lot of users require in their day to day is context. And data catalogs, data governance platforms, and other technologies have great potential to expose maybe a holistic list. Like, if I have a product around customer security and other things, data products can even include an output port, and this is how we are actually positioning this in our product, where you define the product with all of the critical metadata, you know, where it exists, who’s the owner. But we also have this concept of an output port where you could actually go in and gather the specific structured or unstructured data. And this is where you can apply, like, I think one of the most commonly used technologies with unstructured data, which is OCR.

Yeah. You know, where you could identify and translate an image into actual text and things like this. So what’s most important for us from a data governance, data product standpoint is providing the accessibility. Because a lot of times, people don’t know what exists where.

Just going back to my GPS example, like, you need to go somewhere, you need to be able to use Google or whatever, Perplexity maybe nowadays, to find where you’re going. Right? So this is the key role that we are playing with our approach.

 

Got it.

 

Super helpful.

 

I see the evolution of unstructured data and the need to govern that data at scale being absolutely critical, and catalogs, metadata management solutions, will certainly play a role here because you gotta know what you’ve got. Right? And it’s interesting, the use of AI for tagging all of that content. The kind of pithy phrase that I’ve used to describe that is that, to me, it’s like the cover charge that you have to pay in order to start to apply traditional data management and data governance processes to that data. You’ve got to tag it. Right? Because, to me, that’s step one of trying to understand what’s out there.

But there are broader contexts out there in terms of, like, just looking at large swaths of text. What is this about? What does it refer to? Is this describing a customer or a process or a location? What is it? So you can start to apply some of our traditional data management processes there.

I do see a bit of a paradox in that. I think we’re gonna need AI to tag all of that stuff at scale. So we’ll need AI to help us govern data for consumption by AI. So it is a little bit of a fractal. It’s, you know, holding a mirror in front of itself, because I think we’re gonna need AI to do this at scale.

 

And just to interject there, I think another point that I would make here, where governance is a value add for any strategic data management leader, is to define policies and standards. Let’s say you have some unstructured data sitting around somewhere and your regulation tells you it cannot be retained more than seven years. Somebody’s gotta go out there and investigate. But how are you gonna find this? Like, okay, I’ve got this data product that really helps me try to enforce this particular policy. And this is really one of the common use cases that I’m seeing, where organizations, whether it’s because of regulation or from a cost perspective, because storage also incurs costs, are taking those active steps to deal with structured and unstructured data altogether.
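Editor's illustration, not from the episode: once assets are cataloged with basic metadata, a seven-year retention rule like the one described above becomes a simple query. The sketch below is a minimal Python example under stated assumptions: `CatalogEntry`, its fields, and the sample entries are hypothetical, and an asset is flagged once it is seven or more full years old.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CatalogEntry:
    """Hypothetical catalog record; the field names are assumptions."""
    name: str
    created: date
    structured: bool


def age_in_years(created, today):
    """Whole years elapsed between created and today."""
    years = today.year - created.year
    # Subtract one if this year's anniversary has not been reached yet.
    if (today.month, today.day) < (created.month, created.day):
        years -= 1
    return years


def past_retention(entries, today, max_years=7):
    """Return entries that are max_years or more full years old."""
    return [e for e in entries if age_in_years(e.created, today) >= max_years]


entries = [
    CatalogEntry("call_recordings_2016", date(2016, 3, 10), structured=False),
    CatalogEntry("support_pdfs", date(2021, 1, 15), structured=False),
    CatalogEntry("orders_table", date(2018, 6, 2), structured=True),
]
flagged = past_retention(entries, today=date(2025, 6, 1))
print([e.name for e in flagged])
```

The check itself is trivial; the hard part, as the conversation points out, is knowing what exists and when it was created, which is what the catalog supplies.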

 

The costs are very real.

 

Not to mention the fact that, you know, we’ve got a whole bunch of what Gartner defines as dark data sitting in data centers consuming scarce energy. The estimates here are all over the map, but anywhere from fifty to ninety percent of data sitting in data centers is largely dark, like unused, just collecting digital dust.

And I wrote an article a couple of years ago. It’s actually been three years now. I think it was in twenty twenty two. I wrote an article for Forbes talking about the data hoarding problem we’ve got and the energy implications there and the environmental implications there because the data center industry consumes a ton of energy.

 

So not only is it a good practice to catalog all of this data and understand what’s out there, there’s also, to your point, archiving it. Like, do we even need to be hoarding this stuff forever? Sometimes regulations will say, yes, you do. But other times, if you’ve got data that is badly out of date, not being used in a report, not helping you guide any sort of AI-driven workflows, archive it.

I mean, again, even in taking that step, you’ve got to know what you’ve got in order to make an informed decision about whether it should be lingering or not.

 

Speaking of metadata management, here’s a little plug for my buddy. Oh, that’s weird. It just blanks me all out when I hold the book up too high. I gotta lean back.

 

Anyway, a plug for my buddy, Ole Olesen-Bagneux. Ole wrote a great book, Fundamentals of Metadata Management. You should check it out. In that, he actually calls out the growing number of metadata management silos out there and the need to find a way to integrate various sources of metadata across the organization, including knowledge management and information management. Anyway, let’s talk more about data products.

 

Hey, Kash. What’s a data product?

Well, for data product, there’s not one definition. But from my perspective, a data product is essentially an asset of your organization that has a purpose. It does a job. It’s almost like a hammer: the use of a hammer is to hammer nails, right? A data product is designed with a purpose.

 

It has clearly defined ownership. It has a target audience. It has a format, you know, in which it exists, and it also has channels where users can consume this information.

 

But essentially, a data product is designed for a specific business outcome that you want to deliver for your organization.

 

A data product is not just a dashboard or a report or just documentation like, hey, this is what the data is. It specifically comes with guidelines in terms of how and where you can use it, how and where you could get it, and for what purpose. And that’s how I typically look at data products.
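Editor's illustration, not from the episode: the attributes listed above (purpose, ownership, target audience, format, consumption channels) can be written down as a small descriptor. This is a minimal Python sketch under stated assumptions: the class and field names are hypothetical, the "output port" idea is borrowed from later in the conversation, and none of it reflects how DataGalaxy or any other platform models data products.

```python
from dataclasses import dataclass, field


@dataclass
class OutputPort:
    """A channel through which consumers can get the data (hypothetical)."""
    name: str
    format: str    # e.g. "parquet", "csv", "rest-api"
    location: str  # where the data can be retrieved from


@dataclass
class DataProduct:
    """Descriptor covering the attributes named in the conversation."""
    name: str
    purpose: str
    owner: str
    target_audience: str
    domain: str
    output_ports: list = field(default_factory=list)

    def missing_fields(self):
        """List the attributes still empty, as a simple readiness check."""
        required = ("name", "purpose", "owner", "target_audience", "domain")
        missing = [n for n in required if not getattr(self, n).strip()]
        if not self.output_ports:
            missing.append("output_ports")
        return missing


# Hypothetical example based on the student success scenario mentioned earlier.
product = DataProduct(
    name="student_success_features",
    purpose="Predict student success rate",
    owner="analytics-team@example.edu",
    target_audience="Data scientists",
    domain="Academics",
    output_ports=[OutputPort("warehouse", "parquet", "s3://example-bucket/students/")],
)
draft = DataProduct("draft_product", "Reporting", "", "Analysts", "Finance")
print(product.missing_fields(), draft.missing_fields())
```

The readiness check reflects the point made in the definition: a dataset with no owner or no way to consume it is documentation, not yet a product.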

 

I love that. I love the "designed to solve a specific business problem."

 

That I could not agree more. Yes. Yes. Yes.

 

If it’s not solving a specific business problem, business being the keyword there... I could not agree more. Because if you’re not, then what’s the point?

 

What I’ve learned is that, you know, there are a lot of people very passionate about data products and there are a lot of different definitions out there. And I think they can all be right at the same time, and I know that’s maybe a really wishy-washy perspective. But, you know, largely, there’s this kind of shift-left view of the world, which is this kind of lowest level of atomicity, which was kind of born out of the data mesh movement, I would argue. And then there are people like me who are more shift right, which is: these are finished products. These are things that are maybe aligned to what a retailer would call a unit of sale.

Like, something that is solving a specific problem that you could actually sell and maybe even generate money from, whether that is to an internal customer where they’re allocating budget to you every year or maybe even an external customer where you’re monetizing it. And I’ve evolved. I don’t necessarily think there’s a wrong answer here. I think products can exist on a spectrum.

 

Particularly in a world where some of your customers may be robots.

 

So what do you think about that, Kash? What do you think about this whole world of agentic AI and using agents to go out there and solve problems? And maybe those problems are even, like, build me a dashboard. Are you seeing some of that?

Where do you see things evolving there around agentic AI?

We’re at this inflection point with agentic AI. Actually, I was talking to Peter Aiken. Do you know Peter Aiken? Yeah. Yeah.

 

The president of DAMA.

 

Yeah.

 

So I asked him, hey, what do you think about agentic AI? And I’m gonna give you my answer, but this stuck with me. He was like, Kash, there’s no such thing as agentic AI. It’s almost like calling a librarian a book librarian. It’s, like, redundant.

It’s just like, yeah, Chinese food in China is just food.

 

Exactly. Yeah. There you go. Who knew? So, in terms of, I think, what’s happening in agentic AI or agents and whatnot in this world, they all need context, business context.

I think one of the most critical things that you need with AI is the business context, the business meaning. And a lot of this is... guess what? Where does it exist today?

Do you have an answer? Do you know where it is? Context?

 

Yeah. Context.

 

Where does context exist today? Well, I mean, I work for an MDM company. I could tell you context exists in MDM, exists in data catalogs, exists in all that unstructured data. It depends on where you’re looking, but what do you think is the right answer?

 

Human brain. This is where it exists.

 

There are a lot of people out there who, you know, have spent years in a given company. They have all of this context. I think technologies do a really great job in terms of extracting this, providing, like, this is where the user experience of a data governance platform or any technology of sorts in data management comes in.

 

User experience is one of the most critical things.

 

And I think there should really be a lot more science and research in this area because this does great wonders for how AI is gonna be successful. A lot of these agents need this context. And this is also a shift that I have been following, as you’ve seen with the acquisition of Informatica announced by Salesforce, or a lot of data catalog companies that were previously reported on in the Gartner MQ. I’m curious to see, in this year’s or next year’s MQ, how many of those players still exist.

 

I already see this when I go to a conference. It’s like, hey. Wait a second. I used to see a lot more data catalog vendors.

 

Some of them have been acquired because all these companies, like ServiceNow or Salesforce, need context to drive all of this automation. And metadata, MDM, data quality, and observability play a huge role in driving that automation.

 

So an agent is nothing but some, you know, code sitting out there without this context.

 

It’s almost like, you know, we say we have a lot of data silos.

 

I think we will have agent silos if we don’t provide them good context to automate work, business activities, the creation of reports and dashboards, and surfacing that insight back to the users.

 

Totally agree. But here’s the problem, or the opportunity for the optimists among us: the intersection of a row and a column is contextless.

 

There’s no context in the intersection of a row and a column. Right? Take just any relational database. Okay, let’s just say it’s a customer table. You know it’s a table full of customers, and that’s it.

 

Right? You don’t know if the customer’s happy. You don’t know if they’re sad. You don’t necessarily know if they bought something recently or previously. You’re just gonna know, okay, first name and maybe last name, and that’s it. Customer. You don’t know anything else about the customer or the state of their relationship with a given company.

 

So how do you get that context? Right? How do you actually add in all of those additional things? This starts to get into conversations around, like, knowledge graphs, vector databases, and other tools at our disposal.

 

But how would you answer that question? Right? If, according to Gartner, eighty to ninety percent of the data in our enterprises is largely unstructured. We can have a largely meaningless argument about what it means to be unstructured, but, like, not rows and columns.

 

Right? If eighty to ninety percent of the insights are sitting out there outside of relational databases, and we agree that context is so important.

 

Right? It’s absolutely critical.

 

How do we apply that context against our legacy world that’s probably ill-suited to do it?

 

Yeah. Well, unfortunately, I think the opportunity for many organizations is to really do the mundane, boring work of documenting that context. Of course, AI can help you classify. It could tell you, hey, Kash.

 

I found all of this data that’s looking sensitive, PII, and all of that. But can you write that context? Because essentially, it’s not just AI or data driving a company, it’s people that are in the leadership structure. Because if you look at ESG regulation and other things, like, it places great importance on the governance structure.

 

The reason being, people essentially are the driving force behind making those business decisions, and they’re the ones that have all these unique ideas. Today, I think there are training enablement functions within any given company that are training employees on literacy: hey, this is how you should be using data and all that.

 

But the bottom line is we really need to capture the context, like, it’s a little documentation. Of course, you know, AI is really good at providing you definitions, like, hey, for this data, this could be a really good definition.

 

But again, that’s not good enough to me, at least from my perspective.

 

Even when I’m writing an email, if I’m asking AI to help me write it, I still wanna make sure I’m using my originality, like the tone of voice, assertiveness, whatever that may be because this is what essentially is happening at a bigger scale for a corporation, organization, where people driving the business need to make sure they capture this.

 

That’s when I think they’re going to get more benefit, more bang for their buck, out of the data itself.

 

And you’re right. I think the table column example, it’s just metadata, like customer underscore ID, customer underscore status, whatever. It just tells me, okay, I’ve got a bunch of customers with a particular status.

 

But if somebody is saying, like, hey, I want to understand customers that started with my company on so-and-so date and canceled my service in less than two months.

 

Okay. Then now we’re talking. This is the real context. This is a real business problem.
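
[Editor’s sketch] The contrast drawn here, between a bare status column and an actual business question, could look something like the following in Python. The rows, the column names, the function name, and the reading of "less than two months" as 60 days are all assumptions made for illustration; none of them come from the episode.

```python
from datetime import date

# Hypothetical customer rows (illustrative only, not from the episode).
customers = [
    {"customer_id": 1, "customer_status": "cancelled",
     "start_date": date(2024, 1, 10), "cancel_date": date(2024, 2, 20)},
    {"customer_id": 2, "customer_status": "active",
     "start_date": date(2024, 1, 10), "cancel_date": None},
    {"customer_id": 3, "customer_status": "cancelled",
     "start_date": date(2024, 1, 10), "cancel_date": date(2024, 9, 1)},
]


def early_cancellations(rows, started_on, max_days=60):
    """Return IDs of customers who started on `started_on` and cancelled
    fewer than `max_days` days later.

    The 60-day default is an assumed business rule. Deciding what
    "less than two months" means is exactly the kind of context that
    has to come from people rather than from the column names.
    """
    return [
        row["customer_id"]
        for row in rows
        if row["start_date"] == started_on
        and row["cancel_date"] is not None
        and (row["cancel_date"] - row["start_date"]).days < max_days
    ]


# The status column alone only says who is cancelled (customers 1 and 3).
# The business question narrows that to customers who left quickly.
print(early_cancellations(customers, date(2024, 1, 10)))
```

The status column cannot tell customer 1 and customer 3 apart, while the question with its business rule attached can.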

 

And this comes from the business side of things. And this is where, if you remember the debate around who really owns data governance, I think a lot of people, when you started talking about data governance, were like, okay, is it IT owned, business owned? Is it in the middle? Oh my gosh. I think it’s very simple: the success of your data governance initiative really depends on the diversity of people, of stakeholders.

 

I’m a big believer that the probability of success increases the more people get involved.

 

Of course, you don’t want to have too many chefs in the kitchen, but you do want more people to provide perspective, because at the end of the day, governance as a whole is impacting these people, and they need to know the change, the impact that’s coming.

 

And they are also the ones that need to make sure there’s context that’s mutually understood.

 

And that’s the work I think requires some people management skills. I know it’s hard, and this is where I think when I talk to a lot of Chief Data Officers, even like we’re running this Chief Data Officer masterclass program.

 

We have run ten-plus seasons in the last two, three years, since twenty twenty two.

 

Common theme, Malcolm, is, oh, I want to communicate. I want to drive change management. It’s really hard. I’m having a tough time getting my business case approved.

 

And there, I think the opportunity is for leaders. Their toughest job is to connect business challenges with business outcomes. Because let’s be honest, if you have a data quality issue and you go around and tell people, okay, we’ve got a data quality problem, but you’re not telling them what business outcome or goal is being impacted, nobody cares, honestly. Like, okay, quality problem, why should I care?

 

I don’t see myself contributing to this bigger goal, so I’m not gonna make any change. But anyway, I think these are common things that I’m seeing. And leaders find it really hard, but they have to start somewhere, and they have to lead with data storytelling, communication, documenting these quick wins. I even advise CDOs, believe it or not, to hire a marketing person so they can market your quick wins to the rest of the organization, which in a way is data literacy enablement and all of that good stuff.

 

Such an awesome last mini rant there, could not agree more. A lot of kernels of wisdom in the bits about governance.

 

Ownership, this idea of ownership, to me, it drives me nuts.

 

Because when you peel the, and I know we need to wrap up here, but when you peel that orange, the concept of ownership as most people embrace it today in the world of data and analytics is completely useless. Why?

 

Because when we think about data, we can think about it across multiple vectors. There is the creator of the data. There is the benefactor of the data. There is maybe even the sovereign of the data, like me.

 

My data sitting in some database someplace else. There is the user of the data. We can also acknowledge that there are analytical uses of data and there are operational uses of data. This is basically, like, a multidimensional cube where the idea of one owner of all of that is simply preposterous.

 

This is a very complex RACI matrix where we all need to be focused on driving business outcomes. If we take that as the lens, to your point, we’ll work backwards from there. What regulations do we need to adhere to? That’s kind of like the broader framework of everything.

 

We need to comply. Right? So that’s kind of like the speed limit, if we want to pick our metaphor here. But then, assigning responsibilities.

 

Okay. Who creates the data? Who defines policies for this data? Who enforces policies for this data?

 

Who ensures that it’s going to be accurate for these various use cases? The totality of that is ownership of data. But one person, particularly for data that is shared widely? No. It’s a bit of a misnomer.

 

So I appreciated the rant there.

 

Yeah. One thing to add, and this is where I think the concept of a data product is helping, because you essentially have a data product owner for that, you know, confined product for whatever purpose you’re designing. There’s the whole idea of, like, the data mesh, data fabric, and other architectures that focus on this concept of a data product. I tell leaders that you don’t have to be married to a specific architecture, mesh or fabric. You could still implement a data product without, you know, these architectures. But what I really like about a common theme across these architectures is that it provides this centralized-decentralized approach where teams that are intimately familiar with the data get to define these products.

 

Of course, like how any government operates in any given country, where you have a federal government and state governments, you of course have those standards. So everybody is following them when they put a product on the shelf. It comes with some quality checks. That’s all.

 

Again, I’m making it sound super simple, but, you know, we don’t have to complicate this challenge or opportunity around data products.

 

Oh, we love to complicate things. So my concern with both fabric and mesh, and by the way, I’m a huge fabric believer, less of a mesh believer for separate reasons. Look to previous episodes for some of my thoughts on that. But my biggest concern is that they are architectures that are often driving operating models, and it should be the other way around.

 

The operating model should drive the architecture. One comes before the other, and the operating model of your data and analytics function should certainly reflect your strategy. To me, that’s the cascade: strategy, operating model, tactics, and architecture.

 

Architecture. And if you are picking an architecture that necessarily forces you into a specific operating model, which is exactly what the mesh did, I would argue. You pick it from an architecture perspective, say, hey. Love the architecture.

 

This is great, flexible, scalable, extensible, blah blah blah blah. It forces you into a specific operating model that may not necessarily be the right one for your organization. So as a CDO, start at the top. Don’t have your architecture drive your operating model; that’s when things start to break. Anyway, we could talk like this for hours.

 

We should have allocated more time. We had technical difficulties at the beginning of our conversation, so I dragged my feet on this. We’re already a little bit over time. Kash, thanks so much for joining CDO Matters today.

 

Thanks for sharing your insights with our community.

 

Tell me you’ll come back sometime in the future and we can continue.

 

Yep. Absolutely. And I’m looking forward to having you at the next season of our CDO masterclass. And I’d love for you to address this particular topic around, you know, why you’re leaning towards data fabric versus data mesh. Again, this will take an entire episode to talk about.

 

Yeah. Malcolm, pleasure to be here. Thank you for putting all these great topics out there for all of your listeners to enjoy and learn these concepts, so they can be aware of the risks as well as the opportunities. So I certainly enjoyed talking to you and look forward to staying in touch. Thanks so much.

 

Awesome. Thanks, Kash, for being here. And if you’ve stayed this long for this episode, hey. Give a like and a subscribe.

 

We’d love for you to join this growing community of data professionals. Yes. We are CDO Matters, but we’re not just for CDOs. We’re for people who want to become chief data officers.

 

So we’re here to learn. We’re here to share what we know, and we’d be thrilled if you joined this growing community. With that, thanks for tuning in to another episode of CDO Matters. We will talk to you again sometime very soon.

 

Bye for now.

ABOUT THE SHOW

How can today’s Chief Data Officers help their organizations become more data-driven? Join former Gartner analyst Malcolm Hawker as he interviews thought leaders on all things data management – ranging from data fabrics to blockchain and more — and learns why they matter to today’s CDOs. If you want to dig deep into the CDO Matters that are top-of-mind for today’s modern data leaders, this show is for you.
Malcolm Hawker - Gartner analyst and co-author of the most recent MQ.

Malcolm Hawker

Malcolm Hawker is an experienced thought leader in data management and governance and has consulted on thousands of software implementations in his years as a Gartner analyst, architect at Dun & Bradstreet and more. Now as an evangelist for helping companies become truly data-driven, he’s here to help CDOs understand how data can be a competitive advantage.