Good morning. Good afternoon. Good evening. Good whatever time it is. Wherever you are in this amazing wonderful planet of ours.
I am Malcolm Hawker. I’m the host of the CDO matters podcast. Thanks for tuning in. Thanks for downloading.
Thanks for listening. Thanks for streaming. However you’re consuming the content, I’m absolutely thrilled that you’re doing it. Today, we are joined by Sarah Levy, who is the CEO and founder of Yuno.
We’re gonna talk a little bit more about Yuno. We’re gonna talk about unstructured data. We’re going to talk about context. We’re going to talk about metadata.
We're going to talk about what CDOs need to do differently in order to get their hands and arms and legs and minds around AI. So I'm excited for our conversation. If you are a CDO or senior data leader or just a practitioner in the data and analytics space and you need to move faster on AI, well, today is the day for the conversation. Sarah, thank you so much for joining us.
It’s nice to see you.
It's nice to see you too. Thanks for inviting me, Malcolm. Great to be here.
Well, we’re gonna have a good a good topic today. This is something that is near and dear to my heart. Right? How how do CDOs close the gap between what they’ve always been doing and what they need to be doing differently in a world of AI? Before we dive into that, I do want to hear a little bit about, you know, I want to hear a little bit about what you’re working on and your background. Tell us what your passion is and what you’re working on right now.
Okay. Sure. So it’s Sarah. I’m calling from Tel Aviv. So I’m based in Israel regularly, just flying a lot everywhere in the world.
And I spent the past twenty years leading startups in various industries, so over a decade in cybersecurity and then eight years in health care and then some time in fintech, everything with a great focus on data and AI. Back then, it went by the name machine learning, but it kind of meant the same thing. Right?
And I think I've experienced a lot of the problems I'm dealing with back then. So when I started Yuno, about two and a half years ago, with a cofounder from California, we decided to try and unlock one of the biggest problems we always experienced in a big organization, in a scaling organization: how to navigate this clutter, this complexity, these misalignments, these silos that you have in your data.
It doesn't matter if you're trying to build an algorithm or build an analysis or a data science report. I mean, that's always what you encounter, and it has its different, you know, versions depending on your specific practice. To make it as simple as possible, and we're going to talk a lot about AI and metadata context and unstructured data, you know, every organization that's trying to build AI capabilities today is facing the extreme efficiency that it brings into your practices. I mean, you just move faster.
It’s developers that use Copilot and tools like Cursor, and it’s business users that connect it to the email, to the Slack, to the Jira, to the Notion, to the business application. They do everything from their agent. It’s connected to all the tools, but then they just need their data there as well. Right?
They want to ask data questions as part of their work, and the data team would say, no. No. Don’t connect the agent to the data. It’s just not gonna work.
It’s gonna be completely misleading, hallucinating, and you will completely lose trust in our data. We need at least two years of work before we let you do that.
So that's where Yuno comes in.
The executive suite wants AI, so data teams need to do huge amounts of work to kind of feel confident, and we're closing that gap. We're generating automated context for AI based on everything you already have in your data stack, structured, unstructured data, and so on. We'll touch on all of that, but that's what we're here to do. We're deployed with many enterprises right now.
I think we're introducing a new modern approach tailored for AI, compared with what I consider traditional data catalogs, semantic layers, and, I would say, more legacy metadata platforms.
Okay. So I’m a tech nerd at heart, and the first question I want to ask is how?
Yeah. Of course. And it sounds like magic.
Yeah. Well, of course. Right? And and as far as sales pitches go, I mean, I like it. I’m intrigued. Okay. Automated context for AI, and and we we know that AI, particularly Gen AI needs context.
Right? So is this knowledge graph based? Are graphs kinda sitting underneath a lot of what you're doing, or is it more vector-based technologies? Is it more search? A little bit of all of that? What's happening under the covers?
It seems like the word graph has to be there. Right? So, yes, it's based on graph database technology. But let's just double-click a little bit into the problem.
I mean, when AI doesn't work for the data and we need context, it's because everything is so cluttered and complex and misaligned. When you look for an answer for what's the ARR, you might find different definitions, different queries, different tables, different calculations. Some of this data is completely, you know, not relevant, low-quality data. Some is highly governed and well modeled, but the computation is something that someone created in a notebook and it's not even relevant.
So the answer to how it's calculated, who created it, who is using it right now, where is it being used, what type of data is it querying, is the data fresh, healthy?
This is all context Yep.
And captured by metadata. And if you're able to leverage this metadata, and by leverage I mean preprocess it in a way that you can surface real-time insights about this metadata to the agent. And I think the magic is there, real-time insights, not just collecting the metadata and having it sit there. No one can do anything with that.
That's context, and that's what Yuno does. With automation at its core, we map metadata automatically from your entire stack, everything that already exists.
We stitch it. We connect all the dots. We stitch code level lineage with usage, with ownership, with logic, with code, with documents that were created in systems like Confluence and so on.
With this powerful foundation, which is managed in a graph database, we're able to automatically classify metadata based on rules organizations set. What's PII-free? What's certified? What's AI ready and what's not AI ready?
Just based on the rules. And with that foundation, we connect the system to whatever agent you pick: Cursor, Claude, Databricks Genie, Cortex Analyst, Gemini, Glean, you name it. I'm just naming all the integrations from the past two, three weeks. We can surface relevant context in real time.
You're looking for churn. I'm going to find you the two most relevant churn definitions that are certified, that are reliable, with the logic, and that will help your agent generate the right query. That's how we do that.
Okay. I'm loving it. But, again, I'm a geek, so I wanna keep peeling the onion here. Historically, metadata management solutions, right, have relied on aggregating data, consolidating data into a single hub, a metadata hub.
Are you doing something similar here? That's question number one. And then question number two is when you surface these insights to these LLMs, how are you actually doing that? Are you relying on some idea of one of the many metadata standards out there?
How is the data actually being presented back to LLMs in a way that they can easily consume?
Okay. So starting with the first question, we are targeting every system that contains metadata. Yep. It can be the quality system, it can be the warehouse, all the BI tools, DBT or any transformation tool.
We're collecting metadata from everywhere, including the human context. And we are kind of, you know, centralizing it, unifying it in one place. But this one place is not one database. In the back end, it's actually multiple databases.
One of them is the graph database. Others are not, because you want to be cost effective and you want to be performance effective. There is an interesting architecture there. If you store everything in a graph database, it's going to be extremely expensive.
It wouldn't justify that investment. Yes, there is intelligence put into that. It's not just that. When I talk about lineage and metadata and putting everything in one place, there are many sources for lineage.
You can get lineage from DBT. You can get lineage from BI tools, lineage from the pipelines, lineage from Snowflake. What's the correct lineage? And that brings me to a key differentiator compared with those traditional metadata platforms.
This decision process is fully automated, with global processors that can take various sources of lineage, including code, including SQL, and can determine in every single case what's the accurate lineage. I'm highlighting that because, you know, it's almost a commodity today to have lineage, but I haven't encountered a single system that is able to automatically generate column-level lineage end to end, do so fast, and then stay up to date with the dynamics.
That’s one key differentiator, this automation.
Then going back to what do we surface to agents and what kind of information and what can we classify. First, we’re able to surface insights fast, instantly. It means we need to be able to query metadata with very complex queries. Show me all the DBT models that feed these dashboards that belong to the gold schema and that were not well documented.
That's something I need to be able to pull instantly. I cannot wait ten minutes for that. Right? One core capability is to be able to ask complex questions of your metadata and get the answers instantly.
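The kind of query Sarah describes is effectively a multi-hop traversal over a metadata graph. As a rough sketch, with hypothetical asset names, schema labels, and flags rather than Yuno's actual model, a tiny in-memory version might look like:

```python
# Toy metadata graph: assets plus "feeds" lineage edges.
# All names here are hypothetical, purely for illustration.
assets = {
    "stg_orders":   {"type": "dbt_model", "schema": "staging", "documented": True},
    "fct_revenue":  {"type": "dbt_model", "schema": "gold",    "documented": False},
    "dim_customer": {"type": "dbt_model", "schema": "gold",    "documented": True},
    "rev_dash":     {"type": "dashboard"},
}
feeds = {  # upstream asset -> downstream assets it feeds
    "stg_orders":   ["fct_revenue"],
    "fct_revenue":  ["rev_dash"],
    "dim_customer": ["rev_dash"],
}

def undocumented_gold_models_feeding(dashboard):
    """DBT models in the gold schema that feed `dashboard` but lack docs."""
    return sorted(
        name
        for name, meta in assets.items()
        if meta.get("type") == "dbt_model"
        and meta.get("schema") == "gold"
        and not meta.get("documented")
        and dashboard in feeds.get(name, [])
    )

print(undocumented_gold_models_feeding("rev_dash"))  # ['fct_revenue']
```

A graph database answers the same pattern declaratively and at scale; the point is that one question spans asset type, schema, documentation status, and lineage at once.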
And the other, and that's related to this classification and insights: you can really set rules that can include analyzing code, analyzing descriptions, embedding and indexing all the names and tags and metadata pieces, analyzing dependencies and relationships, and, based on these rules, classify metadata. A very simple example that seems naive but isn't solved.
You know where you have critical data in your warehouse. You know, these columns are critical. You can even ingest that from an Excel spreadsheet or from a DSPM security system.
But you don't know which of your users that use Power BI reports actually have access to sensitive data after this sensitive data went through the shared connect account, through all the transformations in Power BI, all the way to the reports and workspaces.
Which users? This is managed manually by data governance people. It's crazy. So the CISO has these amazing automated systems for security, but data governance stops at the warehouse level.
That’s an interesting use case. Now what about the AI agents offered by Power BI to give access to every asset that’s built in Power BI? What about the users of this agent?
They access sensitive data from there, not directly from the warehouse. We're able to classify automatically everything that's sensitive by analyzing masking and lineage at the column level and alert or stop information from flowing to people that don't have permissions.
One simple example of how automated classification, integrated with lineage, integrated with AI, brings a lot of value.
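One way to picture the classification Sarah outlines is a breadth-first walk over column-level lineage that carries a sensitive tag downstream until it hits a masked column. This is a minimal sketch under assumed column names and a simplified masking rule, not the actual implementation:

```python
from collections import deque

# Column-level lineage: upstream column -> downstream columns it flows into.
# Names are hypothetical; the real system stitches this from code and logs.
lineage = {
    "warehouse.customers.ssn":   ["powerbi.dataset.ssn_masked",
                                  "powerbi.dataset.ssn_raw"],
    "powerbi.dataset.ssn_raw":   ["powerbi.report.kpi_page"],
    "powerbi.dataset.ssn_masked": [],
}
# Columns where masking is applied; the sensitive tag stops propagating here.
masked = {"powerbi.dataset.ssn_masked"}

def propagate_sensitive(seed_columns):
    """Follow lineage from known-sensitive columns, stopping at masked ones."""
    sensitive, queue = set(seed_columns), deque(seed_columns)
    while queue:
        col = queue.popleft()
        for downstream in lineage.get(col, []):
            if downstream in masked or downstream in sensitive:
                continue  # masked is safe; already-seen is skipped
            sensitive.add(downstream)
            queue.append(downstream)
    return sensitive

tagged = propagate_sensitive(["warehouse.customers.ssn"])
# The report page inherits the sensitive tag; the masked column does not.
```

Everything tagged here could then be checked against user permissions before an agent is allowed to surface it.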
So that's an interesting problem to solve, what you just highlighted there. And something you just said is very compelling and something I don't think we as data people think enough about.
The phrase you said was governance ends at the warehouse, which is spot on.
And what happens if you are exposing data that is going to be consumed by agents that is being used operationally?
Right? And how do you know who created this data? Can you assign the access rights to the creator, to the agent who is then consuming the data from the original creator? That problem is interesting. I've actually talked about that in a number of different podcasts over the last year and a half, because so many of our data people, you know, see governance as this issue of, well, it's just about the warehouse and making sure that people have access to the reports. Well, if this data is being used in multiple places and if it's being used operationally outside the warehouse, you've got an access and you've got a security issue you need to try to solve.
That aside, one thing I wanna continue to press on here, and by the way, I didn't design this as a sales pitch, and I don't think it is coming across as a sales pitch, because frankly, everything you've been saying, Sarah, we need to figure out.
Yeah. Right? Yeah.
Like, everything we've just been talking about, yes, we've been talking about Yuno, but everything that you just were saying and sharing, we need to figure all of this out. Right?
And that's how Yuno was created. I think the first six months of this startup, we didn't write a line of code. We just spoke to data and AI leaders. Yeah. Everything we created was to solve problems we experienced and heard from other data leaders.
Yeah. Well, so one problem that is near and dear to my heart, and that I'd love to spend a little bit more time talking about, is this issue of semantics.
And I'm not talking about necessarily a semantic layer per se, although we can certainly talk about that. I'm talking about differences of semantics across metadata. Right? Because in the environment you just described, metadata is everywhere.
Right? It’s sitting in ServiceNow. It’s sitting in Salesforce. It’s sitting in DBT. It’s sitting all over the place where you’ve got multiple competing definitions for the same things.
And multiple competing definitions. How are you dealing with that? You've mentioned now multiple times you've got rules.
Is there, like, a rules-based engine here to establish some commonalities and consistencies around the definitions of things?
Yes. I think it’s an excellent question because I think the first big mistake or naive mistake when trying to deploy AI is to assume you are going to establish one source of truth, one semantic layer to win them all, and it’s never gonna happen. It’s just the modern version of oh, sorry.
Sorry. I'm an MDM guy, so that's, like, you know, it's okay. I completely agree with you. There are many versions of truth. The reality of our world is that the way that finance looks at the world is different than the way that marketing looks at the world. It's a reality.
But it’s not even that. I mean, you could say, okay. I’m going to build this center of, you know, multiple versions of truth. It’s just going to be centrally governed.
It’s just that this takes time, and the business always moves faster. Yep. And new things are created. The time it will take you to accomplish that, you will be lagging.
You will always be lagging. So I’m not against a central governed set of metrics, definitions, semantics, whatever. That’s good practice, especially for central KPIs, for specific domains that are highly regulated. It’s always a good practice to have that, but it can never be the only thing you rely on.
Now, on top of that, you have semantics created everywhere in BI tools. So for Tableau, you have, you know, calculated fields, pulls. For Power BI, you have metrics. You have DAX where you can build logic.
For Looker, you have LookML. I mean, everywhere. You will always have it. And every new tool, Hex, Sigma, Omni, they all have their own semantic layers.
And now Snowflake has semantic views. Databricks has their metrics, and DBT has DBT metrics. Everywhere you have ways, standards, to create semantics, and even if they're unified, it doesn't really matter because they live everywhere. And I'll quote a customer of mine who said, the semantics are there in the ether.
You just need to find them. Right? They're there, somewhere there. I mean, you don't need to stop, refactor, migrate, rebuild everything.
They're there. You just need to find them. Now I'll give you some examples of rules and workflows, how our agents work. You can tell your agent: you always start from the governed semantic layer.
This always wins. You start there. So let’s take Snowflake, for example. You start and look for semantic views.
If you find a semantic view that meets the search criteria, this always wins. This always gets priority. So you can work like that. The other step is, when I'm looking now for the right definition for churn, some will be like churn, customer churn, abandoned users, leaving users.
There are many semantically equivalent definitions, plural, you know, singular, synonyms, all of that. We embed and index all those titles and descriptions. When we search for churn, we will find everything that is similar semantically, searching everywhere: in descriptions, in unstructured descriptions, in tables, in definitions, in metrics, everywhere. Then we start following those rules, so the semantic layer can win them all.
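The embedding-and-index step can be pictured as nearest-neighbor search over vectors for titles and descriptions. The toy vectors below are hand-made stand-ins for a real embedding model, purely to show the mechanics:

```python
import math

# Hand-made 3-d "embeddings" for metadata titles. A real system would use a
# learned embedding model; these numbers are invented for illustration.
vectors = {
    "churn":           (0.9, 0.1, 0.0),
    "customer_churn":  (0.85, 0.15, 0.05),
    "abandoned_users": (0.7, 0.3, 0.1),
    "weekly_revenue":  (0.05, 0.1, 0.95),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def semantic_matches(term, threshold=0.9):
    """Titles whose embeddings sit close to the query term's embedding."""
    q = vectors[term]
    return sorted(name for name, v in vectors.items()
                  if name != term and cosine(q, v) >= threshold)

# "churn" pulls in its synonyms but not the unrelated revenue metric.
print(semantic_matches("churn"))  # ['abandoned_users', 'customer_churn']
```

The similarity threshold is the tunable part: too loose and unrelated metrics leak in, too strict and synonyms like "abandoned users" drop out.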
Now certified.
And this really is different between organizations, but we've seen things that repeat themselves. Things that are not built on documented models in DBT are not certified. Why? Because they will not have context, because it was never governed on the upstream side. If it's querying raw data and not, you know, bronze or silver or gold, or just gold, it's not certified.
It’s not AI ready. It might be certified for other things. It’s not AI ready. If it contains SQL logic that was built in Tableau, it’s not AI ready.
Then you don't expose your agents to the ninety percent of things that are experiments, ad hoc stuff, that are there. They're there, and it's very hard to clean them.
At the same time, you can run workflows to automatically declutter things that were not used. Usage is a huge business value indicator. You might find twenty definitions of churn, only two of them are actually used in dashboards that someone viewed in the past three months.
So we use these hints, like usage, and lineage, like what is it querying, healthy tables, trustworthy tables, and these rules, like what's following the standards, what's following the documentation rules, to narrow the results of the search significantly.
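Put together, those hints behave like a filter over candidate definitions. A minimal sketch, where the field names and the ninety-day usage window are invented for illustration:

```python
from datetime import date, timedelta

# Candidate "churn" definitions with the metadata hints described above.
# All names, flags, and thresholds are hypothetical.
candidates = [
    {"name": "churn_rate",     "certified": True,  "queries_raw_data": False,
     "last_viewed": date.today() - timedelta(days=10)},
    {"name": "customer_churn", "certified": True,  "queries_raw_data": False,
     "last_viewed": date.today() - timedelta(days=45)},
    {"name": "churn_scratch",  "certified": False, "queries_raw_data": True,
     "last_viewed": date.today() - timedelta(days=400)},
]

def ai_ready(defn, max_idle_days=90):
    """Apply the hints as rules: certified, not on raw data, recently used."""
    idle = (date.today() - defn["last_viewed"]).days
    return defn["certified"] and not defn["queries_raw_data"] and idle <= max_idle_days

shortlist = [d["name"] for d in candidates if ai_ready(d)]
print(shortlist)  # ['churn_rate', 'customer_churn']
```

The twenty definitions collapse to the two or three that survive every rule, which is exactly the shortlist handed back to the agent.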
But then you might find yourself with three legit churn definitions. Right? Yeah. That’s fine.
And now we have all the information to send it back to the agent with: okay, we found one in the product domain, one in the finance domain. The product domain churn looks at active users. The finance domain churn looks at paying customers.
That’s the definition. Do you want to ask more questions? That’s fine. We can give you more context.
But which one, which churn are you interested in? Then the business user can actually build trust in what they're going to get, because they're not just getting it, they can even ask follow-up questions. That's the explainability part, and it's essential. A number, whether it's accurate or not, is useless if you're not able to build trust in it.
So we give the agent the context to get business context, and then we can give it the logic to generate the right query.
I find that interesting for a lot of different reasons. The first one being that what you described is something that I talked about when I was an analyst at Gartner and that Gartner was talking about for many, many years and continues to talk about, which is what they would call adaptive forms of data governance. Right? And what that means is that the governance for one workflow may not necessarily be the same as the governance for another.
And when I use the word governance, I use that broadly. I'm not talking about security and access. I'm talking about things like data definition. I view data definitions as inherently a governance policy. How I define something is, to me, inherently governance policy.
So what you described is very adaptive, meaning the answer in one use case may be different than the answer in another use case or for another user or for a different business outcome. That's a very, very different way of approaching the world, but it's actually not.
Right?
Like, I've wasted... wasted maybe isn't the right word.
I've spent days and days and days of my life locked in conference rooms trying to get people to agree to definitions of customer or of churn or of prospect or of lead. Where you try to bring finance and marketing into the same room and have them agree on a single customer definition, and it never works.
It never works. And what you just described is that, hey.
We have found three viable definitions of churn. Here are the parameters, perhaps attributes, of one definition. Here are the attributes of the others, based on the metadata that you have.
And you've got an interface to allow somebody to pick. What's the context there? Is this in creation of a dashboard, in creation of an analytic? I'm somebody who is searching the metadata catalog for information about churn.
How do I then take the next step? I assume this is just show me the dashboards or show me the analytics related to the first use case. So it can be. Correct. Yeah. Go ahead.
So it can be, I mean, usually the immediate context we would give is the domain, the definition, in which dashboard it's used, and who are the users viewing these dashboards. The users viewing these dashboards tell you a lot. I mean, if the CEO or my manager or my domain, the people in my domain, are actually using this dashboard, this is probably what I need now.
It doesn’t matter that it’s misaligned with the marketing folks. I’m in the sales domain. I need to calculate churn. I need to send a result now.
I’m not going to spend two weeks to align with everyone. I know everyone’s misaligned. And by the way, I get these cynical answers all the time. Yeah.
I know my numbers don’t match with marketing, but I need my numbers.
Yeah.
Give me an AI interface to my numbers.
Yeah. And that’s what we’re trying to solve here, the way the business users experience that.
So the idea that, you know, we know the likely best definition of churn based on what's being used, I think, is reasonable and is a great proxy for accuracy.
But when you start your next startup, your next one after this one, I've got an idea for you. And the idea, and maybe you're already doing this, so this may already be true, is where it's not just usage that is the proxy for accuracy.
I would love to see a world where we are able to scan metadata and we're able to scan transactional data and we, through AI, of course, are able to know: this definition produces the best business outcomes. This definition had no errors produced in any pipeline. This definition is what we know actually is the right one based on our KPIs for customer churn. Do you know what I'm saying?
Like, we’re looking at transactional data.
So I think what you're describing now is related to how you leverage AI. Yes. Like, I can ask ChatGPT, you know, I wanna draft an email. Please help me review my draft and make it more, I don't know, accurate, professional.
Okay. I’m trying to convey a message. I want the best way to convey this and this. These are my content points, but please create three versions.
That's a different way to talk to AI, right? You will get different results. The same business users will start from asking questions, but with time they will ask, I want the churn definition that will produce the best numbers in my report.
Then give me the context also of who created it and where it's used so that I have credibility, because someone will kick me out if I just show something that's... But this is also about how you ask the question and about the layer of the agent, not the context, how advanced it is. I think Claude, Gemini, they're doing a great job there.
Being able to have a business conversation. Yep. And teach the business user to ask the question the best way. And then, yes, we can run all the churn.
We can surface all of them. We can run all the queries. This shouldn’t be I told you, we can pull all the options with all their context instantly and give them with the queries to the agent. So it’s not going to be a performance challenge, it’s going to be a conversation challenge in the LLM phase.
Okay. So one use case where I would kind of see this idea of looking at and scanning more transactional data is around the idea of data quality. Right? Historically, we've defined data quality rules through, okay, like, no null fields and must be a varchar, must conform to these attributes of the data. And we have this very deterministic, very BI-centric view of how we define quality within our data. In the future, where I see things heading is that quality data will be data that works within our business operations.
Right? The attributes of quality won't necessarily be defined through a technical lens, through attributes that describe data, but they'll be defined through excellence in business processes. If the business processes are working, if quote-to-cash is working, if procure-to-pay is working, if we have happy, satisfied customers, that suggests the data supporting those business processes is good enough.
Do you see something like this happening in the future?
Yes.
Definitely. And and I think that, you know, quality for purpose. It’s not just quality. Quality is not an absolute thing. You can have scores. You can have and not just from a technical perspective. So there needs to be a scoring mechanism, which is also a type of metadata, like classification that you do.
So you can take a lot of quality signals and combine them together and give some sort of quality score for the business. I think this technology that we developed, we call it, by the way, active metadata tags, these classifications. We kind of define a calculation or define a rule, run it, and then it gives live tags to things. Now, these active metadata tags, they are version controlled.
They can be defined by different domains. Each business can have the AI-ready tag that the business uses for their AI agent. Nothing is ever one for all. So I can work in the product domain where I really don't care that much about accuracy.
I care about statistics. I don't mind if the number is inaccurate, right? I'm in product. I'm not reporting any financial report or anything.
I'm really trying to improve engagement and workflows and so on. So I have my AI-ready score, and it's different, and it's fine. And that's the kind of flexibility we're also introducing.
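Sarah's point that each domain gets its own AI-ready tag can be sketched as a per-domain weighting of the same quality signals. The signals, weights, and threshold below are all hypothetical, just to show how the same data can pass in one domain and fail in another:

```python
# Raw quality signals for one data asset (values invented for illustration).
signals = {"freshness": 0.9, "completeness": 0.6, "accuracy": 0.4}

# Each domain weights the signals differently: finance cares about accuracy,
# product cares about freshness and completeness.
weights = {
    "finance": {"accuracy": 0.6, "freshness": 0.2, "completeness": 0.2},
    "product": {"accuracy": 0.1, "freshness": 0.5, "completeness": 0.4},
}

def ai_ready_score(domain):
    """Combine the quality signals using the domain's own weights."""
    w = weights[domain]
    return sum(w[k] * signals[k] for k in w)

def tag(domain, threshold=0.6):
    """An 'active metadata tag': recomputed whenever the signals change."""
    return "ai_ready" if ai_ready_score(domain) >= threshold else "not_ai_ready"
```

With these numbers, the same asset scores roughly 0.73 for product and 0.54 for finance, so product's agent sees it as AI ready while finance's does not, which is exactly the per-domain flexibility being described.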
So you've used the phrase AI ready a few times, and I was gonna ask you about that earlier, but I fell down a separate rabbit hole.
You've used the phrase AI ready a number of times. You've used the phrase certified data a few times. You've mentioned the medallion architecture a couple of times now.
What does AI ready mean to you? And how should CDOs be thinking about this concept of AI ready? Is it data that has gone through some idea of a certification process or bronze, silver, gold? Is it data sitting in the gold medallion layer? What is AI-ready data, Sarah?
It’s another Gartner term, by the way, which I adopted from Gartner. Active metadata as well.
Yes. Yeah.
Yeah. Yeah. I know. I know. Well, I’m working with them, obviously. But I’ll tell you, AI ready is different for every organization.
I don’t think there’s an absolute definition for AI ready. Health care organizations, banks, financial institutes, startups, software services companies, they will have completely different definitions for AI ready because of the purpose. A bank would be mostly worried about compliance, about accuracy, about reporting the right numbers, about privacy. This will be the essential thing.
So you would probably not want to expose to AI any sort of sensitive data, nothing whatsoever, because that’s really too too dangerous. You need your lineage to be, you know, like six sigmas in the science world. It has to be accurate. You cannot make mistakes.
That's AI ready for a financial institution. And, you know, one of the things, so you can talk about the big magic, but sometimes the problems are very simple.
Engineers and data platform teams work so hard to organize a warehouse, to build things following the medallion architecture, but then in Tableau, it's the wild, wild west. Every analyst picks whatever they want, and then, you know, they are responsible for the governance of the data, the data platform, but they cannot really own that or be responsible for that because they're not even querying those assets that they're working so hard to clean and govern. A very basic rule says: if it's querying those assets, it's AI ready. We're investing, and there is a benefit to this investment. You get AI. And we're encouraging you to collaborate with us, because then your business users work so much faster. There is immediate impact and benefit.
And I think, and you can still build things everywhere, it's just not going to be accessible to AI. So it's also a way to build discipline, because in the end, the problem is always people. Right?
I mean, we can build the best technology in the world. The problems are always how to engage people into that, how to change the way they work. And we're trying to find good drivers, not just policing. Governance is always perceived as policing, as vetting, as preventing.
We need a governance that really empowers and enables. So if it's AI ready, you get it in AI, everything moves so much faster. If it's querying our governed models, our trusted tables, then it's AI ready.
It can be, if it's built on views, I'm giving you a level of freedom, build views on top of this gold layer, that's also fine. It changes between organizations, and that's the flexibility of these custom rules.
So to paraphrase, and I completely agree with this, the concept of AI ready is not a binary. Right? It's not deterministic. It's not a binary, it is or it isn't. For one use case, it may be, and for another use case, it may not be. Right? Exactly. And that's getting back to what we were talking about with multiple definitions of churn. Right?
And we've been talking, right, through the lens of fit for purpose.
This is the issue of contextually based governance policies, contextually based quality rules, contextually based definitions.
Like, that that is an inherently probabilistic world.
That's exactly where we're going. And this, I think, is one of the bigger issues that I'm seeing data leaders really struggle with, which is making this jump from this deterministic, rules-based, very binary, garbage in, garbage out. It's either good quality or it's bad quality.
Like, making the jump from that way of thinking about the world to this other way of thinking about the world. And that's hard. And one of the reasons why it's hard, I think, is because historically, we've had, you know, just one set of rules. Right?
You've already talked about the single version of the truth. Right? The one set of rules to rule them all. And we've had a hard time just coming up with one.
Yeah. Now we're talking about n sets of possible rules, which is a totally different way of thinking about governance. And I don't know how we get to that state without a heavy reliance on AI.
So first, I a hundred percent agree. I think there is a common joke about physicians, that they don't understand statistics.
I think data people really will, I mean, they do run statistics all the time. They run analyses. They are experts. But there is something inherent that needs to change. Everything will be probabilistic, and this is why everything that's happening will be analyzed by agents and will drive actions that will be executed by agents.
Let me give you an example. We will have agents getting lots of queries. We're going to monitor the usage of these agents. When we see repeated queries, or similar repeated queries, we will generate a PR to the agent that builds metrics, and that agent will build the metric.
That agent will run an analysis of conflicts and inconsistencies against the other metrics, because we will not want to contribute something that adds to the chaos. But this is also statistical. It will have some level of sensitivity, some level of specificity. Nothing will be one hundred percent.
But that's how, if you deploy AI-driven processes like that, you can catch up with the speed at which everything is going to be created. Otherwise, you're lost. You've lost it from the start.
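The loop Sarah sketches, watching agent query logs for repeated shapes and flagging them as candidates for a metric-building agent, could look something like this minimal sketch. The function names and the normalization heuristic are illustrative assumptions, not any real product's logic:

```python
from collections import Counter
import re

def normalize(query: str) -> str:
    """Collapse literals and whitespace so similar queries group together."""
    q = query.lower().strip()
    q = re.sub(r"\d+", "<num>", q)        # mask numeric literals
    q = re.sub(r"'[^']*'", "<str>", q)    # mask string literals
    q = re.sub(r"\s+", " ", q)
    return q

def find_metric_candidates(query_log, threshold=3):
    """Return normalized query shapes seen at least `threshold` times.

    In the process Sarah describes, each candidate would trigger a PR
    to a metric-building agent, followed by a statistical check for
    conflicts and inconsistencies with existing metrics."""
    counts = Counter(normalize(q) for q in query_log)
    return [shape for shape, n in counts.items() if n >= threshold]
```

The threshold is where the "nothing is one hundred percent" point shows up: set it low and you get more candidates but more noise (sensitivity), set it high and you miss real patterns (specificity).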
One more comment on that. Even today, it's never one hundred percent accurate. You always go and find an analyst or ask over Slack: how did you calculate this? What are these numbers?
Do they have gaps? You don't understand how. I mean, it's never one hundred percent. So when you're not confident, you can ask your agent; you don't need to look for someone available, get a priority, get them to look into it and research. The agent can help you. That's already a completely different world.
You can do things in minutes. But, yes, everything will be statistical.
Well, you know, I just had a bit of an epiphany.
I was thinking back to a couple of decades ago, when a much younger Malcolm was just entering the world of data and analytics, and he was responsible for producing some metrics that were going to our CFO.
And I remember the CFO asking me specific questions, and my answer to those questions was basically, well, it depends. It depends. It depends. That was a bad day for me.
And I know that when data leaders try to do that, things don’t end well. Right? C level executives don’t wanna hear, it depends. They want an answer.
And I think maybe it's years and years of conditioning, particularly for people of my age. We've been conditioned over years to be very specific, to be confident, to give trustworthy, actionable insights, and not to be wishy-washy. Don't say, well, it depends, who's asking, in what context? Even though that's the reality, we've been taught for years and years to be very, very deterministic. So this reminds me of something I've been saying often over the last couple of years, and it's what we've been saying all along, Sarah: the world we live in is contextually bound.
Right? And you said there's no single version of the truth. I agree. However, the notion of truth is contextually bound.
And within any one context, there can only be one version of truth in that one context.

There is. For sure.
Yep. For sure.
So the challenge is, how do we do both? Right? When we're managing our data as data leaders, when we're doing governance, doing quality, building our technologies, how do we acknowledge the reality that there are multiple contexts? That's the entire discussion we've been having for the last forty-five minutes: multiple contexts. How do we do that, but then, when it comes to providing analytics, providing insights, building models, be confident that what we're providing to the consumer is accurate and trustworthy, something that tends to look a little more deterministic when under the covers it's been very, very probabilistic all along? That's hard.
You had a post on LinkedIn this morning that I responded to, and you basically said: hey, CDOs, you need to figure AI out pretty quick, because if you don't, you're gonna lose your job to somebody who knows AI.
Yep. You agree?
This is exactly the answer to your question.
Yes.
And let me give you an example and then, you know, go back to the CDOs.
When I speak with organizations that have experience with AI, that have actually deployed AI for something, for a product application, for a customer-facing interaction, they get it. They understand the importance of context, and they even understand the importance of a contained context that is relevant to a domain or to a product space.
And they understand that there is a role which is essential in the AI world, which is configuring and controlling the control plane of the context. Okay, let's call it that. Data leaders no longer need to build pipelines; agents can do that. Build metrics? Agents can do that.
Build reports? Agents will do that. Business users will do this with their agent. They will tailor the report and add everything they want. But data leaders need to control these context worlds with rules, with architectures, with decluttering activities, with lots of tools that platforms like Yuno will give them.
And once they figure out they are the key enablers of AI, and if they don't do this, someone else will take the budget, do it, and hire people like me to solve it for them. If they don't step up and keep busying themselves migrating, refactoring, building manual semantic layers, trying to create this one source of truth, they will really become irrelevant. And maybe the simple way to say it: no CIO, no CFO, no COO will invest in a governance initiative that is not directly tied to AI.
If you're doing governance for the sake of governance, it's not interesting anymore.
Too many CDOs were a bit cynical about AI in the beginning: oh, everyone talks about AI.
By the way, they are completely aware of the level of chaos. They know exactly what this agent is going to face. The business users are not aware.
And instead of taking that cynical attitude and being passive, that cynicism needs to be transformed into action, into being proactive and becoming the leader. They can, you know, hold the flag of AI for the data.
And it would be super interesting to see what happens with that.
It's interesting you should mention this cynical mindset. Couldn't agree more.
I agreed so much I wrote an entire book. My book, The Data Hero Playbook, is about the mindsets needed to go from this place of cynicism, you know, hey, done that before, tried that before, it doesn't work, this is the way we do things because this is how we've always done things, which is very much the status quo, to the mindset needed to transform your organization, your career, and the way you think about your role, your customers, and your data. That's entirely the subject of The Data Hero Playbook. A little sales pitch for my own book, if I can, since it's my podcast.
Yeah. One last question here. Oh, and by the way, another editorial point. One, changing mindsets. Two, one of the people that I respect greatly on LinkedIn, Ole Olesen-Bagneux, wrote the book on metadata management.
He has this idea of the Meta Grid.
A few other extremely smart people that I interact with are all calling twenty twenty six the year of the ontology.
I agree, and I don't think that even goes far enough. Right? Everything that we've been talking about for the last forty-five minutes, there's some idea of a knowledge graph, which is basically the representation of an ontology: taxonomies, hierarchies, controlled definitions, glossaries, all of these things. If you're a CDO, you should be thinking about semantic layers, about metadata management, about how to integrate MDM, and we haven't even talked about that. Maybe I'll do a separate podcast about the integration of MDM into that environment.
Yes.
Because everything we've been talking about is extremely valid at an object level. But at a record level, the individual record level, there's still a role for MDM there, to know that this John Smith and that John Smith are
Of course.
Two different things.
All of these things. Call it a semantic layer.
You could call it a data catalog, but it goes farther than that.
You can call it an ontology layer.
If you want. You know, one of your competitors calls it a context layer.
Call it that if you wish. But this thing that we've been talking about, this intelligence layer that will allow you to understand context in a very deep and meaningful way: if you're a CDO and you're not thinking about that, you need to be. This is foundational to where we are going. It's the unlock between structured and unstructured data.
Right? If you wanna get value out of all that structured data. Do you agree, Sarah?
You have to. Of course. Of course. It's the same. It's the unlock of unstructured data as well.
Yes. And let me complete that. If you are an AI leader, a CAIO or a VP of AI, and you've got, you know, a huge budget to deploy AI, and you think you're going to solve it by running a series of pilots with twenty or thirty tables, those pilots are going to work. I guarantee you. You don't even need to run them.
They're going to work. LLMs work.
So don't waste your time on that, and talk to your CDO about context. I mean, it works both ways.
Yeah. And a big area of focus for me in the coming year will be talking about how we integrate knowledge management as a discipline into data management. This is a whole other area that I think we absolutely need to be focusing on, probably the subject of multiple podcasts in the coming year. In a lot of these companies, there's a whole other team sitting out there managing content management systems, managing knowledge.
Yes. Yes. Yes.
It's funny because, you know, I built the technology, and the same concepts apply to these areas as well. The technology would be very similar.
Yeah. And I have to add one comment before we close, because you mentioned it. We also call it that. I mean, I know we differentiate by saying we're doing an automated context layer, but everyone now calls it context. They just published a guide about how we transition from the data catalog era to the context era.
That's the transformation of AI. And, you know, we talk about the impact, about the value, about what we're trying to solve, but technology is key here, because everyone wants to say that their metadata platform is a context layer, and it's not. You need the right technology stack in place if you want to be able to surface metadata insights in real time to agents, insights that are actually relevant and can actually help business users.
We've built Yuno for that purpose; we've built our technology for that purpose. There are lots of legacy or more traditional systems in the industry that need to go through a massive transition now, and that will take time. No one has time. So maybe my message to companies, to organizations, to enterprises: don't wait on AI. You can start deploying now, because if you wait, your competition will beat you.
It boosts. It accelerates everything.
So don't sit and wait for traditional platforms, for legacy systems, to catch up. It will take them two years.
Yep.
Just adopt innovation. It's another area where CDOs sometimes find it hard. I mean, I'm comparing to CISOs, from my background. Adopt innovation, because AI is the new world, and you need innovative technologies and an innovative stack to catch up with it.
Totally and completely agree. Okay.
Last question. This gets back to maybe one of the first two or three questions that I asked you nearly an hour ago, which is managing metadata.
There’s a lot of it out there. There’s there is a lot and a lot, and then there’s even more. And when you think you found it all, you haven’t. There’s even more after that.
Where do I start?
Right? As a CDO, I'm like, okay, you know what, Sarah? I totally get it.
I totally agree. Malcolm, you're making some sense. I need to focus on figuring out context. I need to focus on building some rules on top of that.
I need to make this pivot. But, man, I'm overwhelmed with options. Where should I start? How would you answer that?
To be honest, I think lineage is a very good starting point. I know everyone talks about lineage, but having proper lineage in place is the foundation for everything. You tie your unstructured documents and pieces, your glossaries, to the lineage. You tie usage to it.
You tie logic to it, and you connect it all. If you have solid lineage in place, that's the critical starting point. And by the way, that's true not just for the data stack but for many different cases of AI. It's the foundation for the search problem, for finding things.
I would start with lineage. I would, you know, look back at the lineage I’ve got and ask myself, do I trust it?
Is it updated?
Would I rely on that? Or is it just something I’m holding for the audit log for compliance reasons and so on?
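Sarah's picture of lineage as the foundation, with documents, usage, and logic tied to each node, can be illustrated with a toy graph. This is a hypothetical sketch for intuition only; all asset names are made up, and real lineage tooling is far richer:

```python
from collections import defaultdict

class LineageGraph:
    """Toy lineage graph: edges point from a source asset to the assets
    derived from it. Metadata (glossary terms, usage, transformation
    logic) hangs off each node, as Sarah describes."""

    def __init__(self):
        self.downstream = defaultdict(set)
        self.upstream = defaultdict(set)
        self.metadata = defaultdict(dict)

    def add_edge(self, source, derived):
        self.downstream[source].add(derived)
        self.upstream[derived].add(source)

    def annotate(self, asset, **meta):
        """Tie glossary entries, usage stats, logic, etc. to an asset."""
        self.metadata[asset].update(meta)

    def trace_upstream(self, asset):
        """Walk back to every asset this one depends on."""
        seen, stack = set(), [asset]
        while stack:
            node = stack.pop()
            for src in self.upstream[node]:
                if src not in seen:
                    seen.add(src)
                    stack.append(src)
        return seen
```

With a structure like this, answering "how was this number calculated?" becomes a graph walk plus a metadata lookup, which is what makes it a usable starting point for agents.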
Yep. Totally and completely agree. Lineage is going to be a critical component of explainability.
Right? And it underpins everything we've been talking about, this contextually based world, you know, do this for user A and do this for user B or for this use case.
There are rules that are gonna be baked into all of that, captured by the lineage.
Right?
Right. And lineage is exactly that piece, an important piece sitting underneath all of it. Right? So I agree. And I would layer on, in terms of where do I start:
Start with a high-value business process where there's somebody on the business side that is complaining loudly about, who knows, onboarding new suppliers.
They're taking too long to issue a sales proposal. I don't know. But you are out there. You're working with your customers every day. You know the ones that are complaining the loudest. Find those.
You have to find a partner on the business side that suffers from that and that wants independence. AI matters to them. They use AI, and they want AI. They will get you the budget, the attention.
They will spend time validating it. Yes. That's your champion. You have to get a business champion.
Otherwise, it's work wasted.
Go where you are wanted. Yep, completely, totally agree. In a perfect world, the angels would sing, and we'd be working on an enterprise-wide use case with enterprise-wide funding and an enterprise-wide mandate.
Forget about it. Go where you are wanted. Go within an individual business function, an individual domain. It's gonna be far easier to execute.
And when you drive that value for somebody, maybe your chief revenue officer, she or he is gonna be excited and tell all of their friends. And then everybody’s gonna come to you and say, I want some more of that.
Yeah. And then you land and you expand. Exactly.
Love it. Alright, Sarah, it's getting late in Israel. Thank you so much for sharing your wisdom and your insights. This is such an exciting time. I'd love to come back in another year, maybe, and do another version of this conversation to see where we are.
Yeah. That’s really great.
And the progress we've made. We'd love to run into you at one of the industry events, hopefully sometime in the United States over the next year. But really, really appreciate you coming on today.
Thank you. Thank you so much. I really enjoyed that conversation.
Me too. Alright. With that, if you've lasted this long and you haven't already subscribed to the CDO Matters podcast, please do. Please thumbs up, please like, please do all the things the social media algorithms like, if you don't mind.
I look forward to producing a lot more content. By the way, happy New Year. This will likely air towards the end of January, when it's already twenty twenty six. We're still in twenty twenty five recording this, but happy New Year.
Thank you for listening. Thank you for downloading. Thank you for being a member of this growing community of CDOs and people who want to be CDOs. With that, I will bid you adieu. Thank you, happy New Year, and we'll see you on another episode of CDO Matters sometime very soon. Bye for now.