Semantic Layers

Episode Overview:

Sanjeev Mohan is an experienced and knowledgeable expert in the world of data and analytics, and on this episode of the CDO Matters Podcast, he shares his insights on the rapidly evolving world of semantic layers.

Join Malcolm and Sanjeev, both ex-Gartner analysts, as they break down the complexities of these new technologies into simple and approachable concepts, and the important role they will play in the future of the data function.


Episode Transcript

Good morning. Good afternoon. Good evening. Good whatever time it is, wherever you are. I’m Malcolm Hawker. I’m the host of the CDO Matters podcast. Thank you for joining us today.

You are about to hear a conversation related to semantic layers, probably the data fabric. I’m not sure how we’ll avoid a conversation around AI because everybody’s talking about AI. And that conversation is gonna happen with my wonderful friend, Sanjeev Mohan. Hi, Sanjeev.

Hello. Thank you for having me back on your show.

Yeah. Well, the first time happened very informally. We were at the CDOIQ event at MIT, and you and Mr. Scott Taylor and myself were having some fun and doing some recording. And I think we did, like, a swag with you or Scott and I, and then we brought you into the conversation.

It was a lot of fun, but this is our first formal chat. I recall the first time I ever met you, Sanjeev, and I didn’t really actually meet you.

I just listened to you. We were both working at Gartner, and I wanna say the year was probably 2021, maybe 2020, but probably 2021. And you were holding, I wanna say, a brown bag session, a lunch-and-learn type session, for a bunch of analysts at Gartner, including myself. And you were basically explaining how AWS works. Do you remember?

I do not. Okay.

That’s okay. That’s okay. I barely remember yesterday. But you were explaining the intricacies of provisioning storage and provisioning compute on AWS, because you were an expert in that area. And I remember listening to you thinking, oh, man. This guy is smart.

This guy is both smart and well spoken, and I need to follow him, which I ended up doing. We both left Gartner around the same time. And now we see each other at events, and I know we support each other’s content. But I’ve been a fan of yours for a number of years now, and I’m just thrilled that you could come back on the podcast.

Thank you so much for saying those kind words.

I’ve cherished our friendship, and I really enjoy every time I get to interact with you. I do have to say one thing: I’m glad you were impressed with what I was saying, but I’m not an expert, and I want people to know this, because we live in an environment where every single day something new happens, and then we are back to the books. So the important thing that I wanna bring to the matter here is that I went against the conventional wisdom of a Gartner analyst by getting myself certified as a solutions architect on AWS.

And, you know, that wasn’t the done thing. It wasn’t a job requirement, but I felt I owed it to my customers to know how to be hands on and deeply immersed in the technology. And so for anybody who’s listening, my message is: just go out there and do stuff. Don’t just be a passive receiver of information, but actively dive in and be a participant, be a builder.

I love it. I’m a huge believer in servant leadership, which suggests that as a leader, you need to have at least a baseline understanding of what people are doing day in and day out and how they do it. Now does that mean that I need to become a data engineer or a data scientist? No. But I need to know how to have a conversation with those people. I need to know how those technologies work.

I need to understand the challenges of those technologies. So couldn’t agree more. But before we move deep into our conversation, I do want to just let our listeners know, Sanjeev, you’re the founder of SanjMo.

So you are a consultant in the data and analytics space. So if you’re listening and you’re a CDO and you want somebody smart to support you in your data and analytics initiatives, Sanjeev is a great guy to do that. Also, you are the host of the It Depends podcast, which is a very successful, very informative podcast. Another great way to get information from Sanjeev.

But just please keep those things in mind.

If you are in the market for somebody who is okay.

I still call you an expert, but you’re a learner. Clearly, you are a learner.

Yeah. Please do. Yeah. I mean, it’s okay. People call me all kinds of names: influencer, expert.

Yeah. But is it Yeah. Please go ahead.

Well, so the thing is and and maybe this is why you and I get along so well, is that I don’t want to be an influencer.

Yes.

Right? I really don’t want to be an influencer. I’ll be a helper.

And I want to help people improve how they do data and analytics. I sincerely want CDO tenures to improve.

I want governance to be viewed as a critical function within businesses. And I know you think the same way about wanting to help and wanting to add value. So, yeah, I can go without the influencer title. You and I both.

To me, an influencer is somebody who goes to events, takes selfies, promotes a lot of events, but is not really creating content. You create a lot of content.

This is an example of the content you are creating. So I think of myself as somebody who’s, like I said, a builder who’s creating content. I’m not hands on anymore because of the role I’ve chosen. So I’m not a developer. I’m not a coder, but I’m trying to connect the dots. Like, yesterday, on Sunday, I published a podcast on my learnings from Snowflake and Databricks. That was a massive amount of work to compile all the learnings, because it was eight days of learning across the two events. So how do you compile that into 53 minutes? It is a big task, but I enjoy doing that.

Yeah. Well, myself as well. I get a lot of satisfaction sharing what I’ve learned at conferences because, frankly, those conferences are expensive, and not everybody gets to go to them. Like Gartner? Yeah. I mean, you’re talking thousands and thousands of dollars to go to a Gartner event.

And if I can share anything that I’ve learned, like what you’re doing, that’s awesome. Let’s get to the topic.

I chose this topic because you and I are actually both going to be at the enterprise knowledge semantic summit in Munich, Germany, in October.

Yes. Sanjeev is rightfully the keynote speaker. I’m excited to hear what you have to say. No pressure.

But I’m excited about the idea of what I view as a semantic layer, and I’d like to hear what you think about semantic layers. What are they, where are they going, and what do you think a CDO needs to be thinking about the most?

So we are moving very rapidly. This is not a podcast about AI, but AI is front and center. What’s happened is that the businesses are in the driving seat.

What used to happen was that the IT people would come back after conferences and tell the business, you know, about this new graph database, or there’s a new indexing technique, and I can improve this query performance.

And I have a new BI tool with some caching. And the businesses’ eyes would roll because they would be like, okay. So what? Am I getting my data, high quality data, on time, without breaking the bank? That is all I care about. So all this mumbo jumbo is not important. But with AI, the businesses now have woken up and realized that there’s so much to do with all their data, whether it’s structured data in databases, which I think they have a good handle on, or even unstructured data, like PDF documents.

So, you know, maybe I have twenty years of certification documents, and now I want to ask a question in natural language, because we don’t expect businesses to know even SQL. If they know SQL, great job. But if they don’t, we now have the ability to ask questions in natural language. But that question in the natural language of your choice, like English, French, or whatever it may be, needs to be translated.

So we think that LLMs speak English. They don’t. LLMs speak mathematics.

They are pieces of code computing probabilities, trying to understand what you’re asking and predict what comes next. So that translation between what the businesses want and where the IT is happens through the semantic layer. For example, if I ask a question, hey, ChatGPT, tell me what is gonna be the performance of this SKU in Q4 of my fiscal year? What is Q4? What is the fiscal year? Which geography?

See, for all these things, we expect the semantic layer to grow up to be smart enough to say, your fiscal year started on February first of that year, so Q4 will take you to this period. And now I will use all the powers I have to run maybe a SQL query behind the scenes or a Python script or whatever it may be. But it’s the semantic layer that is the glue between the business and the IT teams.
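As a rough illustration of the fiscal-calendar point above, here is a minimal Python sketch of how a semantic layer might resolve "Q4 of my fiscal year" into concrete dates. The function names and the February 1 start date are assumptions for illustration only, not any vendor's API, and the sketch assumes the fiscal year begins on the first day of a month.

```python
from datetime import date, timedelta
from typing import Tuple


def _first_of_month_after(start: date, months: int) -> date:
    """First day of the month that falls `months` months after `start`."""
    index = start.year * 12 + (start.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def fiscal_quarter_range(fiscal_year_start: date, quarter: int) -> Tuple[date, date]:
    """Resolve a fiscal quarter (1-4) to its first and last calendar day.

    Assumes the fiscal year begins on the first day of a month.
    """
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3 or 4")
    first_day = _first_of_month_after(fiscal_year_start, 3 * (quarter - 1))
    last_day = _first_of_month_after(fiscal_year_start, 3 * quarter) - timedelta(days=1)
    return first_day, last_day


# Hypothetical business definition: the fiscal year starts on February 1.
start, end = fiscal_quarter_range(date(2024, 2, 1), quarter=4)
print(start, end)  # 2024-11-01 2025-01-31
```

Once the business term is pinned to a date range like this, the generated SQL or Python only has to filter on two dates, which is the "glue" role described above.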

Man, I love it. There’s probably four hours of podcast just in what you just said.

But let’s peel the onion a little bit.

A few years ago, to me, a semantic layer was this thing that allowed you to link tables together.

Right? You had an account table and you had a customer table, and semantically, they meant the same thing. A customer and an account were logically the same thing, and I could link them together. And that could link to the customer account or the prospect account, and I could have a single SQL query, and I could hit all fifteen tables at the same time.

And they were logically, virtually connected, and the layer would semantically resolve differences in the meaning of how those tables were defined. That’s how I viewed it, like, three to four years ago. But what you just described is a lot more than that. What you just described, honestly, sounds almost like, and I don’t wanna oversimplify, an operating system.

Right?

Correct. Yep.

How do you respond to that?

Yeah. You know, a lot of people use the term operating system. Especially with LLMs, it’s become a very common thing. Oh, the LLM is an operating system. So it goes back to this whole influencer versus thought leader versus expert. It all comes down to semantics.

Although we’re talking about semantics, I’m using semantics in a different sense here. But my point is that, yes, you can call it an operating system, although an operating system also does a lot of execution.

And what I explained actually is execution, but this is not how people conventionally understand a semantic layer. They think the semantic layer is a pie in the sky that will magically do this translation. But what I’m saying is, no. It should do the translation, and then it should figure out where to execute. And if you combine these two, then, yes, it is an operating system.

Yeah. Well, that starts to get into maybe data-fabric-y things.

Yeah.

Yep. And, well, we need to peel that onion as well. But people may ask themselves, well, Malcolm, you’re an MDM guy. Why are you talking about semantic layers?

Well, an MDM is a semantic layer. It absolutely, positively is a semantic layer. And the only thing that separated semantic layers from MDM was that MDM was applying business rules at an object and a field level, and semantic layers were applying them at a schema level. And MDM was doing some additional things like stewardship and maybe some workflow management and some other bells and whistles.

So if you’re asking, well, why is an MDM guy talking about a semantic layer? To me, these worlds are coming together.

They’re absolutely coming together, and there’s a lot of influences that are starting to coalesce. Another area that I’m really interested in is the one our friend Juan Sequeda is working on, which is using graph to apply meaning, which is the very core of what semantic means. It’s the meaning. It’s definitions.

How do I define something? He’s using graph to inject meaning from rows and columns into LLMs. This is another aspect of all of this. So there’s so many things that are starting to kinda come together here.

And you mentioned execute, which means some form of decision making.

Yes. Correct.

Right? Okay.

Go deeper on that. How is that actually happening? Is this, like, rules based, or is the LLM and AI gonna be figuring this stuff out? How do decisions get made?

Right. So that takes us to, where does the semantic layer sit? So let’s talk about that.

I also wanna mention, you said, you know, semantics giving the meaning. I think the word that we’re using right now for it, in terms of LLMs, is context.

Yes. What is the context? Yeah.

Of, you know, my question. This is a common thing that LLMs are so good at. For example, if I say, you know, I have a question about bank.

You know, by understanding the content, it says, oh, yeah. You’re talking about which is the nearest Bank of America, JPMorgan Chase, or Citibank. But no. In some cases, I may actually be asking for directions to the riverbank that I wanna walk to.

So that’s why this context is so important in this scenario.

Now, to answer your question about how the semantic layer executes and becomes more actionable.

So that question has perplexed a lot of people, because the question is, where does a semantic layer sit? Yes. And that equation keeps shifting all the time. So we have to peel back and go back to the history, just to get some idea of who came up with this idea. There’s a fascinating story about Business Objects.

Two salespeople, I believe, who worked for Oracle in Spain, sorry, in France, were running around trying to educate people on Oracle’s ability to do analytics, and came upon this idea of, what if we created a translation layer in the UI? That became Business Objects, the Business Objects universe.

And that’s how this whole concept of the semantic layer started, I think in the early 1990s.

And so semantic layer was something in the UI.

Which is great. As long as we had a monolithic architecture, we had an Oracle database with Business Objects.

Sure. No problem. But we’ll talk more about data fabric because I think it’s such an important topic these days. But what has happened is that our data estate has exploded.

So we’ve got knowledge graphs, graph databases. We’ve got relational. We’ve got unstructured, semi structured document databases.

So we now have a proliferation of where the data is stored. And we have moved away from futile attempts to consolidate everything. Remember, we used to talk about the corporate data factory, the enterprise data warehouse, and we tried forever, but it’s impractical to expect all the data will exist in one place with one BI tool.

Now even consumption has moved, because one team may say, we are Microsoft experts. We love Power BI. Another team may say, no. We like our Tableau.

But then a data scientist uses neither of these.

So now the problem is that having that semantic layer in the BI tool became problematic.

So it’s been sort of moving down. Like, should it be in the middleware? Should it be in the database?

And this is the discussion. For example, Databricks is a very engineering-focused company. They produce amazing products, but the concept of a semantic layer was a very foreign concept to them until last year. Last year, on one of the podcasts I did with John Furrier for theCUBE, we actually went to bat and said that Databricks needs to realize that they’re missing a semantic layer. This year, the semantic layer is front and center of Unity Catalog.

And here comes an interesting kicker. Not only have they created a semantic layer, they’ve created an ability to import semantics from a third-party independent product like AtScale, dbt, or Cube.dev. So, basically, right now, with what’s happening with fabrics and lakehouses, the center of gravity is moving to the catalog as the arbiter of queries, as a place to mediate queries. And so right now, the thinking is, let’s put the semantic layer into the catalog.

So I hope this answers it, but that’s the state of affairs today.

So much more. We could press on here. So I do agree that the catalog today seems to be the place where some of this logic is coalescing.

Right?

Yes.

I would argue there are still a number of missing pieces around what we used to call at Gartner a recommendation engine, which was a layer of the data fabric at Gartner.

We’ll talk about that a little bit more, because I think that starts to get into AI, where the AI can start to tell you how to model data. It can tell you the most efficient integration patterns. It can start to tell you what the data quality rules could or should be to optimize anything, whether that’s an analytical process or an operational process. And what I just described is what Gartner envisioned the original data fabric to be, which I would argue starts to become a lot like an analytical operating system that would one day also include operational uses of data.

And we start to get into operational versus analytical, and some of the limitations about how we store data on disk and how disks operate, and all sorts of philosophical things about data storage. But as a CDO, is a data catalog the place where I should be? And you and I both know it’s people, process, technology. It’s not always just technology. There are people and there are processes, and you and I could both talk about the importance of those things for hours.

But from a technology perspective, is a data catalog the place that I should be looking if I don’t have one? Is that something you see as a mission-critical piece of infrastructure going forward?

Yeah. So let’s talk about this. In fact, you had asked me a question earlier, which I’ll answer now: is it rule based, or how is it changing? And so I’ll give you an example of how semantic layer creation is starting to change.

Snowflake Summit also took place this month, in June. Actually, by the time people watch this, it might be a few weeks later. But in June of 2024, two pivotal conferences took place, the Databricks Data and AI Summit and the Snowflake Data Cloud Summit. At the Snowflake Data Cloud Summit, we saw the launch of a product which, it looks like, had been announced before but may now have gone GA: Document AI.

And Document AI is a capability where the business people can point to a PDF, and it parses the PDF and creates a semantic layer, and then you can ask natural language questions on it. So this is the beauty of AI. In AI, you don’t necessarily need to give the rules. You can give the rules in a structured database because it’s already structured.

Somebody’s done the work. But what happens when a new piece of information comes up? Somebody has to go and create rules for that. But LLMs, to some extent, if they’ve been trained on a vast corpus of data, have the ability to probabilistically determine what the semantic layer is. Now I can start asking questions.

So that’s to answer the previous question about whether it should be rule based or inferred.

And I think it’s a combination. I would not say trust the LLM with your eyes closed, because we know LLMs hallucinate. Hallucination is not a bug. It’s a feature of LLMs.

So we have to have a hybrid. But another question comes: okay, dude, great job. You’ve inferred the semantics. Where do I keep it? How do I reuse it?

So you need a place to store it, and that place to me becomes a data catalog.

And there’s a bigger reason, by the way, because data catalogs are no longer just, as Juan once said, a museum of metadata.

I love that.

It’s more than that. For example, I can apply my policies of who is allowed to see it. See, I can do it at the data level: cell level, table, column, fields. That’s great.

But then if I’ve got data in Snowflake, BigQuery, Databricks, Cloudera, you name it, Redshift, Oracle, SAP, it becomes a lot harder. But if I can have a business representation of it at some place, let’s say it’s a catalog, and that’s my semantic layer, then maybe I can apply it at the semantic layer. So I can have a UI rule base that says, if you belong to this group, you are allowed to read this data, but not that data. So the row-level, column-level security can be applied at one place.

So that’s why I like the concept of the catalog as being the center of gravity.
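To make the "apply it at one place" idea concrete, here is a minimal Python sketch of group-based row-level and column-level rules defined once and applied to rows regardless of which platform they came from. The group names, columns, and the `read_as` helper are all hypothetical; real catalogs express such policies in their own syntax.

```python
from typing import Callable, Dict, Iterable, List, Set

Row = Dict[str, object]


class Policy:
    """A group's access rule: which columns it may see and which rows."""

    def __init__(self, columns: Set[str], row_filter: Callable[[Row], bool]):
        self.columns = columns
        self.row_filter = row_filter


# Hypothetical groups and rules, defined once in the "catalog".
POLICIES: Dict[str, Policy] = {
    "emea_sales": Policy({"account", "region", "revenue"}, lambda r: r["region"] == "EMEA"),
    "finance": Policy({"account", "region", "revenue", "margin"}, lambda r: True),
}


def read_as(group: str, rows: Iterable[Row]) -> List[Row]:
    """Apply the group's row filter and column mask, whatever the source system."""
    policy = POLICIES.get(group)
    if policy is None:
        raise PermissionError(f"no policy defined for group {group!r}")
    return [
        {col: val for col, val in row.items() if col in policy.columns}
        for row in rows
        if policy.row_filter(row)
    ]


# Rows that might have come from two different platforms.
warehouse_rows = [{"account": "Acme", "region": "EMEA", "revenue": 120, "margin": 0.3}]
lake_rows = [{"account": "Globex", "region": "APAC", "revenue": 90, "margin": 0.2}]

print(read_as("emea_sales", warehouse_rows + lake_rows))
# [{'account': 'Acme', 'region': 'EMEA', 'revenue': 120}]
```

The design choice being illustrated is that the rule lives with the business representation, so adding another storage platform does not mean rewriting the rule.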

Well, it’s interesting because that suggests a form of centralized data management, and I think for a lot of people, that creates an antibody response.

I think thanks to the data mesh and a few other forces, the idea of having one place and one thing and one tool creates anxiety for a lot of data leaders. I don’t have that anxiety because, as you mentioned, there were a lot of interesting things that were announced at Snowflake Summit and the Databricks Data and AI Summit.

Same is true with Microsoft Build. Yeah. At Build, they were talking about the exact same things.

And a big focus for Microsoft is having a catalog infrastructure. Theirs is Purview.

But it runs within this data fabric, which is agnostic, to a certain degree, of the data warehouse or data lake or lakehouse infrastructures that the fabric is interacting with. Right? Like, Microsoft announced a partnership with Snowflake. I couldn’t believe what I was hearing.

Yes. They announced it. And it’s just an announcement.

So, obviously, there’s a lot of work to actually make that happen. Microsoft was already partnered with Databricks, but the idea is that you could have data sitting in Snowflake and still be able to bring it into Microsoft Fabric. It’s Delta Parquet format within the fabric. And you could do all of the things that you just described from a catalog perspective and a data quality perspective, running Spark jobs, running SQL queries, doing anything, and doing that from within a Microsoft environment, but where the data is actually technically in a Snowflake environment. That to me is just, like, these worlds coming together. I never thought I would see that, but I think that gives a lot of security to CDOs who may be concerned about vendor lock-in, who may be concerned that I’ve basically got a multi-cloud environment, and how do I create more of a single-cloud experience, because I don’t wanna have fifteen data catalogs.

Yeah.

So, Malcolm, I was astonished to learn something so basic that it actually blew me away. You know what SaaS stands for?

Software as a service.

But that’s not what businesses want. You know what businesses want?

Everything yesterday?

It’s still SaaS, but it’s service as software.

Interesting. Okay.

So why are fabrics so in vogue?

There is Microsoft Fabric. Google has Dataplex.

Amazon is getting into it.

Snowflake and Databricks are definitely into this converged, integrated environment. It is because businesses have sent out this message that, guys, we want outcomes.

And some people are calling it outcome as a service. Someone is saying, no. It’s value as a service, task as a service, result as a service. But the idea is that it’s not software as a service, because there you’re still getting software, which is the medium, a means to the end.

You want service as the deliverable, delivered as software.

So it’s interesting how a small, minor thing like that, just a change in the acronym, you turn it on its head and you go, yeah. That makes sense. This is why fabrics are in demand, because businesses don’t want to spend time gluing together lots of moving parts. And I think what we are noticing now is that technology has definitely caught up with agents. AI agents are becoming big, although that’s not baked yet. But I think what’s happened is that in the modern data stack, we pivoted to the extremes, where we had super specialized, micro-specialized software products for every itsy-bitsy piece of a pipeline.

And it became too much for the businesses. They are like, okay. We’ve experimented with this.

First of all, there’s no common metadata standard.

So if I have HubSpot, which has my customer information, and I have Salesforce, and I have Marketo and Jira and all of these things, they all have their own copy of metadata.

Well, so I love the idea of service as software. Mhmm. And let’s take the most basic use case that you’ve already suggested, which is I’m a business user. Yeah. Whatever. I’m a salesperson.

I’m in Salesforce or I’m wherever. I’m in Dynamics. Doesn’t matter. And I’ve got a Copilot, and I’m asking, what region was the most productive for me last year, and why?

Correct.

Right? And I’m asking these questions in a copilot, and I’m getting predictable, accurate, trustworthy results from that chatbot.

Now I think we’re still a reasonably long way away from that future because of something you touched on, and I’ve been saying this a lot in my content recently, which is that today, we do a reasonably good job of rows and columns. Right? Forget rows and columns. Just structured data. We do a reasonably good job at structured data, but these systems are built on and optimized with unstructured data.

So what you talked about earlier, you gave an example of a PDF.

Right? And applying structure to a PDF.

But that jump, that bridge, and again, we’re right back to a semantic layer, I think, particularly as an operating system. This bridge between structured data, which is what we mostly do from a governance perspective, and something that is unstructured and has the meaning and the context. And thank you for drawing this distinction between context and meaning. You can infer meaning through context, but these are separate concepts.

Thanks. So thank you for that. You can infer the context. You can provide additional information.

You can build this bridge between structured data and unstructured data. Today, I would argue it’s really not there.

It’s not much.

It’s early stages. But I see a world where, using software like OpenAI Studio or any of the other tools that you can use to build chatbots, you just point it at a structured data source.

Right? Like, whatever. Wherever your CRM data is, maybe it’s transaction data, it doesn’t matter. And it can consume that structured data as a means to ground the answer, which I think is the technically correct term, or to build a RAG pattern on the fly. And me, the salesperson in Salesforce, asking the question, I don’t even know that’s all happening. It’s just happening magically in the background.

Yeah. Where a RAG pattern is built on the fly, and I’m hitting the right data. I think that’s the world we need to get to. Do you agree?

Yeah. Completely. In fact, Snowflake announced Cortex Analyst and Cortex Search. And Cortex Search is basically where you’re doing hybrid search. You’re doing a vector search and a deterministic, text-based search without writing a single line of retrieval-augmented generation, or RAG, code. So we are getting there. But we have a bigger problem.
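For readers who have not seen hybrid search spelled out, here is a toy Python sketch of the general idea: blend a vector-similarity score with a deterministic keyword-overlap score. This is not how Cortex Search is implemented; the scoring functions, the `alpha` weight, and the two-dimensional hand-made "embeddings" are assumptions chosen only to keep the example self-contained.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of query words that literally appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def hybrid_search(query: str, query_vec: Sequence[float],
                  docs: List[Dict], alpha: float = 0.5) -> List[str]:
    """Rank document ids by a weighted blend of vector and keyword scores."""
    scored = [
        (alpha * cosine(query_vec, d["vec"]) + (1 - alpha) * keyword_score(query, d["text"]), d["id"])
        for d in docs
    ]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


# Hand-made vectors stand in for real embeddings.
docs = [
    {"id": "a", "text": "quarterly revenue by region", "vec": [1.0, 0.0]},
    {"id": "b", "text": "employee onboarding guide", "vec": [0.0, 1.0]},
]
print(hybrid_search("revenue by region", [0.9, 0.1], docs))  # ['a', 'b']
```

The keyword half is what makes part of the ranking deterministic, while the vector half catches paraphrases that share no literal words with the query.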

So right now, the salesperson cannot go ask a question and be guaranteed to get the right answer. What Juan Sequeda and his team at data.world have done is amazing, because they’ve shown these benchmarks where you use a knowledge graph, and then Cube.dev actually took it a level further by putting in a semantic layer, and they got one hundred percent accuracy. So Juan moved it quite high, and then David Jayatillake at Cube.dev took it to the next level. So it’s a combination of a knowledge graph and a semantic layer.

They’re very closely related. Like, every semantic layer is a knowledge graph, but not every knowledge graph is necessarily a semantic layer. So there’s a slight difference. But, anyway, without getting into the nitty-gritty details, my point is that RAG is one way, and knowledge graphs are one way.

There’s something else that’s going to become prevalent, which is continuously fine-tuning the model with new data as new data comes in. So all of these things are just beginning to happen. But I wanna go to the example that you talked quite a bit about, the chatbot. See, with a chatbot, I ask a question, and I’ll get some response. If that response is not up to snuff, I’ll ignore it. But we are moving into a very dangerous space where we are trying to build these autonomous, multi-step process execution engines based on agents.

Here, there is no human intervention. If there is some mistake, some hallucination, it is gonna go downstream to everything. So we cannot possibly live in an environment where, you know, the agent is understanding my context and, based on that, is making a decision. It’s doing a function call to another LLM, executing an API. So this whole stack, if you may call it that, based on an LLM, is where we are moving to. And to get there, we have to solve the accuracy problem, because we won’t have a human in the loop in some future situations.

I do believe that’s a solvable problem. Right?

And the way that I look at this is that we’re on a road from manual management to automated management of data, and governance as well. And we are early in the stages of what’s just called augmentation. But the problem you just described, I think, is a very, very solvable one.

If my AI is smart enough, and I’m in a chatbot and I ask a question and it gives a response, and if I ask a very pointed question about what should I do, like, should I do A or should I do B, in theory, in the future, you could have AI that goes out and runs Monte Carlo simulations against taking that very action. Yes. Running scenarios, running what-ifs, running models on the fly that say, if you do this, here is the expected business outcome of taking this action. This starts to incorporate the idea of what Gartner would call decision science.
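A minimal sketch of the "should I do A or should I do B" idea, assuming each action's business outcome can be modeled as a normal distribution with a guessed mean and spread. The numbers and function names are made up for illustration; a real decision model would be far richer.

```python
import random
from statistics import mean
from typing import Dict, Tuple


def expected_outcome(mu: float, sigma: float, trials: int = 10_000, seed: int = 0) -> float:
    """Estimate the expected outcome of an action by averaging simulated draws."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    return mean(rng.gauss(mu, sigma) for _ in range(trials))


def recommend(actions: Dict[str, Tuple[float, float]],
              trials: int = 10_000, seed: int = 0) -> Tuple[str, Dict[str, float]]:
    """Simulate every candidate action and return the best one with all estimates."""
    estimates = {
        name: expected_outcome(mu, sigma, trials, seed)
        for name, (mu, sigma) in actions.items()
    }
    best = max(estimates, key=estimates.get)
    return best, estimates


# Hypothetical actions: (assumed mean outcome, assumed standard deviation).
actions = {"A": (100.0, 10.0), "B": (80.0, 30.0)}
best, estimates = recommend(actions)
print(best, estimates)
```

With these assumed inputs the simulation favors A; the point is only that the recommendation comes with a modeled expected outcome attached, rather than being a bare answer.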

Yes.

Where you could actually build full circle from insight, analytics, recommendations. That’s where we’re going. We’re going from insight and analytics. We’re getting to the world of recommendations.

We’ve talked about this. But to close the loop on that is decision science, where you could actually model what the impacts would be of a recommended, specific piece of guidance. So I think what you mentioned is a solvable problem.

Yes. In fact, you know, today we rely on the prompt. Even RAG is actually a prompt engineering thing, where RAG is doing some retrieval of contextual data to generate a prompt. So the problem we are trying to solve is determining how correct the output is. So, evaluating the output. And for that evaluation, we are very rapidly moving into using LLMs themselves to evaluate LLMs.

Yes. Yes.

It’s called using an LLM as a judge.
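The two ideas just described, RAG as a way of building a prompt from retrieved context and a second LLM acting as judge of the output, can be sketched in Python as follows. No real model or vendor API is used: `LLM` is a hypothetical stand-in for any function that takes a prompt and returns text, the retrieval is a toy word-overlap ranking, and the canned `fake_answer_llm` and `fake_judge_llm` exist only so the sketch runs on its own.

```python
import re
from typing import Callable

# Hypothetical stand-in for a model call: any function from prompt text to reply text.
LLM = Callable[[str], str]


def tokens(text: str) -> set[str]:
    """Lowercased words in a piece of text."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Toy retrieval: rank documents by how many words they share with the question."""
    question_words = tokens(question)
    ranked = sorted(documents,
                    key=lambda doc: len(question_words & tokens(doc)),
                    reverse=True)
    return ranked[:top_k]


def build_rag_prompt(question: str, context: list[str]) -> str:
    """RAG as prompt construction: retrieved context is placed in front of the question."""
    context_block = "\n".join(f"- {passage}" for passage in context)
    return f"Answer using only this context:\n{context_block}\n\nQuestion: {question}"


def judge_answer(question: str, context: list[str], answer: str,
                 judge_llm: LLM) -> bool:
    """Ask a second model whether the answer is supported by the context."""
    context_block = "\n".join(f"- {passage}" for passage in context)
    prompt = (
        "You are grading an answer. Reply PASS if the context supports it, "
        "otherwise reply FAIL.\n"
        f"Context:\n{context_block}\nQuestion: {question}\nAnswer: {answer}"
    )
    return judge_llm(prompt).strip().upper().startswith("PASS")


# Canned stand-ins so the sketch runs without any model or network access.
def fake_answer_llm(prompt: str) -> str:
    return "Finance counts customers by billing account."


def fake_judge_llm(prompt: str) -> str:
    return "PASS"


documents = [
    "Finance counts customers by billing account.",
    "Marketing counts customers by email address.",
    "The office is closed on Fridays.",
]
question = "How does finance count customers?"
context = retrieve(question, documents)
answer = fake_answer_llm(build_rag_prompt(question, context))
print(judge_answer(question, context, answer, fake_judge_llm))
```

In a real system the judge would be an actual model with its own error rate, so its verdicts would themselves need spot checks. The sketch only shows where the judging step sits in the flow.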

You know? And that’s where we are heading. And, actually, I have to say, there are a lot of people who are on the sidelines or are skeptics of this, and I tell them all the time that, in nineteen ninety four, we were asking, what has the World Wide Web done for us?

What’s what’s the point of the Internet? These are all static pages.

Oh, yeah. You’re gonna buy a fridge online. Yeah.

Yeah.

But today, we do everything on it. So I’m like, this is twenty twenty four. Don’t be nineteen ninety four. It’s early stages.

To your point, these things will get better. These are solvable problems. But just like the Internet, you know, it took some time, maybe ten years, before iPhone and social, mobile and all these things converged, and then it became the de facto way to order your groceries. I remember Webvan was the hottest startup, and it was not ninety four, more like ninety eight, ninety nine.

And it was one of the most spectacular disasters of the dot-com bust. So we are going to see that happen, but that doesn’t mean it’s the end of the road. What it means is that the technology will keep on improving behind the scenes. It won’t be hype.

And then twenty twenty six, twenty seven, it’ll become the de facto way of how we do business.

And I’d love your opinion on this. I think one of the biggest, maybe not the biggest, but certainly a gating factor here is the way we think about data and analytics. And something you touched on earlier is what I would argue is a required mind shift. You mentioned a move from a very deterministic world to a probabilistic world, where everything in the future becomes probabilistic.

I’m really excited about this because we’ve always known that the answer to questions is it depends.

Right? Like, who’s asking? What’s the context? What’s the desired outcome?

This is relevant in my world when we talk about versions of truth. For a long time, we clung, and still cling, to the idea of a single version of the truth when, in reality, what’s true to marketing could be different than what’s true to finance, and they would both be right.

Correct.

So this probabilistic world that we’re moving to, a world where context is king, and that’s the semantic layer, I think will enable and require a mindset shift in how we approach data and analytics. What would you say to that?

Yeah. So one thing that I have learned over decades of, you know, following the trends and working is that we need to stop thinking about, is it A or B? It’s always gonna be a hybrid. So we are a combination, a hybrid, of deterministic and probabilistic.

Why I’m saying this is because we have a few minutes left, so let’s talk about, you know, data mesh, data fabric. This whole thing about, is it centralized? Is it decentralized?

Centralized is way too constricting for modern times, so we should decentralize. Oh, now we can decentralize.

So I think we need to get away from this and just acknowledge it’ll be a hybrid world. And by hybrid, in my mind, what I see as a success is that you centralize at the lowest layer, at the infrastructure layer. That infrastructure means you pick a technology of choice. Let’s say Snowflake. Don’t dig around with one team has this, another one has that, and on and on. Who’s gonna maintain it? So centralize. Centralize even governance.

So that’s the semantic layer, because, to your point, there are different sources of truth, different understandings, different contexts. So you have to have a common business glossary, let’s say. But then as you move up to the domains, you decentralize applications.

So the person in pharma who’s responsible for clinical tests knows that domain inside out, knows that data inside out, not the data engineer. So they need to own it, and maybe they’ll have their own federated governance, because they have a different understanding of terms than the sales or the R&D or the, you know, supply chain team. So you have centralized infrastructure, decentralized applications, and then you have distributed products, or data products, on top. So this is how I see it. It’s a hybrid.

Depending on which layer you are at, you have a different point of view or approach.

I love it. Functional, cross functional, enterprise wide.

Correct.

And each of them is its own context, and each of them can be right at the same time, and that’s fine. But if the CEO asks how many customers we have, there’s only one answer to that question.

So there’s a context at each of those levels, and the thing that is going to allow all of those contexts to coexist is some form of a semantic layer. Today, for a lot of companies, that’s MDM. It’s also what is traditionally known as a semantic layer from more of a query perspective and an analytics perspective. But all these worlds are absolutely coming together, and it comes right back down to context. When we were at Gartner, there was this term thrown around a lot, adaptive data governance and adaptive data management. To me, that’s just a way of saying context matters.
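As a rough illustration of several contexts coexisting while the enterprise-level question still has one answer, here is a minimal Python sketch of a glossary that maps each context to its own definition of "customer". The record fields, the three definitions, and the sample data are all hypothetical. Real semantic layers and MDM tools are far richer than a dictionary of functions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CustomerRecord:
    account_id: str
    email: str
    has_paid_invoice: bool


# Hypothetical glossary: one agreed definition per context.
CUSTOMER_DEFINITIONS: dict[str, Callable[[list[CustomerRecord]], int]] = {
    # Enterprise-wide: distinct accounts, the single answer for the CEO.
    "enterprise": lambda rows: len({r.account_id for r in rows}),
    # Marketing: every distinct contact that can be reached.
    "marketing": lambda rows: len({r.email for r in rows}),
    # Finance: only accounts that have actually paid.
    "finance": lambda rows: len({r.account_id for r in rows if r.has_paid_invoice}),
}


def count_customers(rows: list[CustomerRecord], context: str = "enterprise") -> int:
    """Answer 'how many customers do we have?' for the given context."""
    try:
        definition = CUSTOMER_DEFINITIONS[context]
    except KeyError:
        raise ValueError(f"No customer definition for context {context!r}") from None
    return definition(rows)


rows = [
    CustomerRecord("A1", "ann@example.com", True),
    CustomerRecord("A1", "bob@example.com", True),
    CustomerRecord("A2", "cat@example.com", False),
]
for context in CUSTOMER_DEFINITIONS:
    print(context, count_customers(rows, context))
```

Each context gets a different count from the same three records, and each is correct under its own definition. The default context stands in for the one answer given at the enterprise level.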

Yes.

And acknowledging that context matters.

Yep. Yeah. Yeah.

Alright, Sanjeev. I need to let you get back to work because we could keep talking for hours and hours and hours. I love talking about this stuff with you. This is where the world is going. It’s not centralized or decentralized. It’s both, and we need to be aware of context.

Awesome stuff. Thank you so much for coming out today and sharing your wisdom.

Sure. Thank you so much for having me on your show.

Alright. To our listeners, to our subscribers, if you’ve made it this far, thank you very much. Please take the time to like, maybe even subscribe to the content. We do this every two weeks with brilliant people like Sanjeev and others.

For now, I’m Malcolm Hawker, head of data strategy at Profisee Software and host of CDO Matters, and I hope I will see you on another episode sometime very, very soon. Thanks, everybody. Thanks, Sanjeev.

Thank you. Bye.

ABOUT THE SHOW

How can today’s Chief Data Officers help their organizations become more data-driven? Join former Gartner analyst Malcolm Hawker as he interviews thought leaders on all things data management – ranging from data fabrics to blockchain and more — and learns why they matter to today’s CDOs. If you want to dig deep into the CDO Matters that are top-of-mind for today’s modern data leaders, this show is for you.

Malcolm Hawker

Malcolm Hawker is an experienced thought leader in data management and governance and has consulted on thousands of software implementations in his years as a Gartner analyst, architect at Dun & Bradstreet and more. Now as an evangelist for helping companies become truly data-driven, he’s here to help CDOs understand how data can be a competitive advantage.

The CDO Matters Podcast Episode 54

Semantic Layers with Sanjeev Mohan
