
The CDO Matters Podcast Episode 80

Data Observability Demystified with Barr Moses


Episode Overview:

In this episode of CDO Matters, host Malcolm Hawker talks with Barr Moses, CEO of Monte Carlo, to break down what data observability really means and why it matters. They explore how leading organizations are using it to catch data issues early, drive trust, and scale reliability — plus where the space is headed next.


Data observability to me has got some really exciting stuff going on around it, and there’s no better person to be talking about this topic with than Barr Moses, CEO and founder of Monte Carlo. Barr, thank you for joining us today.

 

Thank you so much for having me. I’m excited for the chat.

 

Alright. I’m gonna ask you a couple of layups first. Alright? We’re gonna do, like, data observability 101 kind of stuff. We’ll start with the basics. You’re gonna explain to our amazing audience what data observability is, why we should care about it, the value that it brings to organizations, and some of the things maybe you’re seeing out there, and then we’re gonna go deep quickly. We have a limited amount of time today, and I really wanna focus on where you see things going, because that’s what I’m most excited about.

 

Now let’s start with the basics. Data observability 101, what can you share with the audience?

 

Awesome. So, first of all, I’ll share with the audience that I’m very excited about this time for data and AI in general. And, you know, the story of Monte Carlo, or more broadly the story of the data and AI observability category, actually dates back a few years.

 

If we think about how the world has changed in the last decade or so, maybe ten years ago, a very small number of people were building data products, and they had a lot of time to make sure that those data products were accurate. I’ll give you an example. Companies would maybe report numbers to the street once a quarter, so there were a couple of analysts, typically under the finance or IT team, who were looking at reports and making sure that whatever the company reported to investors once a quarter was accurate. And so you had a small amount of data, a lot of time to make sure the data was accurate, and a small number of people working with the data.

 

Today’s world is a vastly different world. Today, we have very different people working with data, whether it’s people in data roles, data engineers, ML engineers, data analysts, data stewards, all the way to the business units: sales, marketing, finance, product. Everyone is working with data and AI. Everyone is building data products.

 

And in that world, the reliability and the trust of the data matter even more. And so Monte Carlo’s mission is to help accelerate the adoption of data and AI by increasing trust and reliability of that data, because we believe that’s the number one barrier to actually adopting data and AI. And so how do we do that? The way to do that is by introducing this concept of observability.

 

Now, observability is a concept that’s actually well defined in software engineering. We did not make it up.

 

You know, software engineers have been doing this for a long time. Basically, whenever an engineering team builds any application or infrastructure, you wanna make sure that it’s reliable, commonly known as having five nines of reliability. Right? The engineering team always measures, like, I have 99.99% availability for my application.
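
As a quick, back-of-the-envelope illustration of what those “nines” translate to in downtime budget (five nines being 99.999%, four nines being 99.99%), here is a minimal sketch; the numbers are plain arithmetic, not tied to any specific SLA discussed in the episode.

```python
# Back-of-the-envelope downtime budgets for common availability targets.
# Illustrative only; not tied to any specific SLA mentioned in the episode.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

for availability in (0.999, 0.9999, 0.99999):  # three, four, and five nines
    downtime_minutes = MINUTES_PER_YEAR * (1 - availability)
    print(f"{availability:.3%} availability -> ~{downtime_minutes:.1f} minutes of downtime per year")
```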

 

We’re taking the same concept and applying it to data and AI teams. So when data and AI teams ship a product, do they know if it’s actually reliable, if it’s accurate? In the past, we’ve used hopes and prayers. We were just shipping the product and hoping that it would be accurate, and that if it wasn’t, someone would let us know.

 

Today, we can no longer do that. Now why not? You might ask yourself, why does this matter? Like, why do we even need this?

 

I think one of the cool things, and I don’t know if this is cool, is to look at the progression of the stakes at hand for data and AI teams. So, you know, thinking about the early 2020s, I’ll mention a couple of examples where data reliability or data trust was fundamental to the business. I’ll take us back to, I believe, 2019 or 2020. Unity is a gaming company.

 

They had one schema change related to their ads data, actually in, sort of, the App Store, and that one schema change caused them a hundred-million-dollar loss. Their stock dropped by 37%. It was a colossal impact on the business because of one schema change. For folks who don’t know, a schema change is basically a change in the structure of the data, one way to think about it.
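
To make that concrete, here is a minimal, hypothetical sketch of how a schema change, say an upstream field being renamed, breaks a downstream job that still expects the old structure. The field names and values are invented for illustration; this is not Unity’s actual pipeline.

```python
# Hypothetical sketch: a downstream revenue job written against the old schema
# breaks when an upstream field is renamed. Field names are invented.

old_schema_event = {"ad_id": "a-123", "revenue_usd": 0.42}
new_schema_event = {"ad_id": "a-123", "rev": 0.42}  # upstream renamed revenue_usd -> rev

def daily_ad_revenue(events):
    # The consumer still expects the old field name.
    return sum(event["revenue_usd"] for event in events)

print(daily_ad_revenue([old_schema_event]))  # works as expected: 0.42

try:
    print(daily_ad_revenue([new_schema_event]))
except KeyError as missing_field:
    print(f"Downstream job broke on renamed field: {missing_field}")
```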

 

Fast forward a couple of years, and Citibank is hit with a fine of more than a hundred million dollars, I think it was $400 million, for bad data quality practices.

 

And so, you know, more money, more reputation at risk, more financial and fiduciary risk, the stakes get higher and higher. Now if you look at the age of AI, the implications of getting your data or AI products wrong are so much higher. To give an example from just the last few months.

 

A Chevrolet dealership was selling cars online through a chatbot, and a user convinced the chatbot to sell a Chevy Tahoe for one dollar and actually got it to agree to sell the car for one dollar. And so, you know, kudos to the user who was able to do that.

 

But, you know, obviously a problem for the company. Right?

 

There are a lot more instances like that, where actually, as companies are putting data and AI in the hands of consumers, they risk their brands, they risk their reputation, they risk revenue. There’s regulatory and financial risk that’s based on the reliability of your data and AI products. So the goal of observability is to help make sure that everyone who’s working with data is aware of when data is wrong and why, and can help troubleshoot and fix it quickly.

 

Uh-huh. Okay. So one more clarifying question before we start to get into maybe the 201 of data observability here, to use a university metaphor.

 

You touched on reliability, accuracy, just kind of broad, general data quality.

 

I’m a CDO, and, you know, I’ve got a lot of investment choices to make. Right? I hear that data quality tools are important. I hear that MDM is important.

 

You’re telling me that data observability is an important piece. How does data observability fit into a broader ecosystem with those other tools? Do you compete with MDM, or do you sit beside MDM or data quality hubs? How do all these things fit together?

 

That’s a great question. And, look, I have a lot of empathy for data leaders today. There are a couple of things happening. First of all, you know, we’ve talked about the stakes being higher, which I think is overall good.

 

Right? You wanna be sort of closer to the business and closer to things that matter. Like, the fact that what we do matters is good. You know, I think the second is there’s a lot of pressure to keep up with the pace of change.

 

It feels like, you know, there’s no rest. Like, there’s never a dull moment.

 

Right? At any given moment, you have to be on top of the change and have to be innovative and forward thinking. And I think the third struggle that leaders have is there are just a lot of vendors out there in the data space, and it’s really hard to understand what’s what, really understand what’s valuable and what’s not, and it’s really hard not to have sort of shiny object syndrome. Right?

 

So I have a lot of empathy for folks who are struggling with that. You know, where folks typically are on their journey when they start thinking about data observability is when they actually realize, either through having an incident or just having enough experience in the space, that the data their team is using or that their business is using actually matters. So, for example, if your CFO is looking over numbers, let’s say sales commission data or sales team data, that’s really critical data.

 

So just as an example, Monte Carlo works with over five hundred enterprises across all industries. And I’ll give you a couple of examples of what it means to have data that matters. For example, we work with top airlines.

 

If you think about the data that airlines have, you know, they will look at which flights are leaving today. And for Malcolm, like, where’s your bag? Where’s your luggage? What’s your connecting flight? Did you make it on time, or are you late? And for that information, it’s critical that it’s up to date and that it’s reliable.

 

You know, another example: we work with top companies in media, in retail, in finance, in health care. In all of these instances, people are making decisions, whether it’s, let’s say, a clinical trial decision.

 

You know, you’re collecting information about human lives and making recommendations based on their health. In those instances, you wanna make sure that the data is reliable. And so, just to put this into context, oftentimes CDOs and data leaders will realize this once they are giving access to other teams and using data in a way that’s somewhat external facing, or that has stakeholders making decisions based on it, and it’s important that the data is high quality. I’ll just make a specific distinction.

 

You asked about data quality versus data observability. We really view this as, like, a three-phase approach. I think in the past, you could call it data quality, and that was sufficient. I think data quality, to a certain degree, is like a moment in time and not very dynamic.

 

So, you know, you wanna look at completeness and accuracy, sort of the six dimensions of data quality. I think it’s super important.

 

What data and AI observability does is take that to the next level and help you answer not only what was wrong, but also why it was wrong, should I care about it, and how do I fix it. And so I think data quality is really focused on detecting issues, which again is really table stakes. I wanna know if something’s broken, something’s wrong in the data, wherever it is across my data and AI estate. It’s very important.

 

But data observability helps you answer, well, I know something is wrong, but who’s impacted by this? So, for example, let’s say, you know, I have a dataset of, you know, clinical trials that I’m expecting to receive this morning, and the data just didn’t arrive at all. You know, if no one is using that data, maybe it doesn’t matter. I don’t need it.

 

Maybe it’s not, you know, it’s not actually being used. But if that data is being used by a particular team downstream, I need to know that immediately.

 

I need to be able to alert them, and more importantly, I need to understand the root cause and start to understand the context around the incident to help resolve it. And so I really view data observability as, you know, the next generation, if you will, that encompasses data quality but takes it to the next level. Does that make sense?
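
As a rough sketch of that distinction, imagine a simple freshness check (the detection piece) combined with a lookup of downstream consumers (the impact piece) that decides whether anyone actually needs to be alerted. The table names, timestamps, and lineage map below are invented for illustration.

```python
from datetime import datetime, timedelta, timezone

# Invented example state: when each table last loaded, and who consumes it.
last_loaded = {"clinical_trials_raw": datetime(2024, 1, 1, tzinfo=timezone.utc)}
downstream_consumers = {"clinical_trials_raw": ["trial_enrollment_dashboard"]}

def is_stale(table, max_age=timedelta(hours=24)):
    """Detection: the data quality piece -- did the table arrive on time?"""
    return datetime.now(timezone.utc) - last_loaded[table] > max_age

def triage(table):
    """Observability adds context: who is impacted, and is it worth an alert?"""
    if not is_stale(table):
        return f"{table}: healthy"
    impacted = downstream_consumers.get(table, [])
    if not impacted:
        return f"{table}: stale, but unused -- log it, no alert"
    return f"{table}: stale and used by {impacted} -- alert the owning team"

print(triage("clinical_trials_raw"))
```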

 

It does. Maybe I could use a word here.

 

In my head, I was thinking forensics. Right? Like I’m thinking CSI data.

 

Right? And a data quality hub? Okay. Fantastic. Great. You’ve got a data quality hub. You can define some rules in the data quality hub that are arguably static and arguably highly deterministic, but can run at scale.

 

Right? And we could kind of use that as a first layer of triage, perhaps, on just about everything. Yep. But what you’re saying is that not all data is created equally, that there are critical data elements that you would want to have a second level of insight and oversight, or observability, on because they matter more.

 

Right? It is the data that we have identified as mattering more. And maybe it’s not just the kind of deterministic pass/fail on a rule, maybe it goes a little bit farther than that. Maybe it goes into lineage.

 

Maybe it goes into access and permissions. Maybe it goes into downstream consumption patterns. You’re nodding. Am I getting this right?

 

First of all, I love the CSI forensics. Yeah. Okay. I’ll totally take that. Yes. Yes, and let me build on that.

 

So, you know, when we started the company, a lot of what we did was sort of create the category of data and AI observability. It’s something that’s, like, part and parcel of our company identity. But we can’t do that alone. Like, nobody cares about what we think, to be frank.

 

Like, you know, we work with our customers to do this hand in hand. Right? And so there are hundreds of data leaders who’ve partnered with us on this journey. And from all of my conversations with these data leaders, I think at the beginning, we used to think that we’re all sort of like special snowflakes.

 

But even for software engineers who think that they’re special snowflakes, there are common reasons why applications break down. There are common patterns and common root causes. And so we set out to define the common root causes for when data issues occur. And we were actually able to narrow that down to four different root causes.

 

And it’s the forensics around these four root causes that help us actually tell the story that you were talking about. So you’re right. With data quality, you need to be able to understand the data really well. You need to be able to define rules, and they’re static.

 

And you need to manually update them whenever you get additional data sources. Right? And then once there’s an alert or an issue, you don’t always know anything more about it. Observability helps by giving you context into the root cause and these four different components.

 

Okay. So just to differentiate then from an MDM perspective: you know, MDM also kind of follows this pattern of not all data being created equally. Like, master data is the middle of the Venn diagram, the stuff in the middle of the three-ring Venn diagram.

 

It’s not just schema, but it’s also at the field and attribute level.

 

But again, it’s not necessarily trying to explain why something does not conform, why you’ve got a duplicate record. It’ll tell you you’ve got a duplicate record. That’s great, but it’s not necessarily going to tell you why. You’re in the business of why. Does that sound right?

 

That’s exactly right. And I think for many teams, there are two parts to it. First of all, making it easier. You can actually automate a lot of what traditionally has been done in data quality. You can use machine learning and AI to make that easier. You know, writing manual rules for everything kind of doesn’t scale anymore. Not in today’s data estates, and especially not if you want to build for a world of cloud and AI; it’s going to break.

 

And second, we can answer the why. And let me just elaborate a little bit on what I mean by the why. I talked about these four components for the root cause.

 

You know, if we think about the complexity of a data and AI estate, let me paint the picture of what a common data and AI estate looks like for an enterprise, starting with data that you’re ingesting, which can be third party, basically data sources. Right? So whatever data is coming in, it can be, you know, data that you’re getting. It can be, like, airline information.

 

It could be flight information. Sorry. It could be demographic information about your users. Whatever information you have, you’re ingesting that.

 

And then you basically have the other part of the data and AI estate. That could be, you know, a data lake, a data warehouse, ETL, BI. And at the end of it, you might have an AI application, like a chatbot or an agent. You might have a report that’s being used internally by, say, your head of marketing, or it could be an external data product that’s not AI based.

 

You know, if I think about what that map looks like, I said the first thing is that you ingest data. The second thing is there’s a lot of code, so code that’s written by engineers, data engineers, ML engineers, basically code that transforms that data and processes it. The third is you have systems. Systems are basically the infrastructure that’s running all of these jobs.

 

And the fourth component is you basically have, like, your models. Right? And the thing is, the core learning that we had, which I think is super critical, is that any data issue that comes up, anywhere along that chain or that estate, can be traced back to one of those four components, oftentimes more than one. So you could either have a problem with the data, like you got bad data or didn’t get data at all.

 

You could have a problem with the code, so an engineer made a change in the code, which could be like a bad join or a schema change. You could have a problem with the system, so that could basically be an Airflow job that is stuck or hasn’t completed.

 

And the fourth thing is that you can actually have the model go wrong in whatever way. Like, the most common way is a hallucination.

 

Right? And so in order to really have a robust understanding of what is going on in your data and AI estate, to make sure that the end output is reliable, you have to have visibility into your data, your systems, your code, and your model output.
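
A very small sketch of that framing: an incident carries signals that can be bucketed into the four areas just described, data, code, systems, and model output. The signals below are invented placeholders, not output from any real tool.

```python
# Sketch: grouping invented incident signals into the four root-cause areas
# described above -- data, code, systems, and model output.

incident_signals = [
    {"area": "data", "detail": "source feed delivered zero rows this morning"},
    {"area": "code", "detail": "a recent pull request changed the join key in the sales model"},
    {"area": "system", "detail": "orchestration job stuck in 'running' for six hours"},
    {"area": "model", "detail": "chatbot answer contradicts the retrieved context"},
]

def group_by_area(signals):
    grouped = {}
    for signal in signals:
        grouped.setdefault(signal["area"], []).append(signal["detail"])
    return grouped

for area, details in group_by_area(incident_signals).items():
    print(f"{area}:")
    for detail in details:
        print(f"  - {detail}")
```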

 

And that completes this forensics picture, if that makes sense. And so whenever there’s an issue, a data team should be able to answer, okay. Here’s a problem. Something here doesn’t look right.

 

What are the data patterns that are associated with that? Did I not receive data on time? What are the code changes that happened around that time? Maybe there’s a pull request associated with that same time that could actually give me clues to what the problem is.

 

Maybe there’s a dbt job that failed at around the same time that’s contributing to this. Or maybe, even though the context was perfect and the prompt was perfect, the model output was totally off for some reason. Oftentimes, all four happen at the same time, and there’s this, like, perfect storm of things that are going wrong. But it’s really the data and AI team’s work to piece all that together to paint the picture and tell the story of why.

 

Okay. Awesome. So let’s go a little deeper.

 

But first, I wanna say that I’ve been kinda selling data observability a little short, because I’ve been describing it as this technology that is gonna allow you to understand when your data pipelines might fail and why. I’m sure that’s a part of it, based on one of the four attributes that you just talked about. If some ETL job is about to die or will die, or why it died, you can understand that. But everything you just described is a little more complex than just, okay, you know, we’re a pipeline monitoring tool.

 

You know? So my apologies for selling data observability a little bit short. However, let’s get a little bit deeper here.

 

I’m gonna ask a question here, but I wanna set the stage because I want you to understand where I’m coming from with some of my thought processes here. One of the reasons I’m excited about this space dates back to my time at Gartner and early discussions we were having about what we called the data fabric. The data fabric today is totally different, I would argue, or at least people are describing it differently, as just this kind of hyper-virtualization layer, which is great. Don’t get me wrong.

 

I’m all for data hub virtualization. That’s great. But I would argue the original Gartner version of the fabric was something a lot more than that. And the original concept was largely driven by a brilliant human being named Mark Beyer, where Mark was saying, in essence, hey.

 

The data itself can tell us when it’s accurate and when it’s inaccurate. The data can tell us when it is fit for purpose and not fit for purpose. The data can tell us when business processes are running smoothly or when business processes are not running smoothly. It’s all there in the transactional data and in the metadata. And if we could just turn the AI on the data in a way that is scalable and start building some models to understand what good looks like, we could have the data start at least informing its own classification and use. I’m paraphrasing myself now. We never actually quite said this, but I’m paraphrasing some of these early conversations around how maybe we could get to some future where governance is highly augmented, or maybe even automated.

 

Okay. You’re nodding.

 

So I would argue that was the original Gartner vision.

 

Whoever said that.

 

Well, okay. I was just along for the ride, providing little pithy bits of insight here and there, but this is where I believe the future needs to go. Right? And I believe that for data management writ large, we need to be looking at how the data is being used.

 

We need to be looking at transactional data. We need to be able to understand what good looks like, because good is inherently contextually bound. Right? What’s good for one may not be good for another; what’s good for marketing may not be good for finance, and vice versa.

 

And what’s good for an analytical use case may not necessarily be good for an operational use case per se. Okay. You’re nodding.

 

The semantics matter per domain, per team.

 

Yeah.

 

So do you see data observability playing a role in being this semantic layer on steroids?

 

Great question. So first, I’ll say I totally agree with what you just said, and that’s a lot of what observability is today and where it’s going. There is a lot that we can infer from the data and the metadata and the context without any manual input today. And if you add LLMs to it, you can do a lot more.

 

So let me give a couple of practical examples to illustrate what you’re saying, but I fully agree with you. Maybe I’ll describe a couple of different levels. I think there’s a base level where, if we connect today and have visibility into, like, your data lakehouse, ETL, AI, and BI solutions, you can very quickly, first and foremost, parse out lineage and metadata and understand the health of the data in terms of: is data arriving on time? Is the data complete? Is the data accurate?

 

You can do that without any manual input today.

 

You can actually get pretty far with ML and AI. Now on top of that, I think there are always going to be domain- or team-specific rules, if you will, that people wanna write and that people should write. And that’s because what’s right for the sales team is not right for the product team, which is not right for the customer success team.

 

And what’s right for ecommerce is not the same as health care, which is not the same as tech. However, here’s what LLMs can do today, and we actually built this at Monte Carlo: we have a suite of observability agents. The first agent that we released is a monitoring agent.

 

And, basically, up till now, what a data steward would typically do in order to understand what to monitor is, first, actually profile the data to understand the context of the data, the structure of the data, what it looks like. So that would be the first thing. The second thing, they might actually add some contextual information that they have about the business. Like, let’s say I know that Malcolm is looking at this report every morning at 6 AM, so I really need to make sure that the data is looking good and accurate before 6 AM, as an example.

 

Or, you know, this particular field, it’s a credit card field, for example, and I wanna make sure it only has numbers, as an example.

 

Basically, what a data steward would do in that case is sit down and write a set of rules. What we can do today is we actually have an agent that can do this for you. If you point it at a particular dataset or data asset, it can take the data, and through a combination of profiling the data, looking at metadata, and looking at the semantics of the data, it can actually make recommendations for you on what you would want to monitor. I’ll give you an example. Let’s say you’re a sports team. Let’s take a baseball example in particular.

 

Teams analyze a lot of information about players, and, you know, let’s say there’s a particular field that’s called fastball.

 

And so, actually, an agent, an LLM, can recommend that the values of a fastball should always be, I’m just making this up, between twenty and seventy miles per hour. And so with a click of a button, you can say, yep, I need that monitor, please turn it on. But a human being doesn’t actually need to go through the process of writing that. You can have an agent analyze it and make a set of recommendations: this field should be between the values of x and y, or this field should be monitored for null values.

 

So agents today can actually go pretty far and save a lot of time for data stewards in a way that wasn’t possible before.
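
The general pattern is easy to sketch even without an LLM in the loop: profile a column, then suggest candidate monitors such as an expected value range and a null check. This is a toy illustration of the idea, not Monte Carlo’s agent; the field name and sample values are made up.

```python
# Toy sketch of profiling-driven monitor recommendations. Not Monte Carlo's
# agent -- the field name and sample values are invented.

fastball_mph = [88.4, 91.2, 95.0, 97.3, None, 93.8]

def recommend_monitors(field_name, values):
    observed = [v for v in values if v is not None]
    null_rate = 1 - len(observed) / len(values)
    low, high = min(observed), max(observed)
    margin = 0.05 * (high - low)  # small buffer around the observed range
    suggestions = [
        f"range monitor on '{field_name}': expect values in [{low - margin:.1f}, {high + margin:.1f}]"
    ]
    if null_rate > 0:
        suggestions.append(f"null-rate monitor on '{field_name}': currently {null_rate:.0%} null")
    return suggestions

for suggestion in recommend_monitors("fastball_mph", fastball_mph):
    print("suggested:", suggestion)
```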

 

That’s just one example. At Monte Carlo in particular, this is already used by hundreds of customers today. We actually have a sixty percent acceptance rate, which means that sixty percent of the monitors our agent recommends get adopted, which is pretty high, and we’re pretty happy with that. And so, to me, that tells me that this is able to save teams time in a pretty significant way, in a way that I don’t think was possible before LLMs.

 

One of the things I’m excited about: in one of my conference presentations, I make the assertion that data governance needs to evolve from rules-based to exceptions-based.

 

Because I honestly believe that the rules-based paradigm, this kind of deterministic, rules-driven paradigm, is good, particularly for legacy BI and for legacy analytics. Right? Because these are structured reports that need structure, that need definition, that need predictability.

 

So I’m not saying that rules are always bad. I’m not saying that. But I’m saying that in the future, as we evolve, the only way I think we’re ever going to manage our data estates at scale, and get our hands around all of that eighty to ninety percent of unstructured data that’s sitting out there largely ungoverned, is to start being more exceptions-based instead of rules-based.

 

How do you feel about that? Because what I heard you say is you could use an agent to say, okay, these should be the acceptable values because these are what we’re seeing.

 

And you could build a rule around that. Great. Or you could say only act every time you see something above ninety in the case of the fastballs. Right? Maybe there’s a rule sitting underneath it, but the idea of being exceptions-based, I think, is gonna be important as we scale forward in our journey here as CDOs.

 

Yeah. I mean, look, I generally tend to agree. And maybe just to clarify why: this is a very tactical comment, but it’s a real thing for data and AI teams that they get inundated with issues.

 

And alert fatigue is real. Right? With any solution that you take on, it’s very hard to adopt if you’re just bombarded with stuff that doesn’t matter. And so, yes, I agree.

 

I think, as data leaders, we need to move to understanding what data needs to be airtight, and being thoughtful about our strategy here for how we make sure of that. You know, I’ll give you an example. We work with a Fortune 100 B2B company. They built a chatbot for the CFO to be able to answer questions.

 

This is a public company. So for the CFO to be able to answer questions about the business for investors.

 

Yeah. Yeah. That data should probably be pretty airtight. Like, you probably wanna know about any change in it.

 

You wanna know if anyone is making any code change that’s impacting that. You wanna make sure, you know, that all the pipelines are up and running. And here’s the thing. Systems fail a hundred percent of the time.

 

I’ve never seen a system not fail. So the only thing you can do is make sure that you have eyes and observability on it, so that you know when it’s happening and you can be quick to fix it. You can’t just assume the data will always be accurate. And so, you know, do I want the same level of scrutiny on that data as on data that’s being used once a quarter by a small number of people, for maybe something that matters less, more for, like, research purposes, long term, with a lot of time to make sure it’s accurate?

 

Probably less. Right? And so I also wouldn’t want to respond with the same level of severity to different datasets, if that makes sense. And so, yes, for a lot of data leaders, this includes being thoughtful about who has access to which data, what level of severity each dataset should be approached with, and what the workflow for that is.

 

So, again, I think the world of kind of writing some rules, shipping them, and, honestly, hoping that they’ll be sufficient is not enough. We’re seeing people being a lot more thoughtful about which datasets are used and how, and about the appropriate alert strategy and mitigation strategy we’re going to take.
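
One hedged way to picture that kind of thoughtfulness is a simple severity map: different datasets get different alerting postures. The dataset names, tiers, and notification channels below are invented placeholders.

```python
# Sketch: not every dataset deserves the same alerting posture.
# Dataset names, tiers, and channels are invented placeholders.

alert_policy = {
    "investor_reporting": {"severity": "critical", "notify": "page the on-call immediately"},
    "sales_commissions": {"severity": "high", "notify": "post to the team channel within the hour"},
    "research_archive": {"severity": "low", "notify": "include in the weekly digest"},
}

def route_incident(dataset, issue):
    policy = alert_policy.get(dataset, {"severity": "low", "notify": "include in the weekly digest"})
    return f"[{policy['severity']}] {dataset}: {issue} -> {policy['notify']}"

print(route_incident("investor_reporting", "pipeline failed before market open"))
print(route_incident("research_archive", "daily load arrived late"))
```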

 

Yeah. Well, I mean, conceptually, if you can understand, based on anomalies that you’re seeing, or even predict, that a pipeline is going to fail, for example, you could certainly do the same thing for a business process.

 

Right? You could certainly say, okay, based on the data that we’re seeing here... I mean, people are already doing this in the IoT world around sensor data related to just about anything in the manufacturing space. Right? The people who manufacture elevators know well before the elevator stops running that it’s probably gonna stop running. Right?

 

We could do the exact same thing in our world, where we say, oh, okay, well, there’s a critical failure here in your procurement process or a critical failure in your sales process. At which point, does this start to become a little bit more about business observability than data observability?

 

It’s a great question, and it’s one that our customers at Monte Carlo have been pushing us on since we started the company.

 

And I think it’s because of a couple of things. One is people always want answers. And so when I see a problem, I wanna know why. Like, let’s say I see, you know, sales suddenly spiked up.

 

Yeah.

 

Is that because suddenly, you know, my North Carolina team is crushing it this year, this month? Or is it because a pipeline broke and suddenly there’s, like, an influx of data that I didn’t expect? Or did someone make a code change? Like, I want to understand.

 

And so, you know, in my mind, in my opinion, observability should help answer the question of: is this a real business thing, or is this actually a problem with your data, your systems, your code, or your model?

 

And so this is where the forensics are really, really helpful.

 

You know, I think in today’s world, it’s not crazy to think that LLMs could help answer that question. Right?

 

I’ll give you another preview of something that we’re working on. I mentioned that we’re working on observability agents. We’re also working on a troubleshooting agent. The troubleshooting agent is where I get really excited. So here’s what teams do today. Let’s say that they see that there’s a spike in sales in North Carolina.

 

What a data analyst typically does is start thinking through what could be the reason. One, maybe we’re just crushing it in North Carolina. Okay. Two, maybe I have a problem with the data.

 

I have to go check the data source. Right? Maybe something with the team there, I don’t know, something didn’t come through. That happens all the time.

 

Three, maybe something is wrong with the system. I don’t know. You start listing all these hypotheses, and then you start interrogating hypothesis by hypothesis, kinda like in CSI. Right? You’re like, maybe this, maybe this, maybe this. Now that process can take a very long time.

 

Oh, yeah.

 

Right? For a data analyst.

 

Subject to bias, subject to a whole bunch of forces. Yes.

 

Exactly.

 

This is where LLMs are actually really, really helpful. What we’ve built, which I think is super cool, is basically an ensemble of different LLMs. It’s an agent that spawns sub-agents based on each hypothesis. So whenever there’s an incident, you take an agent, and the agent basically comes up with, let’s say, call it ten to fifteen different hypotheses, and spawns off new agents to investigate every hypothesis in parallel.

 

So, like, a couple of agents will look into changes in code. A couple of agents will look into changes in data. A couple of agents will look into changes in the system. A couple of agents will look into changes in models.

 

And then they’ll do that in parallel and then repeat, not just for the particular asset that you’re looking at, but for all the assets that are upstream of that. So you can actually have, like, between a hundred and two hundred agents running at the same time, in under one minute.

 

And at the end of it, there’s basically a smarter agent, a smarter LLM, that takes all that information, synthesizes it, and basically spits out a report with a TL;DR.

 

Here’s why we think the problem happened. Here are the patterns we’re seeing. Here’s what we think the root cause is. And so you can actually take the work that it used to take a data analyst weeks or months to do and really condense it down to, you know, a couple of minutes by building this ensemble of different LLMs doing different tasks.
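
The fan-out-and-synthesize pattern being described can be sketched with ordinary functions standing in for the LLM sub-agents: investigate several hypotheses in parallel, then condense the findings into a short summary. This is a minimal illustration of the pattern, not Monte Carlo’s implementation; the asset name, hypotheses, and findings are invented.

```python
from concurrent.futures import ThreadPoolExecutor

# Plain functions stand in for LLM sub-agents; each checks one hypothesis.
# All hypotheses and findings are invented for illustration.

def check_code_changes(asset):
    return f"{asset}: a recent merge touched the region-mapping logic (suspicious)"

def check_data_volume(asset):
    return f"{asset}: row count is three times the daily average (suspicious)"

def check_system_jobs(asset):
    return f"{asset}: all scheduled jobs completed on time (looks clean)"

def check_model_output(asset):
    return f"{asset}: no model involved in this report (n/a)"

def investigate(asset):
    hypotheses = [check_code_changes, check_data_volume, check_system_jobs, check_model_output]
    # Fan out: each sub-agent investigates one hypothesis in parallel.
    with ThreadPoolExecutor(max_workers=len(hypotheses)) as pool:
        findings = list(pool.map(lambda check: check(asset), hypotheses))
    # Synthesize: condense the findings into a short TL;DR.
    suspicious = [finding for finding in findings if "suspicious" in finding]
    return "TL;DR likely root cause(s):\n  " + "\n  ".join(suspicious or ["nothing conclusive"])

print(investigate("nc_sales_daily"))
```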

 

So that’s where we need to go. Right? I would argue that, as an industry, that’s where we need to go. And I’m not trying to sell for you per se, but this idea that we become these high-powered business consultants.

 

Right? Because we’re assisted by these LLMs, because we’re assisted by great technology, because we know the business processes reasonably well, because we have relationships in the business, we can put all these things together and help the business understand what is actually driving you forward, or holding you back, or representing a massive risk, maybe from a PII perspective. The list of things here is long, but I would argue that’s exactly where we need to get to as professionals: to become internal Deloitte-style consultants for business optimization and get away from finger-wagging and saying, hey, well, the data is broken.

 

The data is bad.

 

Yep. Right?

 

Because you know what? Just because it’s bad in the analytics doesn’t necessarily mean it’s bad, or at least not fit for purpose, within that operational system. And understanding the nuance there, I think, is where we need to take our business and where we need to take data governance.

 

Right? And to me, it sounds like some of the things you’re doing are pretty exciting and leading us down that path. So I know we’ve got a hard stop, or I could keep talking about this literally for days.

 

There’s a lot to be excited about here. Thank you so much for taking the time to share your wisdom with our audience.

 

Of course. Thank you. This was a lot of fun.

 

Alright. Awesome. Okay. If you have made it this far, please take a moment to like, take a moment to subscribe, and let us know that we’re doing the right thing here at the CDO Matters podcast.

 

I’m thrilled that you listened today, and I hope to see you on another episode sometime very soon. Thanks again, Barr. Thanks to all the listeners. We’ll see you again very soon.

Bye for now.

ABOUT THE SHOW

How can today’s Chief Data Officers help their organizations become more data-driven? Join former Gartner analyst Malcolm Hawker as he interviews thought leaders on all things data management – ranging from data fabrics to blockchain and more — and learns why they matter to today’s CDOs. If you want to dig deep into the CDO Matters that are top-of-mind for today’s modern data leaders, this show is for you.

Malcolm Hawker

Malcolm Hawker is an experienced thought leader in data management and governance and has consulted on thousands of software implementations in his years as a Gartner analyst, architect at Dun & Bradstreet and more. Now as an evangelist for helping companies become truly data-driven, he’s here to help CDOs understand how data can be a competitive advantage.
