The Data Sprawl Dilemma: Centralize, Replicate, or Virtualize?

Episode Overview:


Most organizations default to replicating data: copying it from source systems into warehouses and lakes so their tools can reach it. Anu Jain, founder and CEO of Nexus One, thinks that’s the wrong answer. Malcolm isn’t so sure, and that’s where it gets interesting.

📌 In this episode:

Why the official count of data sources is always an undercount: shadow data, unstructured data, and legacy systems mean the real number is far higher than anyone reports

The case for virtualization over replication: bring the compute to the data, leave it where it lives, and collapse deployment timelines from months to days

“Information yield” — the metric Walmart once reported publicly and stopped, and why AI is about to force every CDO to bring it back

Why centralize-vs-decentralize is a false choice: the Venn diagram reality of golden records, distributed data, and why you need governance across all of it regardless

💬 The takeaway: “Think of CDOs as refineries. You have all this raw data. The question is: how are you enabling your organization to extract value and can you put a metric to it?” — Anu Jain

About the host + guest: Malcolm Hawker is a former Gartner analyst, Chief Data Officer at Profisee, Editor-in-Chief of CDO Matters, and host of the CDO Matters Podcast. Anu Jain is founder and CEO of Nexus One, former CEO of Think Big Analytics, and a veteran of IBM Watson and Teradata. He publishes in Fortune and is active on LinkedIn.


Episode Transcript

Good morning. Good afternoon. Good evening. Welcome to the one hundred and second episode of the CDO Matters Podcast.

This is Malcolm Hawker. I’m your host. I’m thrilled that you are joining us today. This is the podcast for chief data officers or anybody that wants to be a chief data officer.

I’m excited for today’s conversation. We’re gonna talk with Anu Jain. Anu is the founder and CEO of Nexus One. We’re gonna get into that a little bit.

We’re gonna talk about data sprawl. We’re gonna talk about what Anu calls the first mile of data. Data all over the place. Data sources everywhere.

And how to kinda get your hands around that. We’re gonna talk about maybe solutions in the market. We’re gonna talk about architectures. We’re gonna talk about data management disciplines. There’s no doubt that data sprawl is a huge issue, and I think that’s something we’re gonna go into in depth today.

So, Anu, with that, welcome to the podcast. Thanks for being here.

Oh, thank you so much. It’s my pleasure.

Awesome. You are in Atlanta. My company is based in Atlanta. You’ve got this interesting history.

You go back and forth between having important roles at, like, super large companies and founder mode. Like, you ran the services branch at Teradata. You were heavily involved at IBM, and then you go into founder mode. And then you go back to big companies, and then you go into founder mode again.

That’s a bit unique. Do you agree? Usually, people will stay in one or the other. What gives you the dexterity to go back and forth between these two worlds?

I’m not sure if it’s dexterity or if I’m a masochist. Alright. I think, like the other chief data officers here, we’re probably all masochists to some level or degree.

Look. I think the reality is I’m unique in that I’ve had the ability and the privilege of seeing the world, especially the data world, from the vantage point of very large organizations, whether it’s with Watson, leading large parts of IBM, or with Teradata, seeing some of the largest data environments in the world. And that’s always given me the, what’s the right word here, the energy to want to go solve it in a different way. And I think there’s just so much sprawl out there, and there are just so many tools and technologies and consultants trying to solve this.

And I thought there’s a better way. And, you know, what better way than to just go try to solve it?

I share that.

One of the reasons why I went to work for Profisee and why I left Gartner is because I saw so much dysfunction when I was at Gartner. And I don’t mean that in a judgmental way or a negative way, but being exposed to that world, I saw, and maybe the better way to say it instead of being negative, opportunity. Right? Like, I thought every day, there’s just so much opportunity.

And I’m approaching that not in founder mode. I’m not going into founder mode, but into evangelist mode, spreading the word and trying to get best practices out there. So, anyway, we maybe share the desire to fix things.

Maybe that’s what we both have in common. But let’s talk about this idea of data all over the place, data sprawl. I was reading a report earlier today from Addresser Analytics that said that, on average, companies have almost eighty sources of data. I was like, wow.

That’s it? And I was looking at an IDG report that they did with Matillion a couple of years back, where they were saying that, on average, companies have more than four hundred sources of data and that twenty percent of the companies in the survey had over a thousand. Yeah. To me, this is a massive opportunity, but also a massive problem.

When you hear these types of numbers, what do you think?

You know, I think first and foremost, data hasn’t been a first-class citizen for many of these organizations. Now we use this word, data sprawl, but when you talk about the number of sources out there, it feels like that’s actually underreported. I think what we’re hearing about are the official sources of data. You know, I sense that there are many unofficial sources of data as well, whether it’s spreadsheets, third-party SaaS apps that you have out there across multiple clouds, or legacy data that we haven’t even thought about yet.

And all of that information, all that data, has value. It has some sort of, you know, information yield, some sort of value to that information. And I think companies today, of all sorts, are struggling to get yield out of all of this data that they have. And they’re struggling with the old concepts of, you know, let’s build the intergalactic data warehouse.

You know, you remember: if we just throw it all into a big-ass database, it’s gonna be amazing.

You worked for Teradata. That was the business model.

It was a great business model.

Yes.

But it’s tough. Why is it tough? First, you’ve got to find out where all this information is.

Then you’re going to bring it to one place. You actually have to define it. You don’t even know where most of this information resides today. And typically, what do most companies do? They replicate the same information over and over again instead of driving to get yield out of it. And then you have probably eighty percent of the data under the iceberg, if you will, that is still unknown to them, with unknown context. And is it valuable?

And, you know, I sort of feel this is going to extrapolate to a much bigger problem, especially in this world today, right? In the last, what, one hundred and twenty days, with the agentic harnesses and everyone talking about agents and AI.

Well, what do you need? You need great data. You need great context. You need great governance. And we’re gonna have to we’re gonna have to get to that pretty darn quickly.

Yeah.

Some would argue that should have happened a few years ago, but it’s interesting. I agree. I think the number of sources is probably underreported. And I think you’re probably right: it’s kind of like the forest you see in front of you versus the forest you don’t see.

Right? I suspect, you know, Gartner is correct that eighty to ninety percent of the data in most companies is unstructured and completely outside the scope of most governance programs. So I’d be willing to bet that if you ask a data leader how many sources of data they have, that’s basically the same thing as asking how many ETL processes or pipelines they’re running.

Oh, I’ve got four hundred of those. But then how many other sources out there are completely off the radar? That makes sense to me.

That is a huge opportunity, and also a risk.

No. Look, it’s a huge risk. I’ll tell you. It’s actually funny.

I love that you said unstructured data. It’s something I’ve been thinking a lot about. I mean, here at Nexus One, with all the agentic harnesses out there, I challenged our team to build a Nexus One brain: to take our product and basically pull all the information in the company into one place.

So that’s not only what we have in our core data systems, like our finance, HR, and CRM systems, but let’s pull in all the Slack data too. And then, of course, for myself, I said: I have a human chief of staff, and I tell him this all the time, but I want an agentic chief of staff now. So what does that mean? It has to listen to all my conversations, and we have to pull in all that data and context as well.

And you start to play this out. I mean, we’re a smaller enterprise, but you start playing this out for a larger enterprise. Can you imagine how hard that is? And that’s not just pulling the unstructured data; it’s the governance, it’s the security of it as well. And so I think it’s gonna become a really fun problem for all of us to start to unpack together.

You mentioned something earlier that we just take for granted.

And any time we take something for granted, I think that’s an opportunity to revisit it from a first principles perspective: should we really be doing that? And that’s the issue of replication. Right? Like, you just mentioned it, but we do this all the time.

We replicate data out of source systems. We replicate tables all the time because storage has become reasonably cheap. I mean, cheaper than it was ten years ago, most certainly, and easier than it was ten years ago with lakehouses. I mean, I don’t have to do anything.

I just dump it all in forever.

But the duplication thing, when I think about it from a first principles perspective: do we really need to keep doing that? Is that something that you and your team are thinking about?

Yeah. It’s something we spend a lot of time on. Actually, it was one of the reasons we started Nexus One. You know, there was a metric that came to me in my prior role, when I was CEO of Think Big. We had about eighteen hundred clients at any one time around the world, and we asked all of them a very simple question.

Out of the project you’re doing today, how much of the data in this new project is a replica of the data you used in your last set of projects? And what we found in this quick study of these clients’ metrics was that ninety percent of the data in a net new project was a replica of the data they had used in their last five projects.

So it was about replicating data redundantly, not actually driving to yield, information yield.
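To make the "ninety percent replica" metric concrete, here is a minimal sketch of how such a replica share could be computed, assuming each project is described simply by the set of datasets it uses. The dataset names and the `replica_share` helper are hypothetical, and the toy numbers are constructed for illustration; they are not Think Big's survey data.

```python
def replica_share(new_project: set[str], previous_projects: list[set[str]]) -> float:
    """Fraction of a new project's datasets already used by earlier projects."""
    if not new_project:
        return 0.0
    # Union of every dataset touched by the earlier projects (empty if none).
    seen = set().union(*previous_projects)
    return len(new_project & seen) / len(new_project)


# Hypothetical "last five projects" and a hypothetical new project.
last_five = [
    {"customers", "orders", "products"},
    {"customers", "invoices", "payments"},
    {"orders", "shipments", "returns"},
    {"products", "inventory", "suppliers"},
    {"payments", "refunds"},
]
current = {"customers", "orders", "products", "invoices", "payments",
           "shipments", "returns", "inventory", "suppliers", "web_sessions"}

print(f"{replica_share(current, last_five):.0%}")  # 90%
```

Only `web_sessions` is new here, so nine of the ten datasets count as replicas. A real measurement would also need to decide when two copies are "the same data", which is much harder than matching names.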

And so we think a lot about this: we shouldn’t be spending the amount of time we do building ETL or ELT jobs and data transformations all over the place without a common yield from that information set. So we have a very strong belief in virtualization technologies. We rely heavily on Trino. There are a lot of updates we make to it, but it’s really about letting the data live where it does and going to that data to pull what I need: bringing the data engine to the data instead of bringing the data to the engine.

And we’ve seen that doing that completely changes the cost equation of what it takes to manage your data. Secondarily, the time it takes you to deploy net new capabilities in your enterprise goes from months and quarters down to days, and, you know, literally no more than two weeks to deploy new capabilities. As a CDO, with all the pressure we get from the business today, the ability to deploy faster, and governed, really matters. And we see data virtualization as one of the fastest ways to get to that point.
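The idea of "bringing the engine to the data" can be illustrated with a small local analogy. The sketch below is not Trino and not Nexus One's setup. It uses SQLite's `ATTACH DATABASE` to stand in for two separate source systems, then runs one query that joins across both where they live, without first copying either table into a third "warehouse" table. A federated engine such as Trino applies the same idea across separate systems, addressing tables by catalog. The table names and values are hypothetical.

```python
import sqlite3

# Stand-in for a "finance" source system.
conn = sqlite3.connect(":memory:")
# Stand-in for a separate "CRM" source system, attached under its own name.
conn.execute("ATTACH DATABASE ':memory:' AS crm")

conn.execute("CREATE TABLE main.invoices (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO main.invoices VALUES (?, ?)",
                 [(1, 120.0), (1, 80.0), (2, 50.0)])
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])

# One query spans both sources in place; nothing is replicated into a new table.
rows = conn.execute("""
    SELECT c.name, SUM(i.amount) AS revenue
    FROM crm.customers AS c
    JOIN main.invoices AS i ON i.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Acme', 200.0), ('Globex', 50.0)]
```

The trade-off raised in the conversation still applies. A federated query pays the cost of reaching each source at query time, which is why Anu suggests hardening (persisting) only the data whose business SLAs demand it.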

So in my past, when I’ve gone deep in these discussions with people, I’ve asked questions like, okay, do we really need to start replicating data? Do we even need a data warehouse? That’s slightly rhetorical, but I would like your opinion on it.

All those conversations inevitably end up at this place where it’s fast read versus fast write and the physics of a hard drive, and we will always need some idea of a data warehouse because we need to be able to aggregate a lot of data at scale and do fast reads. Then there’s OLAP versus OLTP. And I get all of that stuff. But in my experience, anytime the technology is the only barrier, that to me is like, oh, okay.

It’s not gonna stay this way. It can’t always stay this way. So I’m not trying to get all philosophical about analytics infrastructure versus operational infrastructure, but you seem to be suggesting that these things could potentially kind of merge into one thing.

Yeah. I love the provocative question here, which is: does the world change, and is virtualization the only way to go? Look, my view of today versus my view of tomorrow: today, I would start with virtualization and harden the things that need to be hardened because of business need or business SLAs, when and as needed. The reality is, we’re moving to a world where more will be virtualized rather than less, but that’s going to create a world where fine-grained access control to the data is important. Governance is important.

Lineage is important, but then on top of that, it’s the context, right? Because the way we interact with data is going to become very different. We’re seeing it. We’ve all had our ChatGPT moment or a Claude moment.

Maybe some of you have had your OpenClaw moment at this point, or your Hermes moment. But the reality is we’re going to be interacting with our data in many different ways. And confidence is going to matter more than structure. So it’s going to be governance.

It’s going to be metadata. It’s going to be security. Where we store it and how we access it will change. And I think technology is moving so rapidly that we’re heading to a world where virtualization will become more of the answer, not less.

Right. And I’d say everyone’s going to give you edge cases of where it makes sense to persist data or not persist data. To me, that’s de minimis compared to the fact that we need to make the information highly usable and governed, and do that at scale.

Well, I don’t know if this was intentional, but you used the word information, which a nerd like me would say, okay.

That’s slightly different than data because it’s richer than data. Information is data with context in it. And if that was a conscious decision, I’m all in.

I’m trying to. By the way, it absolutely was a conscious decision, and you probably heard me use the phrase information yield.

Yes. And I’ve started using that phrase more and more, because I believe enterprises and CDOs in the aggregate are, at some point, going to have to report on information yield: what is the value of the data, and how am I extracting information out of it? So if we think of CDOs as refineries, it’s: I have all this raw data in my company. How am I enabling the organization to extract value?

And can I put a metric to the value that I’m extracting? I’ve given this a lot of thought. Years ago, Walmart used to report on information yield in their earnings reports. They used to talk about this. And then it kind of went away.

And then I literally just heard a podcast with Arvind Krishna, the CEO of IBM. And he said, you know, companies are going to be talking about data as a financial asset and trying to quantify the information they can extract from their data and how important that is to their competitive advantage, their moats, in the future. And so one of the things I’m trying to bring back is: how do we all start talking about information yield, and how are we extracting that value?

So yeah. So the short answer is yes. It was absolutely intentional.

I love the optimism. I really love the optimism, but I’ll tell you, this is a big lift. Meaning, quantifying the business benefits of the stuff that we do: most of us just don’t. Right?

And this was my biggest frustration as a Gartner analyst for three years, talking to CIOs and CDOs all day every day. We had actual data that showed, okay, when you quantify what you’re calling information yield, what I could call business outcomes or business benefits, when you put a number on it, right?

When you do that, you prolong investments, you increase investments, you prolong your tenure. Your customers are happier. Good things happen. Release the doves. The angels are gonna sing.

Well, guess what? Nobody did it. So what do you think will be the driving force here for CDOs, or maybe CFOs or whoever, I don’t care, somebody, tracking information yield? What is going to be the lightning rod moment that will make companies start doing that?

I think we’re in it today.

We’re in it now. We’re in the AI moment. The AI moment is going to force us all to treat data as a strategic asset in the business. And let’s just play it out.

Right? Let me play out why I think that’s the case. Let’s start with the basic standpoint of openness. We’re already starting to see companies talk about the openness of their data and how they exchange data between their applications, whether it was Slack, which closed down its ecosystem and started a whole set of conversations.

Now you’ve seen Salesforce go headless. Right. But the reality is: how do I have open exchange of my information across my apps today? And those who are not open, I believe, will have a harder time.

That is, those who are trying to build closed ecosystems of corporate data rather than open ones. The second is sovereignty, right? I can tell you from many conversations I have, and I’ll break it up between regulated and non-regulated industries: every regulated organization I’ve had a conversation with, whether it’s banking, telco, healthcare, or government, is having the conversation around sovereignty of its data, which is: do I even want all my data in the cloud anymore? Do I want it with someone who’s taking it and enriching it?

Do I want it going to a third-party LLM provider, or do I want to host my data back on prem or in my own virtual private cloud, where I have total ownership of what’s going on in it and of its derivative products, the ontology of my data? We are seeing that moment today. The third is AI agents and tokens, right? In order to have good AI agents, you need good data, good data infrastructure, good security, and good context, but also the token layer, if you will. Companies will start to build their own ways of hosting their own LLMs or SLMs, things I call compound probabilistic and deterministic AI internally, but that cost equation is going to force the question of where I deploy agents, and on what data.

How do I deploy them? Is it deterministic or is it probabilistic? Right now, we’re turning everything into a probabilistic model. We’ll see a resurgence of ML.

I believe it’s a lower-cost way of getting to answers.

And we’re in that moment today. Uber’s CTO came out just two weeks ago and said they’d run out of their entire token budget for the year in Q1. And so AI slop will drive us toward getting a lot smarter about data and context, and about what data is important to drive what outcomes. I fully see that coming. I think it’ll be one of the biggest conversations we’ll have between now and the end of this year. I don’t think this is a two-year prediction. I think this is an in-year, twenty twenty-six prediction.

Oh, bold. Okay. That’s bold. So half of what you said, I’m not there. The other half, I’m completely there.

So the first half was basically governance, in essence. Right? Understanding provenance, understanding lineage, understanding risk, data sharing.

Right? We talked about open standards, which I’m all for, by the way. Open standards in data.

But we’ll never get companies to agree on any sort of, like, metadata standards. My goodness. We’ve been talking about metadata standards as long as we’ve had metadata. Going back to, like, the semantic web, for heaven’s sake. I don’t think we’re ever gonna get there.

I think AI will actually help us by acting as this translation layer. You call it account. I call it customer. Somebody else calls it prospect.

Who cares? I think the AI can figure that out.

The second thing that you talked about, though, I heard over and over again at Gartner. The first thing that I heard at the Gartner Data and Analytics Summit over and over again was context, context, context. If I had a dollar for every time I heard context, I don’t know, I’d buy a nice car. But the second thing that I heard the most, and I hate this phrase, I think it’s kind of pithy, was what Gartner calls FinOps.

Financial operations. Basically, what you’re talking about: be smart about how you consume tokens. Right? Don’t consume thousands of dollars in tokens on a project that isn’t cost-justified.

Right? Or don’t go consume thousands and thousands of tokens trying to cram so much text into a context window because you’re trying to drive deterministic behaviors out of a probabilistic system. Yes. Right?

Which is basically what you touched on with deterministic AI and ML returning. Totally agree with that.

Because what I see now is, like, you can make LLMs act very deterministically, but how much do you actually have to stuff into the context window in a given prompt to get the behavior that you want, when maybe it’s just an ML problem? So financial operations, and maturity around making sure that you’re not spending us into oblivion, that could be what makes CDOs figure it out.
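The FinOps point about context stuffing comes down to simple arithmetic: input cost scales linearly with tokens per request. The sketch below compares a heavily stuffed prompt with a lean one. The request volume, token counts, and the price of 3 USD per million input tokens are all hypothetical placeholders, not any provider's actual pricing. Output tokens, caching, and discounts are ignored.

```python
def monthly_input_cost(requests_per_day: int, tokens_per_request: int,
                       usd_per_million_tokens: float, days: int = 30) -> float:
    """Input-token cost for a month of traffic (hypothetical pricing, input only)."""
    tokens = requests_per_day * tokens_per_request * days
    return tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical workload: 50,000 requests a day at an assumed 3 USD per million input tokens.
stuffed = monthly_input_cost(50_000, 40_000, 3.0)  # large context stuffed into every prompt
lean = monthly_input_cost(50_000, 2_000, 3.0)      # only the context the task needs

print(f"stuffed: ${stuffed:,.0f} per month")  # stuffed: $180,000 per month
print(f"lean:    ${lean:,.0f} per month")     # lean:    $9,000 per month
```

Under these assumptions, a twentyfold difference in prompt size becomes a twentyfold difference in spend. That is the gap that makes a cheaper ML model, or tighter context selection, worth evaluating first.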

Yeah. It’s gonna come, but the problem is that not everyone, but a majority of folks today, want to use LLMs to solve everything. And they’re a purpose-built tool.

And so you could spend a fortune. I just saw this with a client of ours, where we’re building a real-time AI solution for intent information, and we were trying to use an LLM to do reasoning on categories. It made much more sense to do that in ML with some basic reasoning, followed up by the LLM. So they’re going to run in parallel, but with the prior model, they could have bankrupted their company.

I mean, I’m sure our friends at Claude or OpenAI would love that. But the reality is the CDO is going to become the governance layer for this very quickly, because everyone’s going to point to them saying, I don’t have the context. I don’t have the data. I don’t have this.

I don’t have that. I don’t have the tooling. And then we’re going to have to bring that tooling to the masses and make it really, really easy to use. And that’s something that I, as CEO of Nexus One, have given a lot of thought to: how we build out our own products to enable this, especially for sovereign estates.

But it’s an area that, I believe, is coming, and it’s coming with a vengeance.

It’s inevitably coming. This reminds me so much of the beginning of the explosion of AWS.

Right, and of cloud compute in general. Right? So AWS comes on the scene. And at the time, I was running an IT shop for a two billion dollar company.

And we had an official account with AWS, but then there were all of these shadow accounts across the entire organization, where engineers were using their credit cards to spin up AWS instances, and they were all over the organization. They were getting expensed.

I went to our finance group and said, can you give me a summary of all expense payments that were made to AWS? And it was hundreds of thousands of dollars being expensed every month that I didn’t know about. And I’m the guy responsible for cloud infrastructure, and all this money was going out the door that I didn’t know about, which led to the birth of these management planes: centralized management of the spend, turning things up, turning things down. It’s the exact same thing. So I completely agree with you.

But just, you know, I think this, and I’m going to cut you off for one second. No, no, go. This leads to the problem of, let’s go back to the AWS problem, which was: what did most companies do? They didn’t modernize their systems.

They encapsulated them. It’s like encapsulating nuclear waste and moving it to the cloud. We didn’t modernize it for the cloud. We didn’t modernize it for containerization, Kubernetes, or any of the things that are happening.

Now the AI moment has come.

And let me use a different analogy here. I had the privilege of working with a restructuring firm years ago, when I was doing consulting and SI work. And the restructuring guy, right, slaps his hand on the table and goes, Anu, I figured out what you guys call a transformation versus a restructuring. He goes, it takes you guys five times as long and twenty times as much money to do a transformation as it takes us to do a restructuring.

And I believe we’re in the world of AI restructuring. And what does that mean?

It means our sprawling data estates, sometimes seventy-year-old data estates, from mainframe all the way through all these different technologies, are going to have to be modernized in a way that allows for this new AI layer.

And so it’s like the old manor house that’s been built over a couple of centuries, and we’re gonna have to get ready for the new guest in the house, which is AI. And that means we’re gonna have to modernize. And so the other thing that’s coming, and it’s something we spend a lot of time on at Nexus One, and others do too, is how you get to one-touch or zero-touch migration and modernization of the old data estate to a new, modern estate where we have the tooling to do the FinOps, but also to manage the AI, the tokens, the context, and the fine-grained access control that these capabilities are going to need, because it’s no longer just querying and reporting.

So I I couldn’t agree more.

The guest is coming? The guest is already here. I think the guest is living in the basement.

And we’re kinda doing our best to make sure that she or he gets fed every now and then, and that there’s a natural source of light maybe trickling in from a tiny little window, but I don’t think that we’re doing the most that we can. But I think you’re right on this process reengineering thing. And I think it is exactly why Salesforce bought Informatica. It’s exactly why SAP bought Reltio, and why ServiceNow acquired data.world, and others.

The greatest value is gonna come from process reengineering. Right? Everybody knows this. Right? You basically just said that, in essence. The greatest value is not gonna come from chatbots that can regurgitate your customer service FAQs. It’s gonna come from rearchitecting these legacy processes that were defined, you know, under Deming standards forty years ago, TQM stuff from way, way back in the day, that really haven’t been changed or challenged.

And the only way we’re gonna do that is with this data estate. Right? And if the data is no bueno, we’ve got a problem.

And it’s really tough, right? Because now you’ve gotta have one control plane across your entire data estate. And I’m talking my book a little bit, but my view of the world, and why we started Nexus One, was that you can have one control plane across your entire data estate, whether it’s on prem, in the cloud, or in a virtual private cloud. Your data is heterogeneous, right?

It’s across multiple on-prem systems, Excel, all of this stuff. So at a minimum, you need one pane of glass to manage it all. And then you need to understand how to modernize those systems, encapsulate them, and bring these capabilities to bear, because, you know, AI is going to want to go through every part of your data estate. It’s going to want to go everywhere.

And it needs the ability. You hear the word context.

Context is great, but we’ve still got to give not only context but security and governance to all data, and that means all of our data also has to have a digital birth certificate.

Where did it come from? How did it get here? What does it mean?

Right? Otherwise, we’re going to get bad answers from our data, and AI is already giving really bad answers sometimes. And the best way I have to describe this is, if we go back in the time machine, you know, Malcolm, you and I are a little older, we can go back twenty-five years. When we got the birth of these web BI tools, what was the first thing that happened when they went to the masses?

Everyone could do a Cartesian join in a BI tool, and they had an answer, but without the context, the answer doesn’t matter. Right. And the same thing’s going to happen here if we don’t start to get control of these data estates, but at a much faster scale.
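One way to picture the "digital birth certificate" is as a small provenance record attached to every dataset, with one field for each question Anu asks: where did it come from, how did it get here, and what does it mean. The sketch below is a hypothetical data structure for illustration only. The `BirthCertificate` class, its fields, and the completeness rule are assumptions, not a Nexus One or Profisee feature.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class BirthCertificate:
    """Hypothetical provenance record for one dataset."""
    dataset: str
    source_system: str          # where did it come from?
    lineage: tuple[str, ...]    # how did it get here? (ordered hops)
    definition: str             # what does it mean?
    owner: str                  # who answers for it?
    issued_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def is_complete(self) -> bool:
        # A record counts as usable only if every question has an answer.
        return all([self.source_system, self.lineage, self.definition, self.owner])


cert = BirthCertificate(
    dataset="customer_revenue",
    source_system="ERP",
    lineage=("erp.invoices", "join with crm.customers"),
    definition="Sum of invoiced amounts per customer, in USD",
    owner="finance-data-team",
)
orphan = BirthCertificate("mystery_extract", "", (), "", "")

print(cert.is_complete())    # True
print(orphan.is_complete())  # False
```

A governance layer could then refuse to expose records like `orphan` to an AI agent, which is one concrete reading of "context, security, and governance to all data".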

Yeah, for sure. So I agree on the single control plane. However, there is a large sect of us who are really focused on decentralization and see the idea of a single control plane as inefficient.

I don’t agree with them. What’s your pushback to that?

I’m going to push back. And I think there are two parts to that. There’s an and and an or, right? My point of view is that it’s and, not or. Most folks in the world today say, life would be awesome if all your data resided in my control plane, which also means all of your data must be in my data system. Those are very different things. My format, my system: I control and I govern all your data.

So if you’re Databricks or Snowflake, and I’m not trying to make fun of them, they’re the two big guys on the block, but what do they say? Put all your data in my estate and you’re awesome. Except if I have data in an on-prem system, or data in another ERP, what am I doing? I’m replicating that data back into their data estate to be in that control plane. No, no, no. What I’m saying is the control plane spans your entire data estate, wherever the data resides.

And so we have to be very thoughtful about “and,” not “or.”

And so the... Yeah.

I don’t know how you avoid that.

Right? I’ve wrestled with this for thirty years. Right?

And this pendulum swings back and forth: centralization, decentralization. Centralization, decentralization. We need a data mesh, and all centralization is bad. No, it’s not all bad.

And if someone asks, how many customers do we have? We’d better have one answer, quick.

But you have both, and that doesn’t take away from having one control plane, right? You’re going to have some data that’s centralized and is a golden record. You’re going to have some data that’s decentralized, where there isn’t a lot of yield value in centralizing it. That’s okay as well. But you need to have context, governance, security, and the digital birth certificate for all that data, at least, and I hate to use this word, orchestrated. It can sound like a data movement word, but I mean managed to some degree.

And we see a tremendous amount of value from that today.

Yes. I would argue it’s necessary. Right? Putting on my analyst hat, if I were sitting at that whiteboard behind you, I would draw a three-ring Venn diagram.

Right? And sometimes the data is on the outside ring and nobody cares. It’s sitting in Salesforce, and it’s one of the four thousand custom fields you’ve built in Salesforce. It only lives in Salesforce, and it never goes anywhere else. Nobody cares. Knock yourself out. He’s like, I got it.

Yeah. It’s a Z table in SAP. A lot of folks have them. Great. Go forth and conquer.

Speaking of big companies that are trying to force you into their standards. Anyhoo. Yes.

But then in the middle of that Venn diagram are most of our high-value processes, the workflows we were talking about re-automating, like quote-to-cash and procure-to-pay. Data is shared across them. It’s widely shared across them. So the idea that you could have complete decentralization, and who cares, everybody wins, doesn’t hold up. Right?

And everybody can have their own definition of customer. Well, that may be fine at an operating level. But again, if your CFO asks, how many customers do we have? Or if you’re trying to put it in your annual report that gets reported to Wall Street, you’d better be confident that you’ve got the one right answer to that.

So that is your “or,” or your “and.” Right? It’s the “and.” Yeah.

Yes. The centralized and the decentralized: you’re gonna have both. I don’t see a world where it’s a hundred percent decentralized, and I don’t see a world where it’s a hundred percent centralized. We’ve tried the hundred-percent-centralized world, and that just led to lots of money for SIs.

Yeah. Yeah. You know? Yeah.

Exactly. You’re gonna have both, and our opportunity is to manage all of that data, both where it lives and where we give it a golden set of truth for the organization.

Now you’re talking my language. You’re talking MDM. So in our last couple of minutes: you’ve used the phrase “digital birth certificate” twice now, and I’m intrigued.

And I’m gonna show my nerdiness here. I talked about this maybe in episode two or three of my podcast, and I haven’t talked about it much since because, frankly, it’s not as popular as it used to be. But I love the idea of some form of blockchain here. Right?

Some form of immutable ledger that forever captures the digital births of your digital business.

And maybe it’s an NFT. I’m fine with that too, but something on a chain that is tightly, tightly governed. So what’s your response to my ranting just now? What do you think? Is that a viable approach to solve for your digital birth certificate?

You know, you must have seen my Twitter account at some point. I’m a big blockchain nerd as well.

So look. Yes. We actually spent a lot of time early on in the history of Nexus doing blockchain work as well. We thought about this control plane idea, and about everything being a blockchain itself, along with the digital birth certificate. In fact, we connect to many, many blockchains.

We do a lot of work, even in the Sui ecosystem, and I could go on and on, but the reality is I have a very strong belief in the same thing, right? That the digital birth certificate for a piece of data should be immutable. It needs to be stored in an immutable fashion, right? Especially with the generative technologies that are out there. Whether you call it an NFT or whatever we want to call it, there needs to be an immutable record of the birth of that data and then of how that data has been migrated and transposed across the enterprise.

And frankly, I think it’s gonna have to be there over time for us to have a level of truth in what we report on and what we make decisions on.

Well, it’s interesting. I listen to people in the data world describe requirements, right? Requirements for something that looks like this digital birth certificate, if you wanna call it that. Or let’s call it maybe a data contract.

Right?

Maybe, or some idea of a proven, stamped, validated, governed data product or data contract.

An artifact of fact, right? This article of fact. And data people will use words like data contracts and automated governance to describe things that have existed in blockchain for a long time. Smart contracts have existed for a long time.

NFTs, as an immutable, trustworthy record of authenticity, have existed for a long time. So people in the data world use similar words to describe requirements that I would argue are well supported by this technology, but I don’t know. Anyway, Anu, thank you so much for joining the podcast today. We have to end slightly abruptly because we’ve had a technical problem.

If you wanna learn more, go to the Nexus One website and learn what they’re doing. I think they’re doing some very, very cool stuff around this single control plane, which I think is inevitable.

I think we’re going in that direction. And he publishes in Fortune magazine, which is incredible. And he’s prolific on LinkedIn, like me. Go check him out on LinkedIn.

But for now, that’s a wrap on episode one hundred and two. If you’ve made it this far, please take a moment to subscribe, like, and do all the things the socials prefer. Anu, although I can’t hear you, thank you for attending. Thank you for sharing your wisdom.

I look forward to talking to you again soon. And with that, I will sign off.

See you on another episode. I’m losing my voice. My goodness. Everything’s going to heck here. We’ll see you on another episode of the CDO Matters podcast very soon. Bye for now.

ABOUT THE SHOW

How can today’s Chief Data Officers help their organizations become more data-driven? Join former Gartner analyst Malcolm Hawker as he interviews thought leaders on all things data management, from data fabrics to blockchain and more, and learns why they matter to today’s CDOs. If you want to dig deep into the CDO Matters that are top-of-mind for today’s modern data leaders, this show is for you.

Malcolm Hawker

Malcolm Hawker is an experienced thought leader in data management and governance and has consulted on thousands of software implementations in his years as a Gartner analyst, architect at Dun & Bradstreet and more. Now as an evangelist for helping companies become truly data-driven, he’s here to help CDOs understand how data can be a competitive advantage.

The CDO Matters Podcast Episode 102

The Data Sprawl Dilemma: Centralize, Replicate, or Virtualize? with Anu Jain

Anu Jain


Megan Gregory
