CDO MATTERS WITH MALCOLM HAWKER

CDO Matters Ep. 44 | Data Management at Scale

February 22, 2024

Episode Overview:

On this episode of the CDO Matters Podcast, Piethein Strengholt, author of the O’Reilly book ‘Data Management At Scale’, and current CDO of Microsoft Netherlands, shares his perspectives on a variety of emerging technology trends in the world of data, including the data fabric, data products, MDM, AI – and many others – even blockchain for data management.

Malcolm moderates a lively discussion on these trends, with a focus on how data leaders can better understand these concepts, and their value and purpose in a modern enterprise data architecture. 

Episode Links & Resources:

Good morning. Good afternoon. Good evening, everybody. Welcome to the CDO Matters podcast. I am your host, Malcolm Hawker.

I am joined today by Piethein Strengholt, who is the CDO of Microsoft Netherlands. I’m thrilled, Piethein, that you could be here today. You’ve written an amazing book. When did you publish your book?

About a year and a half ago, maybe?

Well, the first edition, I think, two and a half years ago already, and the refresh, or the second edition, came out, I think.

That’s the one I have. Here. Yes. It’s that.

Yeah. That’s the second edition. And I think what motivated me for writing that second edition is truly data mesh.

So that, I think, maybe in the preface you... Well, we’re gonna talk about that a little bit more.

I’m glad you read the book. I have read it.

It is a very dense read. There is everything that a CDO or VP of analytics or even a senior manager of analytics would want to know about data management, particularly data management at scale.

I’m really honored that you would take the time to speak with us today. I think we’re gonna have a great conversation. We’re gonna talk a little bit about data products.

We’re gonna talk a little bit about AI. I think you kinda have to when you’re talking to a CDO, somebody who works for Microsoft, given their presence with OpenAI. We’ll talk about the data fabric. We’ll talk about maybe even a little bit of MDM, my favorite topic. But with that, Piethein, maybe you could just say hi to the audience and give a little bit of a history of your background. And what led you to really focus on writing a book given your current career?

Yeah.

Yeah. I started my career a long time ago, first in consultancy. So I’ve worked for two major consultancy firms. Not so much in data during those days.

So mostly in application integration, system integration. So that’s where my background comes from. And I think at some point, I did my first data warehousing project with BI, building data marts, and I also wrote tons of SQL code. And I think then data really got me.

I think it’s, yeah, a very nice subject. It includes a lot.

And then at some point, I was approached by ABN AMRO: why don’t you join the data management team? Because we plan to get rid of this large monolith we have, the enterprise data warehouse, and see what you think works best. So I worked closely with that company, and we transitioned from that more monolithic architecture into what you would call today a data mesh.

Three hundred teams at that time. And then at some point, I was approached by Microsoft to join that company. And in between, so while working at ABN, I really gave a lot of presentations on the journey we were on, and lots of people asked me, well, where, yeah, does all that knowledge come from, and why don’t you materialize it?

So at some point, yes, I reached out to O’Reilly as well. Why don’t we work together, write a book?

So that’s how we started. And the first book was more, I think, written from the perspective of working at ABN AMRO. And after joining Microsoft, I learned, well, there are, I think, many good architectures. There’s no right or wrong. It’s not just about data mesh or data fabric.

It will likely be a combination of best practices and patterns, and you look at your own business requirements in that respect. So that motivated me for writing that second edition.

Well, what strikes me about the book is, I mean, I’ve been doing data a long time. This is real gray hair.

And I consider myself fairly well versed. And when reading your book, I was humbled by the depth of knowledge that you have across all of the subject matter areas. Like, I can go deep on MDM. I can go deep on data products. But when it comes to things like, you know, architectures, integration patterns, I was just struck by how deep you were able to go on a lot of these topics. So, again, if you are maybe a little less technical, maybe you’re a CDO that has come from the business side of the house and you want to learn some of the words, some of the lexicon, some of the technologies, some of the architectures, Data Management at Scale is a great place to start.

I would absolutely be doing it in a quiet room.

Yes.

Well, it’s funny you say this because the initial version I wrote, I started way more high level, introducing concepts.

So what is an operational data store, and why do you have to have one? And what is an enterprise data warehouse, and how does it contrast with an operational data store, for instance? But then, yeah, I handed it out to the first reviewers. And they all said, well, yeah, this is just basic stuff you can easily find out there on the Internet.

Ouch.

We like the depth way more. And so I took out the first four or five chapters even and condensed that into a really short summary, setting the scene, and then immediately jumping straight in the second chapter all the way into the depth of domain-driven design, data domains, et cetera.

So yeah. I think yeah. You’re right. But, yeah.

This is what the market wanted. At least it’s what the public wanted.

But maybe I should do another book, maybe, at a more introductory level: if you are the CEO or CIO, what is data truly about?

It’s interesting. What I’ve seen over the last year, well, maybe the last two years. And this is true both in Europe and in North America.

I had convinced myself that our industry had reached kind of like maturity.

Right? Where if there’s maybe five or ten percent new people coming in every year, that means that five or ten percent left. Right? And that we’re just kind of at this steady state and we’re just kinda chugging along.

I don’t know why I had convinced myself of that. Maybe I assumed that every company has to do analytics. Right? Every company has to do data. You can’t not do data these days.

So I just had convinced myself that things had kind of reached a steady state. And what I’ve seen over the last two years is that I was wrong. I mean, there are a lot of companies that are kind of new to certainly more advanced forms of analytics, data science and all that stuff. Certainly that.

But even in worlds like MDM, I’m seeing there’s companies that are approaching me or approaching us. They’re reaching out to me on LinkedIn and saying, hey, we’re looking at MDM for the first time, which is like, wow. So that’s one thing.

And another thing I’ve seen is that a lot of companies are hiring CDOs. Like, just a lot, and a lot of them are pretty new to data. So I think you could add a ton of value by, you know, maybe circling back on some of the more basic stuff. But are you seeing the same thing in the market in Europe?

Yes. Absolutely. Yeah. Yeah. I think also now with GenAI, generative AI, we see so much attention.

But again, it boils down to making the data fit for purpose first. I think that’s what people... I think, yeah, for sure, you can generate quick value by looking at individual use cases. But in order to do that at scale, massive scale, I mean, well, yeah, you need to think way beyond a single use case. So you need to have a strategy and proper governance and metadata management.

So all of these disciplines from DAMA, the, yeah, the DAMA wheel, I think you probably know about.

Or we need to have that base. Yes. Yeah. Well, the wheel is both wonderful and scary.

It’s wonderful because it provides a framework. I mean, it says, you know, here are the things that you should be focused on from a data governance perspective, and that’s great. But it’s scary because there’s a lot there.

There’s a lot.

Yeah. There’s a lot there. Question, before we move into data products: did ABN AMRO get rid of their data warehouse? Was that successful? Did that actually happen?

It depends on how you define a data warehouse. I think, in the end, they got rid of their enterprise data warehouse, so the approach of integrating first, combining and integrating all data upfront before you were even able to use it. I think, yes, they still do data warehousing a lot.

And I think that’s relevant also given that, as a financially regulated institution, you need to report out on lots of financial dimensions, and these should all be the same for risk management, finance, ALM, compliance, and the annual report you need to create.

And that ties back to massive consolidation, integration, harmonization of data. And I think, yeah, that’s what a data warehouse is for.

I think you could argue maybe and debate on the concept of a data warehouse, and is it similar to how we did enterprise data warehousing in the past?

There you see big differences. But it’s mainly the scale. So, yes, they still do data warehousing, but more in isolation, more closely aligned, I would say, to use cases instead of one giant monolithic architecture.

So that’s their implementation.

That’s their implementation of a mesh.

What mostly makes it a mesh, I think, is more the federation, the democratization of responsibilities.

And this is also, I think, the common theme when I talk to customers. How can we scale by democratizing activities, responsibilities throughout the organization? I think to me, this is what makes data mesh so great. So in the past, yes, we always had that strong belief: there was one team looking over all data, all data quality, using an enterprise canonical data model, and they integrated and consolidated all the data for the entire organization.

And I think that’s the big shift that we see these days. We would like to federate and change that model.

Yes. And then we have additional concepts, like data domains, data products, and the like.

But I think that’s less of a theme these days among customers. It’s way more the organizational aspects, building a proper community, the role of data governance.

Is it a central architecture, or an architecture you instantiate multiple times for different business departments? So those kind of discussions I mostly have. I mean, is that data mesh? Yeah. Some parts. Yes.

I have a confession. I think I get way too hung up on semantics.

Yeah. Which maybe means I’m good for MDM. Right? Maybe that means I’m in the right industry, but I get really hung up on semantics.

And when I read something and I say, okay, well, this is how you define a mesh. Right? So, of course, I’m deferring to Zhamak in how she defines a mesh.

And I know that you call that out in your book as well. But I read kind of, you know, here are the four pillars and here are kind of some of the operating principles of a mesh. And then I read that, and then I’m just kind of, like, that’s it. It’s what it is.

And anything outside that means it’s not a mesh.

And I do battle with that because I’m also a pragmatist. Right? And I also know, you know, you gotta start simple and things evolve over time. So I understand that the mesh has evolved into something else than what I think Zhamak had originally outlined, or at the very least, it is starting as something else, more of the organizational patterns that you just mentioned. But I do struggle sometimes a little bit when people say that they still kind of embrace some of the things that Zhamak said the mesh is designed to break, like centralized data management patterns.

Right? That’s kind of like, if you’re still embracing centralization, I struggle a little bit with understanding how that could still be a mesh, but maybe I just get hung up on words.

It’s way more nuanced than that. Yeah. And I think it depends. So in my view, it’s all about achieving scalability for your organization, and each organization is different. And what scalability means to you, I think that will differ between organizations.

So that’s... It should.

I would hope so. Right? What’s scalable to an extremely large manufacturing company may not necessarily be scalable to a small software company.

So... Yes. Yeah. We’re starting before. Yes. Yeah. Exactly. Alright. That’s it. Well, that’s a good segue, talking about semantics and definitions and the mesh.

Let’s talk about data products. I would love to hear kind of your perspective on a definition if you’re able to do that. Well, I know you can because you did in your book.

A lot of people do struggle with the definition there. I think I have one, but I’d love to hear your perspective. How do you define a data product and what makes them unique?

I would rather define it as a logical entity you define somewhere, either in a catalog or in a data marketplace. And from there, you draw relationships to where the actual physical data is managed.

And, yes, data products should come with a bunch of principles attached to them. So a data product should be following interoperability patterns or standards, for instance. There should be metadata supplied with the data product. It should be well described. It should have high quality. There should be an owner attached to it.
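To make that definition concrete, here is a minimal Python sketch of a data product as a logical catalog entry that points at physical data and can be checked against a few of the principles just listed. It is only an illustration: the class names, field names, and the `missing_principles` check are hypothetical and do not come from the book or from any catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalLocation:
    """Where the actual data is managed (hypothetical fields)."""
    platform: str  # e.g. "lakehouse"
    path: str      # e.g. "sales/orders/v1"


@dataclass
class DataProduct:
    """A logical catalog entry; it references physical data, it does not contain it."""
    name: str
    owner: str
    description: str
    interface_standard: str
    locations: list[PhysicalLocation] = field(default_factory=list)

    def missing_principles(self) -> list[str]:
        """Return the names of the principles this entry does not yet satisfy."""
        problems = []
        if not self.owner.strip():
            problems.append("owner")
        if not self.description.strip():
            problems.append("description")
        if not self.interface_standard.strip():
            problems.append("interface_standard")
        if not self.locations:
            problems.append("locations")
        return problems


# A complete entry and an incomplete draft (illustrative values only).
orders = DataProduct(
    name="customer-orders",
    owner="sales-domain-team",
    description="Orders placed by customers, one row per order line.",
    interface_standard="parquet",
    locations=[PhysicalLocation("lakehouse", "sales/orders/v1")],
)
draft = DataProduct(name="draft", owner="", description="", interface_standard="parquet")

print(orders.missing_principles())
print(draft.missing_principles())
```

The point of the sketch is the separation: the catalog entry carries ownership and description, while the data itself stays wherever it is physically managed.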

But the way I would see it, it’s more a business view than a technology view, and that’s where I think I differ from Zhamak’s view, where she approaches it more, I would say, as a microservice or a logical, tiny architecture in which infrastructure, code, metadata, data, all is combined, encapsulated, and hosted somewhere. I think that definition, yeah, to me, honestly, comes with several problems, but it’s also very hard to grasp for lots of organizations. So they very much question: how would that then work in practice, and how to solution that exactly? And, honestly, there are also not really many practical examples provided in the book or elsewhere on the Internet. So it stays a bit too abstract in my view.

Well, so getting back to your perspective of a data product, not necessarily the mesh. I mean, one of the things that you’re clear on is, I’m never gonna say this right, atomicity. The smallest logical blob of something. So data products kinda follow that paradigm. You can’t make them any smaller. Is that kinda core to your definition as well?

You develop data products for your end consumers.

So it’s about scalability, using data at scale, reading data many, many times for many different purposes.

What is the size and granularity? I think, yeah, that depends and differs and comes with lots of nuances. Maybe you end up with multiple data products.

Some are smaller and are a bit more fine grained. Others are maybe a bit larger in that respect, and they cater for different scenarios. It depends. So it’s the same: I had my background in API development. And when I think back to the times of service orientation, we saw basically the same thing. So we started with this massive monolith, the enterprise service, and also lots of composite services being developed within it, and one big team managing that whole infrastructure. And at some point, well, we decided, in order to achieve scalability, we federate the responsibilities of building and developing APIs back to the application software teams.

And there, we also have principles. So an API is meant for consumers and has to follow, I think, also several principles. So it should be decoupled, meaningful.

You use resource orientation to group organically what belongs together on the level of an API here, for instance. It should be owned, should come with metadata, security, and the like. And I think a data product in that way is no different.

Well, one of the things that I did appreciate in the book is kind of how much you stress the data product being a reflection of a business capability.

And that’s something that I really appreciated because it helped me kind of better understand that this is kind of the shift left. I’m air quoting the shift left that Zhamak is talking about. Right? And getting the data closer to the business applications or, in your case, the business capability. So that part I find very helpful, and that representation of the data product should be as close to what is being represented in the capability as possible. You agree?

And I think there she’s spot on. So aligning product ownership with data ownership totally makes sense.

Having, I think, flexible, autonomous teams looking over the application, taking ownership over their data, and these are multidisciplinary teams in my view and should come with DevOps and DataOps kind of practices. Great. That also fully makes sense in order to achieve scalability. Because if all the work resides within one single team, yeah, I think that’s not a scalable model. So I think that shift is fully right.

So one metaphor that I found useful for me while I was reading your book, particularly on the section about data products, and I’d love your perspective on this, if you agree.

Mhmm.

Is that I started to think about this through an analog of a supply chain. Right? Like, a physical supply chain, and that I’m making something. Right? I could be making a phone. I could be making cornflakes. I could be making anything.

A metaphor that I started to use when I was reading your book was that data products could be seen almost like a raw material.

And that raw material would then be subsequently used downstream, maybe in a data science function, maybe in some sort of BI tool, maybe back into another business application potentially. Who knows?

But the data product is a raw material. The raw material goes and becomes a finished good somewhere down the line. Do you agree with that way of thinking?

Yeah. It’s an input material.

I think if you use the word raw, some people connect it to... Okay.

Raw technical data, which is heavily normalized sometimes in operational systems. So I think it’s not like that because that’s not a scalable model, and then you’re tightly coupled to the structure of that system. So meaning if the structure of that operational system changes, it would immediately have a disruptive effect on the structure of the data product. So it should be decoupled, but it’s meant as input for other use cases.

And there, I learned, and I think this also links back to the business capability discussion we previously had. There’s always this dilemma of transforming, changing the semantics of data when moving it between business departments. So you can make materials as input for others, and these should be reusable at scale. So as soon as you start creating inputs for individual, highly specific use cases or consumers, I think that’s no longer a data product because then it’s no longer reusable. I think you’re indirectly building a point to point interface. So you should make it, yeah, read optimized to cater for a very large audience.

Unless. Unless you’re changing the semantics to enable a cross-functional use case, in which case it starts to look a lot like MDM.

Yeah.

But Zhamak also introduces this concept. She talks about aggregates.

So on the consuming side, sometimes you see lots of consumers, they ask for maybe the same combination or integration or enrichment on certain data. Would it be efficient to let all of these teams individually, yeah, do that work themselves? No. You could maybe create an intermediate product, a data product. So then you would combine basically input data products that would come from the providing, source-aligned domains, and then you build an aggregate. That aggregate then becomes input for the consuming side of your architecture, so the consumer-aligned or consumer-oriented domains.
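The aggregate idea can be pictured with a small Python sketch: two input data products are combined once, and every consumer reads the result instead of redoing the same join. The record shapes, the field names, and the `build_aggregate` function are hypothetical and used only for illustration.

```python
def build_aggregate(customers, orders):
    """Join two input data products once, so consumers do not each repeat the work."""
    customers_by_id = {c["customer_id"]: c for c in customers}
    aggregate = []
    for order in orders:
        customer = customers_by_id.get(order["customer_id"])
        if customer is None:
            continue  # skip orders that have no matching customer record
        aggregate.append(
            {
                "customer_id": order["customer_id"],
                "customer_name": customer["name"],
                "order_id": order["order_id"],
                "amount": order["amount"],
            }
        )
    return aggregate


# Two input data products from different providing domains (made-up records).
customers = [
    {"customer_id": "C1", "name": "Acme"},
    {"customer_id": "C2", "name": "Globex"},
]
orders = [
    {"order_id": "O1", "customer_id": "C1", "amount": 100},
    {"order_id": "O2", "customer_id": "C2", "amount": 250},
    {"order_id": "O3", "customer_id": "C9", "amount": 75},
]

customer_orders = build_aggregate(customers, orders)
print(customer_orders)
```

Here the unmatched order is simply dropped; a real aggregate would need an explicit, agreed rule for such records.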

So that’s what aggregates are for. But then, yeah, you have another interesting question or dilemma. What makes an aggregate then different from master data management? Because don’t we do, in master data management, the exact same thing? So we take data from different domains, different source systems.

Model that data, combine, integrate, harmonize the data a bit, maybe, clean up the quality, and then distribute that again to maybe other domains. So, yeah, there’s a bit of overlap. Right. And this is also what I thought about and have written in my book. I see master data management mostly as a data quality principle.

So at the end, it’s mainly for fixing data at the sources, where you see differences between source systems or inconsistencies.

And you cannot do master data management within individual source systems. So it requires you first to combine and compare data, and then you discover these inconsistencies.
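The "combine and compare first" step can be sketched in a few lines of Python: records for the same key are pulled from two source systems and compared field by field, and the differences are what would be sent back to the sources for fixing. The system names, keys, and the `find_inconsistencies` helper are hypothetical; real MDM tooling also has to match records that do not share a clean key.

```python
def find_inconsistencies(system_a, system_b, fields):
    """Compare records that share a key across two systems; list the differing fields."""
    issues = []
    for key in sorted(system_a.keys() & system_b.keys()):
        for field_name in fields:
            value_a = system_a[key].get(field_name)
            value_b = system_b[key].get(field_name)
            if value_a != value_b:
                issues.append((key, field_name, value_a, value_b))
    return issues


# The same customers as seen by two source systems (made-up records).
crm = {
    "C1": {"name": "Acme BV", "city": "Amsterdam"},
    "C2": {"name": "Globex", "city": "Utrecht"},
}
erp = {
    "C1": {"name": "Acme B.V.", "city": "Amsterdam"},
    "C2": {"name": "Globex", "city": "Utrecht"},
    "C3": {"name": "Initech", "city": "Rotterdam"},
}

issues = find_inconsistencies(crm, erp, ["name", "city"])
print(issues)
```

Neither system could have found the name mismatch on its own, which is the point being made: the inconsistency only becomes visible once the data is brought together.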

I think as soon as you start changing semantics or enriching data, you add additional context to that data. I think that’s unlike, to me, master data management. So there’s a bit of overlap.

Maybe.

Maybe?

Well, if you’re changing the semantics. Right? Like, if you’re changing the semantics of customer, okay, so why are you doing that? Well, chances are you’re doing that to enable some idea of a cross-functional, universal semantic definition, some sort of agreement at a higher level that resolves differences at a functional level.

Are you really changing it? Yeah. I think you are. I think often you are. You’re changing the very definition of what a customer is. You can be changing the definition. And, yeah, that can be problematic when you’re talking peer to peer.

Right? Like, in my world, that is classic what I call, like, swivel chair. Like, way back before we had MDM, when a contract moved between a CRM system and an ERP system, you would literally have a person sitting with two interfaces open on their computer changing things as it moved from one system to another to make the semantics conform between those two worlds.

And MDM, I think, is arguably needed to enable you to do that at scale and in a way that is consistent and predictable and configurable, where you can do that, maybe even change your governance policy. So, yeah, I think MDM can allow you to start changing things.

I actually view MDM as just a really kind of fancy semantic layer. I mean, you touched on semantic layers in your book.

I do see a future though where classic semantic layers, as defined today, like, supporting matching and connecting of schemas, right, for analytical uses or to execute a query, where I can write one query and it hits fifteen tables and the tables have different labels, but it’s all the same thing...

I see that and MDM eventually coming together one day. What do you think about that?

Yes. Yeah. Could be very well. And I think now these days with new technologies or trends like generative AI, maybe we can fully automate that whole process of colliding and harmonizing.

Would that be awesome? Well, so it’s a good segue.

That’s a good segue from data products into more data fabric and OpenAI.

Do you think that a lot of data governance could be automated?

Yes. At some point. But I think the core principles should be defined first, and there should be clarity, at least in my view, about the core principles.

Those you should define at the central level, and then, yeah, you can automate a lot of things on the levels below.

But, yeah, what makes a domain? What is the definition of a data product for you? I think, yeah, at the end, that’s, I think, something you should define. So what principles are attached to a data product?

That’s not something I would assume OpenAI does for you now.

I don’t know. Maybe. Some of those copilots are pretty slick, I’ll tell you.

We’re working to figure out ways to actually use AI to help you establish the very foundations you were just talking about.

Mhmm.

I’ll give you an example.

What is master data?

I mean, rhetorical. Right? I don’t need you to answer the question. But, like, could you use OpenAI?

Could you use a Copilot to help you understand what data should be considered master data or not master data? Could you have a copilot actually go out there and see where the same objects are being used all over the place, and maybe you didn’t know that they were being used in multiple places? So I think you could use AI to help you even understand what some of the foundational aspects should be. What do you think?

Yeah. Absolutely. But it’s still a copilot, so you need to double-check yourself whether it does its job. Right? But yeah. Yeah.

I think there generative AI, I think, would mean and bring a lot also to data management.

So imagine, yeah, that data marketplace everybody dreams about. I think there will be Copilot in the future there, helping you to navigate.

You can ask questions about data products, or maybe you bring in requirements from your analytical use case and then ask for recommendations of what data products might be good candidates for building out that analytical use case.

Where do you see CDOs in the Netherlands and maybe Europe more broadly? I know you get across Europe.

Do you see more optimism related to these use cases, or do you see a certain degree of fear?

Because I see a lot of apprehension about widely embracing AI in the data management space, because of concerns about hallucinations, because of concerns about compliance and regulatory concerns. So those are very real. But where do you think the temperature is these days, particularly in Europe, for widespread adoption of AI in the data management space?

Yeah. It differs a bit between industries. So governments are way more reserved, especially within the Netherlands. I think we had some bad examples, which, yeah, also introduced a negative reputation and atmosphere around it.

But I think in more sophisticated industries, like banks and insurance companies, for instance, there you see they are way, way further. So beyond already using a Copilot,

for instance the one from Office 365 we offer out of the box, they have actually already started implementing use cases of their own.

And, yeah, the results, I think, are also quite impressive. So you see with minimal investments, they are able to do massive cost reductions or, yeah, improve processes massively also by implementing generative AI in the right way. But it’s just a start we’re at. But still, yeah, these are more, I think, point solutions.

So individual use cases. Not at scale. Not yet. I think that requires way more investment in the foundation and, yeah, like the things we discussed about having data products already in place.

Yeah.

So let’s assume that I’m a CDO or a VP of data and analytics or, you know, head of data and analytics.

And I’m under a lot of pressure from my board of directors or maybe my CEO to have an AI story.

And maybe I don’t know where to start. Maybe I’m a little concerned that my data foundations aren’t solid. Maybe I’m concerned we don’t have a very good focus on governance. Maybe I’m concerned we don’t have MDM.

We certainly don’t have data products, but I have the CEO saying, hey, we need to show our investors that we’re doing something here. What would you say to that CDO? Where would you recommend to start?

It’s two ways. So on one hand, you should build the strategy. So what is the longer term objective for your organization?

What benefits might AI bring to you as an organization? I think there will be plenty. So what lots of organizations do, I see, and do very well: they zoom out.

They look at all the business processes or business capabilities or the business functions. And per business function or business capability, they list out, so what is the potential? Where can we apply AI, for instance, to automate or to make processes more effective or to increase customer satisfaction? So they build their story on one hand on that level.

And on the other hand, they start organically, pragmatically, by, yeah, setting up the foundation for later at-scale usage of data, analytics, and AI.

And those two, in my view, go hand in hand. So you cannot I think, yes, for sure, you can easily start without a plan, a master plan, without a strategy, and deliver immediate value.

But if you don’t have a plan in place or no strategy, I think longer term, that will be a disaster for your organization because then the program itself comes with, I think, lots of reboots and backpedaling exercises.

So you should have a longer term plan about, yeah, how to manage data at scale within your organization. What is the appropriate governance? What are the typical organizational building blocks you need to have in place? What would the platform look like, the infrastructure? So you need to have a view on that at least longer term, but at the same time, deliver value slowly and engage with your business stakeholders and convince them that data indeed gives the value most organizations are looking for.

I love it. In the US, we would say walk and chew gum. Do both.

Yeah. Perfect. Nice summary.

Focus on the short term, deliver some value. Now find use cases that are best supported by some form of out-of-the-box gen AI, like an OpenAI for Microsoft or like one of the many Copilots that exist within the Microsoft environment. Right? That could be on the engineering side. All of our engineers are using Copilot. I don’t know engineers that aren’t.

But whether that’s engineering, even data engineering, or maybe on the business side. Right? Find a use case where an OpenAI or any sort of other commercial LLM could help, and there are lots of areas where I suspect you could be doing that. That’s the short term. But the long term is, yeah, you do need to get the data house in order because for the time being, the machines can’t tell us.

Yes. Can’t completely automate everything.

And for the use case, the recommendation: also use a Copilot. So I use large language models quite a lot to demonstrate to organizations how to quickly find value out of all of those different use cases. So I ask organizations then, give me your annual report or an overview of your business capabilities or an abstract of what these types of business functions do in that respect. And then we play around with OpenAI. I show them how easy it is to, yeah, craft out all kinds of nice and appealing use cases and what data could mean for that part of the business.

I think even a copilot could help you with that. So, like...

I mean, OpenAI can take up to three hundred pages of text in a prompt. It’s insane.

Larger models.

Yes. Yeah. You can do a lot in three hundred pages. So what I just heard you say is you could, like, feed it an income statement or a balance sheet or even the entire securities filing, whether it be the 10-K or 10-Q. You could feed that in and start asking questions about, you know, could we improve our cost of goods sold? Is that the kind of stuff you’re talking about? Like asking, for instance.

Yes. Or: I’m in this type of business. This is what the company does. And then, yeah, I use the Copilot to start chatting about theoretical scenarios, maybe. Imagine this would happen to that organization. So what should I do as a CIO, for instance?

How could maybe analytics contribute to these various business domains? Give me examples of nice use cases. I have this business process.

How would generative AI fit into that? How could it help to increase customer satisfaction, for instance? So all these kinds of theoretical questions you could easily throw at such a large language model.

I was doing that yesterday, actually.

Just answering. And, yeah, it’s nice for brainstorming, whiteboard sessions.

Yeah. Really great. And I think, yeah, you could do the same for data management.

I was just doing this yesterday. As a matter of fact, I was playing around trying to get a feel for a certain business metric I had been asked about by a client.

You know, hey. We’re doing x y z, and here’s our performance.

Is that good?

I used to get asked this question at Gartner all the time. Right? What does good look like? Whether that’s data quality, whether that’s MDM. What does good look like? Are there any benchmarks?

And every time I would say, well, good is subject to how you define good. Right? You need to define what good means to you. Maybe sixty percent is good enough for you, or maybe it’s ninety percent. Right? It depends on your use case, which is the correct answer. But I was playing around with ChatGPT yesterday, asking a few of these questions.

Sorry. I wasn’t using OpenAI.

That works.

And the responses that it gave back were pretty good. Of course, it started with a disclaimer. It said, you know, these are all relative, and there is no right answer. There’s no wrong answer, and you should check here. But here are things that you should consider, and here is what, you know, for companies in your industry you should be looking at. I thought the answers were actually pretty good.

I need to go, you know, verify them. I wouldn’t just take it at its word, but it was a great starting point.

Yeah. We always say within Microsoft, it is a copilot, not an autopilot. So you should always examine, review the results yourself.

So it’s an assistant Yeah.

Still. Yeah.

Well, that’s a great dovetail into, let’s talk briefly about the fabric, because speaking of my time at Gartner, that’s something that I did focus on when I was at Gartner in terms of my research. Not exclusively, but it was something that I was very interested in.

And a couple of years ago, three years ago when I was still there, I was doubtful of data fabrics. I thought them to be too conceptual, too technically complex.

But what I’ve seen happen in the last two years, first with, you know, ChatGPT coming on the scene, and then the work that Microsoft has been doing for about the last year, my perspective has completely changed. I have gone from a doubter of data fabric to now I see it. And it was really generative AI that opened my eyes, because of the very things that we’re talking about. Right?

Like, what is master data? What could my data models look like? What would be an effective integration pattern? What would even be an effective architecture?

You can start asking a lot of these questions.

And there’s infrastructure now that can actually support this kind of hyper-integrated world that wasn’t there before, that wasn’t there two years ago. So what do you think are some of the more exciting and interesting things about the data fabric, and where do you see that going?

To me, it’s that intuitive data marketplace that can be used for all of these areas that we already discussed for data management. So for data governance, for the data steward, for master data management, for data engineers looking into data, data quality, automatically building pipelines, or generating scripts.

And those scripts do all the integration and harmonization, automated.

Yeah. I think we’re at the start now. So it’s hard to predict, but looking, I think, at the tiny things we have already experienced with Copilot.

Right.

It goes beyond our imagination what will happen within the next two or three years.

Mhmm. Do you see a world... This was a topic of conversation between myself and Joe Reis, Chris Tabb, and a few other data people in Austin a couple of weeks ago. I asked the question, and the responses here were interesting.

I asked the question of, do you see a future where there is no difference anymore between analytical stores and operational stores?

Could there just be one database, right, serving all use cases, transactional data, business applications, and analytical workloads? Could those things become one in the future? What do you think?

We still have these laws of physics. So, yeah, I think it depends on how you would define and classify operational usage versus analytical usage.

But, yeah, from an operational viewpoint, you know, we have these integrity concerns. We want to ensure these systems run stable, and we do not put too much burden on them with our analytical workloads. So we always have, I think, a clear split between OLTP and OLAP, and I think that won’t change, in my view, in the next couple of years.

So you’re getting into kind of throughput constraints from a read-write perspective.

Yes?

Yeah. Well, you can combine these two concerns into maybe one system, and I think Gartner also has a name for that.

It’s called HTAP, so hybrid transactional/analytical processing.

Know that. Okay. Interesting. Alright.

Where you have maybe one database. So if you look at that system from the outside, it looks like one giant piece. But when you start to peek inside, well, there are tables allocated more for the operational workloads and processing, and there are analytical tables, which are most likely replicas of data from these OLTP tables. And those can be used for analytical purposes.
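To make the "one database, two kinds of tables" picture concrete, here is a minimal sketch using Python’s built-in `sqlite3`. It is not how any particular HTAP product works; the table names `orders_oltp` and `orders_analytics` and the helper functions are hypothetical, and a simple copy step stands in for real replication.

```python
import sqlite3

# One database that looks like a single system from the outside.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_oltp (          -- operational table: small, frequent writes
        order_id INTEGER PRIMARY KEY,
        customer TEXT NOT NULL,
        amount   REAL NOT NULL
    );
    CREATE TABLE orders_analytics (     -- analytical replica: read-heavy queries
        order_id INTEGER PRIMARY KEY,
        customer TEXT NOT NULL,
        amount   REAL NOT NULL
    );
""")

def place_order(customer, amount):
    """Operational write path: touches only the OLTP table."""
    with conn:
        conn.execute(
            "INSERT INTO orders_oltp (customer, amount) VALUES (?, ?)",
            (customer, amount),
        )

def refresh_replica():
    """Copy rows the replica has not seen yet (a stand-in for replication)."""
    with conn:
        conn.execute("""
            INSERT INTO orders_analytics
            SELECT * FROM orders_oltp
            WHERE order_id > (SELECT COALESCE(MAX(order_id), 0) FROM orders_analytics)
        """)

def revenue_by_customer():
    """Analytical read path: aggregates over the replica only."""
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders_analytics "
        "GROUP BY customer ORDER BY customer"
    )
    return dict(rows.fetchall())

place_order("acme", 100.0)
place_order("acme", 50.0)
place_order("globex", 25.0)
refresh_replica()
print(revenue_by_customer())
```

The point of the split is that analytical queries never scan the table that serves transactions; the cost is that the replica lags until the next refresh.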

So maybe some hybrid perception.

Yes.

That’s, because I think yeah.

Could you have a table where, on one hand, you’d use that table for operational processing, and at the same time you read from it again and again and again for analytical purposes? Maybe. Who knows?

What could you say?

Happens. That’s the shift left.

That’s the shift.

That is the ultimate. Yeah.

But, yeah, I think that then we also fundamentally need to change the design of our operational systems.

Mhmm.

So way more thinking of an append-only blockchain where, yeah, we make a commit, and then those commits are allocated, will be read-only, and we start to replicate and duplicate all that data for both the operational purposes and the analytical purposes.

Well, the first part of the transaction log is mainly kept for the operational purposes. Yeah.

Oh, I love it. Multiple phase commit, blockchain. Okay. So this could be another topic, like, another hour.

I think then, yeah, we slowly go back all the way to how we started the conversation, with the data products. And I think that was also the original view, at least how I interpreted Zhamak’s version of data mesh, where, yeah, we have a really nicely modeled operational system with microservices and, I think, clearly separated, segregated out the operational workloads from the analytical workloads. And I think, yeah, then we’re back again at how the discussion started.
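The append-only idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real blockchain: there is no network or consensus, just an in-memory list where each commit is chained to the previous one by a SHA-256 hash. The class `AppendOnlyLog` and both view functions are hypothetical names.

```python
import hashlib
import json

class AppendOnlyLog:
    """Hash-chained, append-only list of commits (in-memory toy)."""

    def __init__(self):
        self._entries = []

    def commit(self, payload):
        """Append a payload; existing entries are never updated or deleted."""
        prev_hash = self._entries[-1]["hash"] if self._entries else "0" * 64
        body = json.dumps(payload, sort_keys=True)
        digest = hashlib.sha256((prev_hash + body).encode()).hexdigest()
        self._entries.append({"prev": prev_hash, "payload": body, "hash": digest})
        return digest

    def entries(self):
        """Return parsed copies so callers cannot mutate committed entries."""
        return [json.loads(e["payload"]) for e in self._entries]

    def verify(self):
        """Recompute the hash chain; False means an entry was altered."""
        prev = "0" * 64
        for e in self._entries:
            expected = hashlib.sha256((prev + e["payload"]).encode()).hexdigest()
            if e["prev"] != prev or e["hash"] != expected:
                return False
            prev = e["hash"]
        return True

def operational_view(log):
    """Current balance per account, derived by replaying the log."""
    balances = {}
    for e in log.entries():
        balances[e["account"]] = balances.get(e["account"], 0) + e["delta"]
    return balances

def analytical_view(log):
    """Number of commits per account, derived from the same log."""
    counts = {}
    for e in log.entries():
        counts[e["account"]] = counts.get(e["account"], 0) + 1
    return counts

log = AppendOnlyLog()
log.commit({"account": "A", "delta": 100})
log.commit({"account": "A", "delta": -30})
log.commit({"account": "B", "delta": 5})
print(operational_view(log), analytical_view(log), log.verify())
```

Both views are read-only projections of the same commits, which is the design change being described: the operational system stops updating rows in place and instead derives its state from the log.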

Well, we like to do this in the world of data. It’s all a pendulum. Right? We swing from centralized to decentralized and, yeah, we like to do these things. But I could have an interesting conversation around blockchain, because I think the first time I ever read Data Mesh by Zhamak, and I’ve read it three times because I have to because I’m just not that smart.

But the first time I read it, the thing in my mind, I was like, blockchain.

Right? Because the roadblock I hit every time when I conceptualize the data mesh is the thing that we talked about when you said aggregates. Right? I hit this wall, which is cross-domain governance.

Right? And governance, I would argue, can exist at multiple levels. And you actually suggest this in your book as well, that there is kind of functional-level, domain-level governance, and then there’s enterprise-wide governance. I would argue that there’s cross-functional governance, that it’s a Venn diagram.

Right? Three rings in a Venn diagram. Right? Governance rules can exist in each of those places.

And when I think about a mesh, I was like, okay. In the middle there, you could argue that’s a domain, but there has to be some sort of change to the data to make it conform to that context.

So when I think about this stuff, like, at night, and I’m weird because I do, blockchain just keeps coming back to me. Blockchain. Blockchain. Because peer to peer, peer-to-peer governance, peer-to-peer governance at scale with automated governance where the rules are managed via contracts, because they could be. Smart contracts in a blockchain. Like, the idea of a data contract could exist as a smart contract on a blockchain.

That to me seems like the only technology that exists today that may enable that ultimate peer to peer.

And then I think in that view, you also replicate the metadata. So you commit that along with the data into the blockchain. The blockchain starts distributing all that data between different cross-functional domains.
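As a rough illustration of rules "managed via contracts", the sketch below checks a record against a data contract before it is accepted into a shared store. It is plain Python standing in for a smart contract, not code for any real blockchain platform; the `CONTRACT` fields and the `violations` and `publish` helpers are hypothetical.

```python
# Hypothetical data contract: required fields and their expected types.
CONTRACT = {
    "customer_id": int,
    "name": str,
    "country": str,
}

def violations(record, contract=CONTRACT):
    """Return a list of human-readable contract violations (empty if compliant)."""
    problems = []
    for field, expected_type in contract.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for {field}")
    return problems

def publish(record, shared_store):
    """Accept the record into the shared store only if it meets the contract."""
    problems = violations(record)
    if problems:
        return False, problems
    shared_store.append(dict(record))
    return True, []

store = []
print(publish({"customer_id": 1, "name": "Contoso", "country": "NL"}, store))
print(publish({"customer_id": "2", "name": "Fabrikam"}, store))
```

The enforcement happens at write time and is the same for every producer, which is the property that makes the rules "peer to peer" rather than dependent on a central reviewer.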

It won’t be that fast, so you need to replicate that data maybe to additional read stores for having sufficient performance on the consuming side. So if you would like to do dimensional modeling, I would assume you replicate data from that blockchain into maybe a column store for better performance. Right? But at the end, it’s the blockchain that includes the metadata that will be distributed, synchronized between all of these parties.
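A toy version of that replication step, assuming the ledger hands back one dict per committed row and that every row has the same fields: pivot the rows into one list per field, which is the basic layout a column store uses. The function `to_columns` and the sample rows are hypothetical.

```python
def to_columns(rows):
    """Pivot a list of row dicts into one list per field (a toy column store)."""
    columns = {}
    for row in rows:
        for field, value in row.items():
            columns.setdefault(field, []).append(value)
    return columns

# Rows as they might be read back from the ledger, in commit order.
ledger_rows = [
    {"customer": "acme", "region": "EU", "amount": 100},
    {"customer": "acme", "region": "EU", "amount": 50},
    {"customer": "globex", "region": "US", "amount": 25},
]

columns = to_columns(ledger_rows)
total = sum(columns["amount"])  # an aggregate reads one column, not every row
print(columns["amount"], total)
```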

Yep. Yeah.

Well, there are these things called oracles that’ll allow you to jump between kind of traditional rows and columns, you know, some sort of relational store, and things that are stored only on immutable ledgers. And so you can traverse those worlds. There are throughput constraints. See, I told you we could talk about this for an hour.

So why do we need blockchain? I think blockchain then is, to me, just technology. You can also use event streaming or a platform like Kafka. You distribute events in it, and there will be event subscribers.

I have an answer for sure. I have an answer for it. We need blockchain because it also serves as an incentives management layer.

In a truly decentralized peer-to-peer governance model, particularly if you’re going company to company, you could create what you could call a free rider problem. Right?

So meaning trust, you mean, between domains.

Right. Meaning, let’s just say I’m managing a customer record. That’s it. Or a record about Microsoft.

Right? It says, you know, here’s Microsoft as an entity. Like, just take for example the business model of Dun and Bradstreet. Right?

All it is is a registry of information about companies.

You could manage that on a blockchain where I steward it.

Like, I review it. I make sure that’s okay. Okay. How do I get paid for that? Because if I’m gonna go give that into a blockchain, if I’m gonna commit that to a blockchain where other people can read this and benefit from my work, how do I get compensated for that?

Well, blockchains are pretty good at that. Actually, you could create an incentive layer that was buried into the blockchain and said, okay, every time that you write a new entity on the chain, right, every time you update the chain, every time you do something per the governance standards that we have managed, you know, per our rules, if you are committing something to the chain, if you’re writing a new block, then you get compensated for that. So I think that’s the answer. I think you could.
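The incentive layer described here can be sketched as a ledger that credits a steward each time they write a block that passes the agreed governance rule. This is a hedged, in-memory illustration only: `StewardshipLedger`, `REWARD_PER_BLOCK`, and the rule `has_name_and_country` are hypothetical, and a real chain would handle identity, consensus, and payment very differently.

```python
from collections import defaultdict

REWARD_PER_BLOCK = 1  # hypothetical unit of compensation per accepted block

class StewardshipLedger:
    """Toy ledger that rewards stewards for governance-compliant writes."""

    def __init__(self, is_compliant):
        self._is_compliant = is_compliant   # the shared governance rule
        self.blocks = []                    # accepted blocks, in write order
        self.balances = defaultdict(int)    # reward earned per steward

    def write(self, steward, record):
        """Append the record and credit the steward, or reject it unpaid."""
        if not self._is_compliant(record):
            return False
        self.blocks.append({"steward": steward, "record": dict(record)})
        self.balances[steward] += REWARD_PER_BLOCK
        return True

def has_name_and_country(record):
    """Example governance rule: an entity needs a name and a country."""
    return bool(record.get("name")) and bool(record.get("country"))

ledger = StewardshipLedger(has_name_and_country)
ledger.write("steward_a", {"name": "Microsoft", "country": "US"})
ledger.write("steward_a", {"name": "Microsoft"})  # rejected: no country, no reward
ledger.write("steward_b", {"name": "Microsoft Netherlands", "country": "NL"})
print(dict(ledger.balances))
```

Tying the reward to the compliance check is what addresses the free rider problem: readers benefit from every accepted block, but only the parties who did the stewarding work are credited.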

Yeah. It’s easier to enforce, I think. Right.

Yes.

Yes. Yeah. That kind of trust on that level. Yeah. Because of the blocks. Yeah. Makes sense.

Maybe we need to collaborate on a new book. I think it’s blockchain for data management, because I don’t hear a lot of people talking about this. I think it could be very, very interesting. But I get ridiculed, mocked by some of my peers internally every time I start talking about blockchain. They think I’m crazy.

So, Piethein, thank you so much. This has been a wonderful conversation.

Talking to you.

Yeah. It’s great. I look forward to perhaps connecting with you in person one of these days. I was in Amsterdam two weeks ago, in the airport, like, right next to your offices. I should have known. I would have come, stuck my head in, and said hello. But thank you so much. This has been wonderful.

Thank you to all of our listeners for tuning in to the CDO Matters podcast. Please take a moment to subscribe. That would be fantastic. Look for more episodes coming in the very near future.

We talk about data management. We talk about data governance, data strategy, you name it. If it relates to CDOs and anybody who wants to be a CDO, this is the place to get information. So, Piethein, thank you again so much.

Data Management at Scale, from O’Reilly: please go check out the book if you haven’t already read it. And we will see everybody on another episode of CDO Matters sometime very soon. Thanks, everybody.

Thank you.

ABOUT THE SHOW

How can today’s Chief Data Officers help their organizations become more data-driven? Join former Gartner analyst Malcolm Hawker as he interviews thought leaders on all things data management – ranging from data fabrics to blockchain and more — and learns why they matter to today’s CDOs. If you want to dig deep into the CDO Matters that are top-of-mind for today’s modern data leaders, this show is for you.

Malcolm Hawker
Malcolm Hawker is an experienced thought leader in data management and governance and has consulted on thousands of software implementations in his years as a Gartner analyst, architect at Dun & Bradstreet and more. Now as an evangelist for helping companies become truly data-driven, he’s here to help CDOs understand how data can be a competitive advantage.
