Data Fabric Demystified

Episode Overview:

ChatGPT. Composable data and analytics. Connected governance. The data fabric.

In this 22nd episode of CDO Matters, Malcolm devotes the entire show to an extremely deep dive into the data management architecture known as the data fabric. The data fabric has gone from relative obscurity to nearly the top of Gartner’s hype cycle in just a few short years, which is why forward-leaning CDOs should understand exactly what the data fabric is and how it can benefit their organization.

Starting with a basic definition of a data fabric, Malcolm proceeds to break this rather complex phenomenon down into its component parts — using language that is precise but highly digestible for any non-technical CDO or data leader. Translating the complexity of a fabric into understandable critical capabilities, he shows how a data fabric can eventually be leveraged as a transformational tool for any organization.

For those wondering exactly what capabilities are needed to enable a data fabric, Malcolm reviews the three capabilities he believes must be present for any solution to be considered a data fabric. With this information in hand, it becomes clear that the data fabric remains, at least in the short term, an aspiration for most companies, given the sophistication and governance maturity needed to enable it.

Malcolm reviews a conceptual architecture of the data fabric and discusses how fabric capabilities will be deeply integrated into — and across — several legacy data management systems, including master data management (MDM), data quality, data governance and data integration platforms. He challenges the notion that any one solution will be used to enable data fabrics — and instead outlines how several software and analytical solutions will need to be deeply integrated to enable fabric capabilities.

Finally, Malcolm ends his demystification of data fabrics by sharing a few key considerations to help CDOs cut through all of the hype related to data fabrics. This includes practical actions data leaders can take now to better position themselves for leveraging data fabrics soon.

Update from Malcolm: This episode was recorded before the production launch of ChatGPT. In the span of just a few weeks, I believe the data fabric has gone from a conceptual framework to something that could easily be envisioned within a modern data estate.

Put another way: if the entire internet through 2021 could be used to train an AI-enabled language model, all your enterprise data could most certainly be used to train AI models optimized to support data management use cases. I now believe the data fabric is how AI will be operationalized, at scale, to optimize, and eventually automate, the creation, consumption and management of data within your organization.

I’ve struggled for the last two years to visualize exactly how data fabrics could be implemented at scale, but thanks to ChatGPT, I no longer have this struggle. Hopefully, after watching (or listening to) this episode, you’ll come to a similar conclusion.

Key Moments

[7:51] The Hype Behind the Data Fabric
[10:46] Diving Deep into the Fabric
[13:01] Data Fabrics Defined
[21:31] Key Fabric Capabilities
[22:15] Active Metadata
[27:01] Incorporating AI Technologies
[29:56] Synthesizing Metadata with an Intelligence Layer
[35:31] The Data Fabric Architecture
[40:06] Key Fabric Considerations
[52:31] Closing Thoughts

Key Takeaways

The Future of the Data Fabric and AI Dependency (16:38)

“We are making a pivot away from people defining the [data governance] rules, the integration patterns, the data quality standards. We are moving away from people deciding that to robots deciding that. That’s a spectrum. We’re largely people-driven today…we’re early in the days of the spectrum from entirely people-driven to entirely robot-driven. We’re early in those days, but where the data fabric goes and the ultimate path here is towards a world highly dependent on the [machines].” — Malcolm Hawker

What Can Metadata Do for You? (21:40)

“What active metadata really means is that, if you had a lot of metadata, and you had some pretty sophisticated analytical tools, and you had some pretty sophisticated new technologies, you could make that data tell you a lot of things about the state of your data enterprise. For example, in theory, you could know when data was accurate or inaccurate.” — Malcolm Hawker

The Current State of Data Fabrics (40:12)

“Data fabrics don’t exist yet. You can’t go buy one. There is a ton of promise here. But between where we are, and where we need to go and between concept and theory, there are some really major roadblock issues we need to overcome. And frankly, there’s a lot of technology that doesn’t even exist yet.” — Malcolm Hawker


Episode Transcript

Malcolm Hawker

Hi, I’m Malcolm Hawker, and this is the CDO Matters podcast, the show where I dig deep into the strategic insights, best practices and practical recommendations that modern data leaders need to help their organizations become truly data-driven. Tune in for thought-provoking discussions with data, IT and business leaders to learn about the matters that are top of mind for today’s chief data officers. Hello, everybody. It’s Malcolm. Thanks so much for catching this episode, Data Fabric Demystified. I’m excited that you’re here. Thank you for listening to the CDO Matters podcast and for checking out this episode and all of our other episodes. This is a short video intro; it isn’t the main part of what you’re going to see today. You’ll hear me talking a lot about the data fabric, but I felt compelled to record a short disclaimer at the beginning. I recorded this session about the data fabric in early March 2023. It’s now late April 2023, and what a difference two months makes. I can’t believe how quickly things are moving in this space: yes, largely thanks to ChatGPT, but certainly also others like Bard, DALL-E for images, you name it. It’s mind-bending. If you’re paying any attention to what’s happening online, in social media or in the media writ large, AI is taking the world by storm. In this discussion, you’re going to hear me describe something I call an intelligence engine or an intelligence layer. It’s a critical component of a data fabric: the brain that, I tell you, doesn’t exist yet. At the time, I was largely correct. I talk about this intelligence layer as a critical capability and enabler of the data fabric, and I say that it’s several years away from actually being developed, productized and made ready for prime time. Well, in the span of two months, I think I’ve been proven wrong.
I think the intelligence engine I’m going to talk about in the main section of the podcast today is AI, and it’s out there. It’s the large language models, in the form of ChatGPT, Bard and others, that are revolutionizing the way we interact with AI and how we think about AI, at a pace I just can’t get my head around. So, literally in the space of two months, I’ve gone from being kind of a data fabric skeptic to a massive data fabric believer, and let me tell you why. In this episode I talk about the need for an intelligence engine, the amount of data and compute power it requires, and how hard it’s going to be for companies to develop this stuff on their own. But they don’t have to develop it on their own: it’s been developed. I now honestly and truly believe that the data fabric is exactly how AI will be operationalized at scale to support data management use cases and operational use cases of data. Not only will we be using AI to answer questions like what’s the best data quality rule, what are the best data governance policies to implement, what are the best integration patterns, or should this data be centralized or decentralized; AI will also be answering questions like which customer record should I be looking at right now, what’s the best way to process this invoice, and what credit rating should I apply to this potential supplier or consumer, and on and on. So I now firmly, truly and honestly believe that the concept of a data fabric will be how we operationalize AI within the world of data and analytics. Guys, it’s unbelievable. I know it’s been a little longer than two months; ChatGPT had a kind of limited release in November, and it’s been dribbling out since. But realistically, for me and my journey here, in the space of two months we’ve gone from “this is a fun conceptual thought experiment” to “oh wow, I get it.”
Yes, AI is still a long way from prime time. Yes, we’re talking about V1 versions of these products. Yes, GPT-3 and GPT-4 both hallucinate, no question. But here’s what you need to be thinking about, and in my opinion this is really all you need to think about. The entire internet can be used as a training data set for things like GPT-3, GPT-4 and future versions. The computing power exists today to crunch all this data. The systems exist today. A lot of the knowledge largely exists. The AI exists. These models exist. If we can use the entire internet as a training data set (it’s late in the day, and yes, I still have this much passion late in the day), then we can certainly use our internal metadata, all of our log data and all of our internal data as a training set to help AI inform, and maybe even automate, decisions about how we manage data. It’s that simple. If we can use the internet as a training data set to answer questions like “write me a poem” or “what’s the best way to do A or B,” we can certainly use our own internal data for this as well. Obviously there’s a little bit of work to commercialize this and to take these internet-based models and apply them in more proprietary solutions, but trust me: every vendor in the data management space is working on it. I spent most of my day today in a meeting trying to figure out how to operationalize these insights, because this is transformational, guys. This is game changing.
So again, I now firmly believe the data fabric is how AI will be operationalized at scale within most data and analytics organizations. In this podcast, you’ll hear me say I’m a little skeptical and that we’re probably five years away. That’s out the window. I think we’re probably two to three years away from prime time on this, and that’s about as long as it’s going to take most of the vendors to develop some solutions in this space. Honestly, based on what I’ve seen playing around with Auto-GPT and a few other things like it, code that used to take three years to write is now going to take three weeks, thanks to AI. Stuff is moving so fast. There’s still a ton of value to glean from the data fabric podcast you’re about to listen to, but keep in mind that things are moving very, very quickly here. I think the world has turned completely upside down in the last couple of months, and you’re going to hear exactly why in this podcast. This is revolutionary, and you need to know what to do to prepare for a data-fabric-enabled future. Thanks, everybody.

Good morning, afternoon or evening, whatever time it is wherever you are.

I’m Malcolm Hawker, your host of the CDO Matters podcast, and I’m thrilled that you’re joining me here today. I think this is our 22nd episode, if I’m not mistaken and if we haven’t slid something else in around there. I’m recording this in early March, and I suspect that by the time you see it, it’ll be April. If so, happy belated Saint Patrick’s Day. Believe it or not, I recently became an Irish citizen, which is kind of fun. My mother is Irish, my grandfather is Irish, and I decided to just make it legit recently. It isn’t Saint Patrick’s Day yet as I record this, but I’m looking forward to the 17th, and hopefully you were able to have some Irish fun as well. Today we’re going to talk about the data fabric. Not the data mesh, the data fabric, and the whole topic of today is demystifying it. As I record this in early March, in another couple of weeks I’ll be headed, like many other data and analytics leaders, to the Gartner Data and Analytics Summit. It’s an annual event, although thanks to COVID the last one was actually held last August, only seven months ago. It’s very common for senior data and analytics leaders to attend. If you haven’t gone, I would most certainly recommend it. As a matter of fact, if I could attend only one event a year, I would recommend Gartner. They do a really, really good job of putting on large-scale events, and they’ve got a large array of speakers. Yes, they’re primarily Gartner analysts, but for VPs of data and analytics, for CDOs, for other data leaders and even CIOs, there’s a lot of information there and a lot you can learn. It’s a broad spectrum of topics, right? It’s the Data and Analytics Summit, so they’re going to go into things like data quality, MDM,
data integration, data virtualization and data fabrics (we’ll talk about that some more). You’re going to get a wide spectrum of insights across a number of different data management disciplines, plus analytics, AI, ML and data science. Those are all covered, so it’s really suitable for somebody with a pretty big span of influence and a pretty large portfolio, a.k.a. a CDO, CIO or VP of data and analytics. I simply recommend that you go. But why am I talking about the data fabric today, and why am I going to demystify it? Chances are, if you’re going to that event, you’re going to get a lot of information about the data fabric. It’s not at the top of the Gartner hype cycle, but it’s getting close, and it’s gaining a lot of speed in the market. By the time this podcast airs in April 2023 (maybe even May, but probably April), the fever around the data fabric will be high, because all of the VPs and CDOs will be leaving Gartner all data-fabric-pumped and excited. That’s just how it works. I was an analyst for three years, and I know the cycle: people go to these events, they get all worked up about these cool things, and the data fabric will be hyped at the event. Gartner is good at a lot of things, and they are really good at hyping their own things. The data fabric is one of their own things, and it will be hyped. Having been through this wash, rinse and repeat cycle, and having participated in it, I know that people will probably come out saying, “Man, the data fabric sounds pretty cool. It sounds revolutionary. It sounds like a pretty valid shiny object, but honestly, I’m not entirely sure I get it all. I’ve got a lot of questions.”
Some of the information was really conceptual, really theoretical and really high level, and people will have a hard time taking it from 80,000 feet down to 20,000 feet. We don’t need to go down to 5,000 feet; your team will help with that. But going from 80,000 feet down to maybe 20,000 or 30,000 is what this podcast is all about. I know this from firsthand experience, because I was involved in creating the data fabric narrative. I was kind of there at ground zero. I wasn’t the person leading it; that would be my friend and learned colleague Mark Beyer, along with a few others at Gartner who really drove that narrative. But I was involved in the peer review of a lot of the research that went into the early, zygote stage of the data fabric, as it were. So I think I’m fairly uniquely positioned to provide some valuable insight here and to help you make the jump from 80,000 feet down to 20,000 feet, because as good as Gartner is at a few things, this is one area where they’re not so good, particularly on this topic. They’re not so good at making things real and making things more pedestrian (and I mean that in a good way, not a bad way): making things digestible, or even better, operational and implementable. Making the data fabric implementable. And there are a lot of cautions here, guys. That’s what we’re going to talk about. I didn’t bring on a guest, because this is an area where I know I’m qualified and I know I can add a ton of value, so I thought it best to fly solo here. I’m going to share some slides. Before I dive in, I also realize that a lot of you will be listening to this in audio only; it is a podcast, after all.
So my apologies if I’m talking to slides you can’t see, maybe because you’re listening in your car on your way to work. I’ll do my best to make sure the slides are available through LinkedIn, our website or other venues, and I’ll certainly be talking through them as well. So if you’re watching on YouTube or any of the other video channels we distribute this content through, and you see me reading the slides out loud, that’s why: I’m trying to get this out across all channels, particularly our audio channels. I’m going to bring up a definition of the data fabric on the screen now. This is my definition, and it’s a little different from Gartner’s. Gartner’s definition is really nebulous and hard to pin down; it’s like a balloon half filled with water, kind of squishy. That’s by design, by the way, from a Gartner perspective. My definition is, I think, a little more digestible: a data fabric is a conceptual (and yes, I’m saying that today; maybe it will jump from concept to reality) data management architecture in which data itself informs the classification, management and use of data within an organization. Let’s break this down a little. There are really three key chunks to digest here, and we’ll go into more detail on each. First, it’s a data management architecture. It’s a way of managing data. What does that mean? It’s a way of governing data, a way of modeling data, a way of controlling and managing data, a way of integrating data, a way of ensuring that data quality standards are applied to your enterprise-wide data, and on and on. It’s a management architecture, right?
So is the data mesh, by the way, but a data fabric is different from a data mesh. I call it conceptual because that’s what it is today. The last time Gartner held the event, in August, they were talking about 70% penetration of data fabrics, and I suspect you’ll come away from this year’s event hearing an even higher number. Frankly, guys, I’m still not buying it. I think Gartner is way out over their skis on the number of companies that have actually implemented a data fabric. I think we’re talking about a handful that have done it, and the ones that have done it have done it in a very manual, non-automated way. Arguably, what’s been implemented is not really the full expression of a data fabric. We’ll talk about that some more. Suffice to say, this is a data management architecture, like the data mesh: a way of managing data across an entire enterprise. The second key thing, and this is really where we get to the nuts and bolts of the matter, is that it’s an architecture where data itself informs its classification, management and use. This is foundational, because historically we humans informed the classification, management and use of data. We built the rules. We said this is the data model, this is the data definition, and on and on. In the future, I’m saying, data is going to do it. Really, it’s the AI and the ML (yes, I’m kind of stopping myself even as I say that, but that’s what it will be). There will be some sort of highly sophisticated AI- and ML-fueled logic that starts to make decisions about how data is classified, how it’s managed and even how it’s used. It is the data itself, primarily metadata (we’ll talk more about that), informing how it should be used. That is really, really foundational.
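As a very small illustration of “data itself informs its classification,” here is a minimal Python sketch that labels a column from its own values instead of from a human-maintained definition. It is not how any fabric product works: the type names, the regular expressions and the 90% threshold are all hypothetical choices made for illustration, and the rule-based approach is a stand-in for the AI/ML the episode describes.

```python
import re

# Hypothetical patterns for a few common semantic types; a real fabric would
# learn far richer classifications from metadata rather than rely on regexes.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$"),
    "us_zip": re.compile(r"^\d{5}(-\d{4})?$"),
    "iso_date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
}


def infer_semantic_type(values, threshold=0.9):
    """Classify a column from its own values.

    Returns the type whose pattern matches at least `threshold` (an
    illustrative cut-off) of the non-empty values, otherwise "unknown".
    """
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return "unknown"
    best_type, best_rate = "unknown", 0.0
    for type_name, pattern in PATTERNS.items():
        rate = sum(bool(pattern.match(v)) for v in non_empty) / len(non_empty)
        if rate > best_rate:
            best_type, best_rate = type_name, rate
    return best_type if best_rate >= threshold else "unknown"


if __name__ == "__main__":
    print(infer_semantic_type(["a@x.com", "b@y.org", "", "c@z.net"]))  # email
    print(infer_semantic_type(["02134", "10001-1234", "94105"]))        # us_zip
    print(infer_semantic_type(["red", "green", "blue"]))                # unknown
```

In this sketch a human still decides which types exist and where the threshold sits, while the values decide which label applies; replacing the regexes with a trained model would move further along the people-to-machines spectrum described next.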
So we’re making a pivot away from people defining the rules, the integration patterns and the data quality standards. We’re moving away from people deciding that to the robots deciding that. That’s a spectrum. Today we’re largely people-driven. Yes, there is some AI and ML helping with things like data stewardship rules, building data models, and automated data profiling and discovery, but we’re early days on this spectrum from entirely people-driven to entirely robot-driven. Where the data fabric goes, the ultimate path here, is towards a world highly, highly dependent on the robots. And by robots, I mean AI, ML, graph and other advanced technologies that can help automate large-scale analysis and management of data. It will need to be the machines doing this, because humans won’t be able to do it at the scales we’re talking about. They just won’t. So that’s key thing #2: data starts to inform its own classification, management and use, and we move away from people and towards automation in doing that. Key thing #3: it’s enterprise-wide, across an organization. I suppose you could make a case that a data fabric could be implemented at the department or application level, but honestly, guys, I just don’t think you’re going to get a ton of value out of that. This is really about scale and automation, and there’s a bit of a paradox within data fabrics: the more data you feed it, the more data can be used to train the analysis, train the graphs, and train what we’ll call a knowledge layer, a semantic layer or an intelligence layer. The more data you throw at this thing, the more intelligent it’s going to be.
In theory (knock on wood), it will also make better decisions. So I really see this being an enterprise-wide paradigm, especially for very, very big companies. Again, if you’re a relatively small company, or if you’re looking at this within an individual silo, application, department or division, it’s probably not going to work, just because fabrics are all about scale. So that’s the definition. I’m going to have some coffee. It’s still coffee time here, and if I didn’t, my throat would most certainly dry out, so please be patient with me as I chug on my ground-up beans. All right, that’s the definition. Let’s talk about some key capabilities of the data fabric; I’ve already touched on a lot of these. I think there are really three things that are required. Gartner would call these critical capabilities; I call them key capabilities. It doesn’t matter. These are the three things you’ve got to have, and if you don’t have them, you’re probably not a data fabric. Number one is what Gartner calls active metadata. This is different from plain metadata because it’s active. What does that mean? Alongside the spectrum I was talking about, where you go from static management and human-defined business rules, there’s also a spectrum in what Gartner has called data activation. Today, metadata is largely not active. It’s static. It’s just captured; it’s a snapshot in time. We gather all of this data about data: all the fields, attributes, you name it, transactional data. This is all data, folks. There isn’t much of a dividing line between what should and shouldn’t be considered metadata. It’s everything: web logs, transaction data, everything, right?
And today, all that data is largely being captured and dumped into data warehouses and other data stores. It’s a snapshot that represents something as of that point in time, and it isn’t really being used to drive a lot of business decisions. It may be used for analytics and insight, maybe with AI and ML, and for some data science, but for the most part that data is not interactive, for lack of a better word. What active metadata really means is that if you had a lot of metadata, and you had some pretty sophisticated analytical tools, and you had some pretty sophisticated new technologies, you could make that data tell you a lot of things about the state of your data enterprise. For example, in theory you could know when data was accurate or inaccurate. Think about that. Could you scan data, look at transaction logs and look at all the other metadata to determine when data is correct or incorrect, high quality or low quality? Let’s use data quality as a metaphor here, because I think it’s appropriate. In theory, you could look at the logs in, let’s say, an ERP or CRM system and say, “OK, this is the data associated with this transaction.” You could follow that transaction all the way through quote to cash, all the way to rev rec, all the way to the delivery of goods (say it was a transaction for somebody buying something you deliver), and maybe even past delivery, to customer service feedback and on and on.
In theory, you could then determine whether the address on that customer record was accurate, or whether other attributes of the customer record (because we love customer records here) were fit for purpose or not. Did the transaction fail? Were there problems along the way? Was there a pause somewhere in the life cycle of that customer transaction, from quote all the way to rev rec and delivery? Were there other indicators that there was something wrong with the data? The data fabric would say yes, most certainly: there will be plenty of indicators that tell you whether data was fit for purpose. There are glaringly obvious examples, where a transaction fails or gets kicked out of some process, or where a human had to review it because the process failed, stalled or stopped. But in theory, with the data fabric, where we’re going is a world where data can tell you when things are accurate or inaccurate. Data could even inform things like the optimal integration patterns. Data could tell you when something should be considered master data or not. That, I would argue, is actually relatively easy (I’m oversimplifying), because you could look across your metadata and determine where data is being widely shared across the organization. That’s another great example of this pivot towards more active metadata: you could look at the data, at transactions, at reports and analytics, and pretty quickly understand where data is being shared or not. And that’s really the key factor in determining whether something should at least be considered master data.
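The two examples above (judging fitness for purpose from transaction outcomes, and spotting master data by how widely it is shared) can be sketched in a few lines of Python. This is a minimal sketch under loudly stated assumptions: the event records, the outcome labels, and the thresholds (`min_events=3`, `max_problem_rate=0.2`, `min_systems=3`) are hypothetical and chosen only for illustration, and a real fabric would mine these signals from actual ERP/CRM logs and usage metadata at far larger scale.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TransactionEvent:
    # Hypothetical log record: the customer record a transaction used and how
    # it ended ("completed", "failed", "stalled" or "manual_review").
    customer_id: str
    outcome: str


@dataclass
class AccessEvent:
    # Hypothetical usage-metadata record: a system (CRM, ERP, a BI report...)
    # reading a given data entity.
    entity: str
    system: str


PROBLEM_OUTCOMES = {"failed", "stalled", "manual_review"}


def fitness_signal(events, min_events=3, max_problem_rate=0.2):
    """Label each customer record by how often its transactions run into trouble."""
    totals, problems = defaultdict(int), defaultdict(int)
    for e in events:
        totals[e.customer_id] += 1
        if e.outcome in PROBLEM_OUTCOMES:
            problems[e.customer_id] += 1
    verdicts = {}
    for cid, n in totals.items():
        if n < min_events:
            verdicts[cid] = "insufficient evidence"
        elif problems[cid] / n > max_problem_rate:
            verdicts[cid] = "suspect"
        else:
            verdicts[cid] = "fit for purpose"
    return verdicts


def master_data_candidates(accesses, min_systems=3):
    """Entities read by at least `min_systems` distinct systems are master data candidates."""
    systems = defaultdict(set)
    for a in accesses:
        systems[a.entity].add(a.system)
    return sorted(e for e, s in systems.items() if len(s) >= min_systems)


if __name__ == "__main__":
    events = (
        [TransactionEvent("C1", "completed")] * 4
        + [TransactionEvent("C2", o) for o in ("completed", "failed", "stalled")]
        + [TransactionEvent("C3", "completed")]
    )
    print(fitness_signal(events))
    # {'C1': 'fit for purpose', 'C2': 'suspect', 'C3': 'insufficient evidence'}
    accesses = [AccessEvent("customer", s) for s in ("CRM", "ERP", "BI")]
    accesses.append(AccessEvent("promo_code", "ERP"))
    print(master_data_candidates(accesses))  # ['customer']
```

Both functions only surface candidates; whether a “suspect” record is actually wrong, or a widely shared entity should really be mastered, remains a human decision in this sketch.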
The list of examples here is long, guys, but the whole idea is that metadata will be activated. How will it be activated? That’s number two on this list: the use of AI, ML, graph and other advanced technologies to enable the automation required to analyze massive troves of metadata. So critical capability #1 is active metadata, and critical capability #2 is the use of these modern technologies to analyze large troves of data and start to recognize patterns. You’re hearing this in the idea of data observability. I’m not a huge fan of data observability, because I don’t think it’s new; I think we’ve always done this. But the ideas are consistent: you could observe the data and see what’s working and what’s not, what’s accurate and what’s inaccurate, when things are inefficient or less efficient than they could be, and on and on. So these technologies will be applied to activate metadata and drive the insights we were just talking about.
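Recognizing patterns in metadata can start as simply as watching one metric over time and flagging the days that break the pattern. Here is a minimal Python sketch of that idea, assuming a daily null-rate series for a single field, a trailing seven-day window and a z-score cut-off of 3; all three are illustrative choices, and commercial observability tools use considerably richer models.

```python
from statistics import mean, stdev


def flag_anomalies(series, window=7, z_threshold=3.0):
    """Return indexes whose value deviates from the trailing `window` values
    by more than `z_threshold` standard deviations.

    `window` must be at least 2 so the sample standard deviation is defined.
    """
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # A perfectly flat history: any change at all is unusual.
            if series[i] != mu:
                flagged.append(i)
            continue
        if abs(series[i] - mu) / sigma > z_threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical daily share of customer records with an empty postal code.
    null_rates = [0.010, 0.012, 0.011, 0.009, 0.010, 0.011, 0.012, 0.010, 0.150, 0.011]
    print(flag_anomalies(null_rates))  # [8]: the spike on day 8
```

Note that the spike inflates the trailing statistics for the following days, which is one reason real tools exclude flagged points from their baselines.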

Speaker

To to help.

Malcolm Hawker

Inform you of maybe a different way of even modeling data, or a different way of defining data, right? Maybe there are. Attributes that are relevant to a given object that you’re not, that you as the human in that human driven process, really aren’t adequately considering. For example, a customer record today, maybe the human says, well, this is how we define a customer and these are all the attributes. That define a customer. Maybe when you start running some graphs you learn some of the unknown knowns. Here you knew the known customer that’s important, but there may be some unknown. Relationships that exist between your customer data that you weren’t even considering before. So that’s how you would potentially activate metadata through the use of these new technologies to inform you to say, aha, maybe there’s a relationship here within my customer data that I need to start managing as master data or that I need to apply some. Unique governance rules to. Because it’s far more relevant than I. Ever thought it was before? Maybe there’s. A correlation between. Clean data field one and data field 2 some meaningful relationship that actually is meaningful from the perspective of driving more sales. Even so, we’ll talk about that some more, but the data fabric is inherently operational. It’s not just about great analytics, it’s about. Fueling the enterprise. So that’s key capability #2 of the data fabric, a heavy use of these technologies to process a lot of data at scale and start automating some of the kind of the data, classic data management decisions that we’ve always done. But we just did them as humans, as generally as part of governance programs as part of the implementation of the. Integration layer part. Of virtualizing of data or building data models are on and on and on key capability #3. And I keep hinting at this. But it is the idea of some form. Of what I will call intelligence layer. That synthesizes all. 
of these insights and all of this active metadata together in some operationalized, implementable, meaningful way. In theory, this intelligence layer would also be the place where governance policies are defined, maintained, managed, and enforced. Because we’re going to get to a world here, folks, a truly data-fabric-driven world, where the humans are going to define some governance rules and the machines are going to start to define some governance rules. I know that may sound slightly radical. There are those listening now who would say, what do you mean? Governance rules are only those things that VPs are signing off on. I disagree. Governance rules go all the way down to things like data definitions. How you define a customer, or a product, or an employee, or a location, or any field or any attribute, is most certainly a governance decision, even if it is a relatively low-level, fairly innocuous one. It’s still a governance decision. So we will be in a world where humans are making these decisions, but they are increasingly augmented by the machines, and probably ultimately automated by the machines, at least for some aspects of the data fabric. What we’re talking about here is some sort of intelligence layer that combines data governance, MDM, data quality, data integration, and even BI and analytics to a certain degree. It combines all of these classic data management capabilities that today exist in all of your data management applications, all the ones that I just mentioned. All of those rules that exist in a data quality tool, that exist in an MDM tool, that exist even in a BI and analytics tool (the group-bys, the sort-bys, all the things that you do to get analytics in front of people), all those rules will, in theory, in the future, be subject to scrutiny.
Initially, I would say, that means the scrutiny of active metadata and the scrutiny of this intelligence layer, and in the future those rules will really start to be automated. That’s kind of the dream state of the data fabric. But going back to the example that I gave before, take things like data quality rules: when is data fit for purpose, and when is it not fit for purpose? What data drives successful transactions, shall we say? We know that because we looked into the data and the metadata to get that insight. Well, in the future there will be this layer, and I’ll go ahead and call it a data operating system. I suspect that phrase has been used before and will continue to be used, but I couldn’t figure out a better way to say it. I think there will be some sort of data operating system in the future, assuming data fabrics actually reach fruition. Gartner seems to think they will. I’m not entirely convinced. But in that future, humans will define the rules at the beginning. Today, those rules largely exist within the data management applications I’ve been talking about: data quality, MDM, governance, data science, you name it. All those rules are out there, all over the place, and they’ll be consolidated into some data operating system. Today, Gartner would loosely call this thing the data and analytics governance platform. My name is on it. Was it 2020 or 2021? All the years are melting together. I think 2021 was the first year they published a market guide for something called the Data and Analytics Governance Platform, which, again, has my name on it, and which is a slightly ill-defined catch-all for this merging of data management capabilities into a single platform. I think that’s where Gartner sees things going.
This merging of all this stuff together. Again, I’m not convinced. We know the history of what happens when you try to make a catch-all, be-all, single monolithic, giant, silo-like solution that does everything for all people, all use cases, all divisions, and all departments. We don’t have a good history of building solutions like this. But suffice to say, there needs to be some idea of an intelligence layer in a data fabric; I would argue that’s required for it to really be a data fabric. So those are the three key capabilities: first, active metadata; second, heavy use of AI/ML, graph, and other modern technologies to automate the analysis, classification, and governance of data in the future; and third, some new intelligence layer that doesn’t really even exist yet, and maybe it will. All props to Gartner: this is one of their slides that has been distributed widely, showing a data fabric architecture. I don’t like it, although there is a newer version that is even less definitive, even more squishy, and doesn’t really say much. So there’s a newer version and an older version, and I like the older version, design aside. What we’re looking at here, folks, for anybody who’s listening while they’re driving, is a high-level architecture of a data fabric, and it’s really nothing much to write home about. It’s basically a block diagram, like the classic layer cake architecture: layers at the bottom, and then you work up the cake through layers in the middle and layers at the top. It’s the classic not-saying-much architecture diagram. The only thing I would take away from this is that you’ve got data sources at the bottom of the diagram. You’ve got metadata. You’ve got a data catalog, by the way, guys.
This is why data catalogs are kind of like the new black. I think I went to five industry events last year, and every single one of them was dominated by data catalog vendors, most of whom I’d never heard of before. Data catalog vendors are popping out of the woodwork; they’re sprouting up all over the place, and I think it’s largely a function of a lot of focus on the data fabric, and a lot of focus on data mesh as well, by the way. We’ll dive into the data mesh in a separate podcast. But sitting at the bottom of this is a data catalog, because to get your hands around all the metadata that is going to be used to fuel decisions about all of this automation and data management, you’ve got to be cataloging all that data. At the very least, you need to know where it is and where it’s coming from, and on and on. So one could argue that a foundational component here today is at least some form of a data catalog, and that’s fine. As you move up, you get into some of the advanced technologies: the layer for knowledge graphs, “enriched with semantics.” Knowledge graph, enriched with semantics. Sounds like breakfast cereal: enriched with semantics. But a semantic layer plus a knowledge graph plus data management capabilities plus governance capabilities, that’s what’s not written here. “Knowledge graph enriched with semantics” doesn’t nearly go far enough to describe what this diagram from Gartner shows as what they call a recommendation engine. Again, that’s a drastic oversimplification. This isn’t just a recommendation engine. This is, in essence, a very sophisticated and very advanced governance solution, and not just governance, but an advanced and sophisticated operational layer for the organization.
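
As a rough illustration of the point that, at the very least, a catalog needs to know where data is and where it’s coming from, here is a minimal sketch of a catalog that records each dataset’s location and its upstream sources, then walks that lineage. The dataset names, locations, and the `CatalogEntry` and `upstream_lineage` helpers are hypothetical, not any vendor’s API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """A hypothetical catalog record: where a dataset lives and what feeds it."""
    name: str
    location: str  # illustrative physical location, e.g. "system.schema.table"
    upstream: list[str] = field(default_factory=list)  # datasets this one is built from


def upstream_lineage(catalog: dict[str, CatalogEntry], name: str) -> list[str]:
    """Return all upstream datasets of `name`, breadth-first, each listed once."""
    seen: list[str] = []
    queue = deque(catalog[name].upstream)
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.append(current)
        # Datasets missing from the catalog are still reported but cannot be traced further.
        if current in catalog:
            queue.extend(catalog[current].upstream)
    return seen


entries = [
    CatalogEntry("crm.accounts", "crm_db.public.accounts"),
    CatalogEntry("erp.customers", "erp_db.sales.customers"),
    CatalogEntry("erp.invoices", "erp_db.finance.invoices"),
    CatalogEntry("mdm.customer_master", "mdm_hub.golden.customer",
                 upstream=["crm.accounts", "erp.customers"]),
    CatalogEntry("bi.revenue_by_customer", "warehouse.marts.revenue_by_customer",
                 upstream=["mdm.customer_master", "erp.invoices"]),
]
catalog = {entry.name: entry for entry in entries}

# "Where is it?" and "where is it coming from?"
print(catalog["bi.revenue_by_customer"].location)
print(upstream_lineage(catalog, "bi.revenue_by_customer"))
```

Real catalogs add ownership, definitions, profiling results, and usage statistics on top of this, but location plus lineage is the minimal foundation described here.
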
You can easily envision a world where these data fabrics are telling users of a CRM system what version of a customer record to use, or are making decisions in real time about what customer record to display in a CRM or an ERP or anything else. So these are analytical and operational. What is on this diagram as the recommendation engine, plus #2 here, the knowledge graph enriched with semantics: the combination of four and two on this graph, guys, is really what’s worth calling this intelligence layer. And then, of course, you get into the consumers of the data. But if you really take the time to dive into this, you’ll come to the conclusion that I did, which is: wow, this is really conceptual and really theoretical, and how would I go from where I am today to something that looks like this? I don’t know how I’d do it, and I’m with you, guys. I’m not entirely sure how you would do it. More information to come. But I think this is why there’s so much focus on data catalogs, because that’s one you can get your hands around. You can grasp that one and say, OK, I know metadata is going to be important, and I know I’m going to get to a world where I’ll use AI and graph to analyze all that data, and I will use the insights of that active metadata to make some really awesome and well-informed decisions about how I both manage data and manage my business. I don’t know how I’m going to get from A to B, but I do know that we need to start with the data catalog, and that’s why there are so many people trying to sell you one if you don’t already have one. So what are some key considerations here, other things to think about? This is kind of my slide to help you cut through a lot of the hype that’s going on out there right now on data fabrics. Number one: a data fabric is neither centralized nor decentralized.
The data is going to determine the most effective architectural patterns. So where do you need centralization? Where do you need centralized management of rules? I suspect that will happen in data hubs, like it happens today. With MDM, for example, there is an architectural advantage to replicating small subsets of data within a single data hub in order to allow you to steward, manage, transform, and even merge that data. Doing that in a spider-web type approach is just not efficient, so I suspect MDM and a few other use cases will continue to use centralized data management even in a highly evolved form of a data fabric. There are a lot of people out there saying, oh, well, data mesh is decentralized and data fabric is centralized, and that means data fabrics are bad. No. The data fabric is really agnostic to centralization versus decentralization. What it would say is that the data will tell us which architectural pattern to use: whether this should follow a hub pattern or not. Do we even need to integrate these systems? What’s the most effective ETL required to integrate these systems together? What’s the most effective mapping? And on and on. So it’s neither centralized nor decentralized; it could be fully centralized, it could be fully decentralized. The data is going to inform you and tell you what the best architecture is. Number two: it requires really high levels of data governance maturity. Here’s the metaphor, guys. Imagine me as a 16-year-old who just got his driver’s license. Then imagine somebody handing me the keys to a McLaren. Outcomes probably aren’t going to be very good, right? That’s kind of how I see it. Even if we were able to wave a magic wand and have this highly evolved data fabric,
which we’re not, but let’s imagine we were: if I had that capability and that power today, and I had no idea what my baseline data governance policies and procedures were, and very low levels of maturity around data and analytics governance, the idea that I could go from zero to 2,000 miles an hour just like that is ludicrous. You need a well-buttoned-up, well-defined, highly effective data governance organization in place for this to work, because there’s no way you’re going to all of a sudden start turning things over to machines. The first thing your users are going to ask is, why was this decision made? Why are we looking at what we’re looking at? This is something we’re hearing these days in conversations related to AI and ML: making it explainable or defensible, or maybe even ethical. Can you say why you’re looking at what you’re looking at? So the data fabric won’t just magically articulate a lot of these rules. This will be an evolution. The data fabric will take what you’re doing today and augment it, improve it, and accelerate it. But you need to have that baseline first. You won’t just be able to skip to this new world, this fancy new car, without knowing how the car works: what the brakes do, what the accelerator does, and how to turn left or right. You need to have some baseline of a data and analytics governance organization in place, and it needs to be relatively evolved. So don’t think for a second that the data fabric is just going to make all of your data governance requirements go away. It will not. And if you think it will, you’re going to be in for some really rough times. Number three: the data fabric is mostly theoretical. I called it conceptual earlier; conceptual, theoretical. It exists mostly on paper, including the paper you’re looking at, the digital paper you’re looking at.
Again, I mentioned earlier the Gartner figure of 5 to 7% adoption. There’s no way. Maybe a handful of companies are tinkering with this. The largest of the large are tinkering with this, looking at data fabric architectures, and saying that they are working to implement one, but that is very, very different from realizing a ton of business value from the thing that we’ve been talking about for the last 30 minutes. So if you go to Gartner and come back saying, let’s do a data fabric, keep in mind that I think even a five-year timeline to any form of maturity is incredibly aggressive. I think you’re more likely talking 10 years before we even get to any sort of prime time for the fabric, and that assumes it doesn’t lose steam and just die on the vine, which is entirely possible. We’ll talk about that in a couple of bullet points. Number four: data fabrics can’t be purchased. Sorry, vendors. You can’t sell the data fabric; it doesn’t exist. I will challenge anybody who tells me that they have a fully baked, comprehensive, all-inclusive data fabric solution that does all three of the key capabilities I was talking about a couple of slides ago. They don’t exist. At best, what you’ve got is a collection of capabilities across multiple data management solutions that could, in theory, be wired together to enable some data-fabric-like capabilities. But the idea that this exists in a single solution is laughable. It’s not there, and I would argue it’s not going to be there for a number of years. Now, there are vendors, particularly data catalog vendors, who are saying they have this. They’re saying, hey, we can sell you a data fabric. Buyer beware. What they’re probably selling you is a data catalog with some pretty cool capabilities when it comes to discovery and profiling of metadata, and that’s an important part of this.
But it in no way comes even close to the fully baked version of a data fabric, which means deeply integrated governance capabilities: policy management, policy definition, and policy enforcement. That doesn’t even exist today, guys: the ability to define a rule in one system and have it enforced in another. MDM can do that with some subsets of data. Data quality solutions can kind of do that with other subsets of data. But what you’re talking about here is across all data, across all data management use cases: not just data quality or MDM, but even things like integration. So if anybody tells you they’ve got a data fabric, warning: they don’t. They’ve probably got some cool functionality, some cool capabilities when it comes to data cataloging, that foundational layer I was showing you before, but they don’t have a fully baked data fabric solution. It’s going to be years before anybody does. Number five: the data fabric will start by augmenting traditional data management tasks, as I mentioned before, but it is ultimately designed for higher levels of automation. This is that spectrum I was talking about: today it’s human-driven, with a little bit of augmentation here and there, a smattering of augmentation. What we’re talking about in the future is a drastic acceleration of augmentation and automation. Sorry, I’m sniffling here, guys; I’ve got some springtime allergies. Pardon me. Having some coffee. I love coffee. Number six: the data fabric has yet to find a compelling business rationale. If you disagree, I would love to hear your comments below. But more sophisticated data management is not a business rationale. Let’s talk about increased revenue,
significantly decreased costs, or decreased business risk, or some combination of the three. Most of the business cases I’m hearing around data fabrics center on #2, reducing the cost of data management itself. That’s not going to be enough here, guys. The costs we’re talking about to implement a data fabric, particularly that intelligence layer in the middle, are going to be high. You’re talking about storing more data than you’ve ever stored before. You’re talking about computing against massive troves of data in a way that probably only your data science organization has even considered in the past, and that will become widespread. The operational costs here will not be trivial. There will be software costs, and there are going to be switching costs with all of your legacy platforms that are probably already bought and paid for, already deeply integrated, and working in that highly traditional data pipeline way. So the cost here will not be trivial, and I’ve yet to hear a lot of compelling business cases for the use of the data fabric. They’re out there. But in the classic data management world, in our data world, we are so insular: we love to look at it as, OK, this will help this data management task, it will streamline this integration, it will fix this data quality issue. Can this thing be used to significantly accelerate revenues at an organization? I really think it can. So don’t get me wrong: I’m being a little bit of a skeptic here because it’s my nature, but I’m a believer in data fabrics, guys. There are massive business cases out there, but I would argue... Why are you asking me to rate my experience? Hopefully you didn’t see that on the screen, and hopefully we’re still going. Looks like I’m going longer than I thought. That’s my cue to accelerate, because I just got that prompt.
Hey, are you enjoying the platform that I’m using to record this? Hopefully we’re still going. There is a lot of gold in them there hills. I really, really believe that data fabrics could be used to drive major operational benefits within organizations, but today I haven’t heard it, or at least I haven’t heard nearly enough of it. Data fabrics are both analytical and operational. In our world, we love to look at analytics and think analytics is our sun, our moon, our universe. It’s our everything. But these data management architectures are inherently both analytical, driving insights and reports, and operational. Data fabrics could be used to make your supplier onboarding process better, make your quote-to-cash process better, and make your monthly financial close process faster and easier. So these are some of the things you need to think about from a data fabric perspective. Well, I’ll stop sharing. Nice change: you actually got to look at some graphs instead of looking at me for 45 minutes. Let’s tie this off.
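
The capability described earlier as missing today, defining a rule in one place and having it enforced across systems, with humans and machines both proposing rules, can be sketched as a small central rule registry. This is a minimal sketch under stated assumptions: `Rule`, `RuleRegistry`, the sample records, and the human-approval step for machine-proposed rules are all hypothetical illustrations, not a description of any existing product or of Gartner’s architecture.

```python
from dataclasses import dataclass, replace
from typing import Any, Callable

Record = dict[str, Any]


@dataclass(frozen=True)
class Rule:
    """A governance rule kept in one central (hypothetical) intelligence layer."""
    rule_id: str
    description: str
    check: Callable[[Record], bool]  # True when a record complies
    proposed_by: str                 # "human" or "machine"
    approved: bool                   # only approved rules are enforced


class RuleRegistry:
    """Define rules once; enforce them on records coming from any system."""

    def __init__(self) -> None:
        self._rules: dict[str, Rule] = {}

    def register(self, rule: Rule) -> None:
        self._rules[rule.rule_id] = rule

    def approve(self, rule_id: str) -> None:
        # A human signs off on a (typically machine-proposed) rule.
        self._rules[rule_id] = replace(self._rules[rule_id], approved=True)

    def pending(self) -> list[str]:
        return [r.rule_id for r in self._rules.values() if not r.approved]

    def enforce(self, record: Record) -> list[str]:
        """Return the ids of approved rules this record violates."""
        return [r.rule_id for r in self._rules.values() if r.approved and not r.check(record)]


registry = RuleRegistry()
registry.register(Rule("R1", "Every customer needs a country",
                       lambda r: bool(r.get("country")), "human", approved=True))
registry.register(Rule("R2", "Email must contain '@'",
                       lambda r: "@" in (r.get("email") or ""), "machine", approved=False))

# Records from two different systems are checked against the same central rules.
crm_record = {"id": "C2", "email": "c2-at-example.com", "country": ""}
erp_record = {"id": "42", "email": "ap@example.com", "country": "DE"}

print(registry.pending())            # machine-proposed R2 awaits review
print(registry.enforce(crm_record))  # only R1 is enforced so far
registry.approve("R2")
print(registry.enforce(crm_record))
print(registry.enforce(erp_record))
```

The approval step reflects the governance-maturity point: machines can propose rules, but someone still has to be able to explain and sign off on why a rule is enforced.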

A few key things. Data fabrics don’t exist yet. You can’t go buy one. There is a ton of promise here, but between where we are and where we need to go, there’s a ton of concept and theory, and really major roadblock issues that we need to overcome, frankly, in a lot of technology that doesn’t even exist yet. This idea of some sort of data operating system: I really like that idea. Some form of data operating system that combines semantic layers and knowledge graphs and AI and ML and governance platforms and integration platforms, all these things. Boy, that’s a big lift, but there is really, really gold in them there hills. So for CIOs, CDOs, and VPs of data and analytics: if you’re thinking about this thing, and it looks intriguing, and you’ve gotten caught up in the shiny object, I don’t blame you. What would I do? I would build out a road map, acknowledging that we are multiple years away from prime time. I would focus on data governance maturity and make sure I’ve got all of that buttoned up: the standard blocking and tackling when it comes to data governance. I would be looking at data cataloging capability. If you haven’t yet transitioned to the cloud, I don’t know how you would ever be able to manage all this data outside of a cloud environment, so you should be thinking about your cloud migration. You should be thinking about your data management application strategy once you get there, and about your provider: does your provider have the tools and infrastructure, and is your provider thinking about these things as well? Again, I would be looking at some sort of data catalog, but not for cataloging’s sake. You need to find that business case. Why build a data catalog? What does it do? And by the way, whatever the answer is, it isn’t this:
“It gives me a single place to go look up different data definitions,” or “it gives me one-stop shopping for understanding what my data is out there.” Those aren’t business rationales; those are data management rationales. The business rationales are: how does it help me make more money or drive down costs in the organization? So think about your strategy around data cataloging, think about building out a business case for data catalogs, and think about the data fabric on a five-year horizon, not a one-, two-, or three-year horizon. That’s my advice. There’s some goodness here. There’s a lot of excitement. I think data fabrics could be transformational. I actually think they would drive more value than a data mesh. Don’t tell anybody I told you that. But we’re a long way from prime time. All right, that’s it: data fabrics, demystified. I’m so glad you could join me on this 22nd episode of the CDO Matters podcast. Again, I’m Malcolm Hawker. Maybe next time you see me I’ll have new backgrounds, new neon lights, who knows? So glad you could join. Please leave some comments down below, and please subscribe to the podcast if you haven’t already. I usually have guests on, but sometimes I do solo episodes like this as well. It’s my honor to be sharing what I know with you, from three years at Gartner and an almost 30-year career in data and analytics, and I really hope that I will see you on the next episode of CDO Matters sometime soon. Thanks, all.

ABOUT THE SHOW

How can today’s Chief Data Officers help their organizations become more data-driven? Join former Gartner analyst Malcolm Hawker as he interviews thought leaders on all things data management – ranging from data fabrics to blockchain and more — and learns why they matter to today’s CDOs. If you want to dig deep into the CDO Matters that are top-of-mind for today’s modern data leaders, this show is for you.

Malcolm Hawker

Malcolm Hawker is an experienced thought leader in data management and governance and has consulted on thousands of software implementations in his years as a Gartner analyst, architect at Dun & Bradstreet and more. Now as an evangelist for helping companies become truly data-driven, he’s here to help CDOs understand how data can be a competitive advantage.

The CDO Matters Podcast Episode 22

Data Fabric Demystified with Malcolm Hawker

Malcolm Hawker
