Good morning, good afternoon, or good evening, good whatever time it is, wherever you are on this amazing planet of ours. I am Malcolm Hawker, and I’m the host of the CDO Matters podcast.
Thank you for joining us today. Thank you for listening. Thank you for watching on YouTube. However you are consuming the content, I’m thrilled to have you join us today.
Today, we are going to talk all about data architecture, and I am super happy that Pete Cooney has decided to join us today. Pete is the lead data architect at Jackson. Pete, welcome.
Thank you.
What is the old name of Jackson, for people who may remember it by a different name?
Yeah. Our name historically has been Jackson National Life, and we started as a life insurance company. We’re now one of the leaders in the annuity business.
And we’re bringing out new insurance-based retirement products for folks all the time. So talk to your financial adviser about us.
Awesome. Well, thanks for joining and taking time out of your busy schedule to be here. Thank you. Alright.
Well, let’s get into it. We’re gonna talk about data architecture today, and I feel like I maybe need to apologize in advance. It is early March as we record this. I’ve just returned from three days at the Gartner Data and Analytics Summit in Orlando.
And I’m a little hazy because I’ve been awash in all things data for three days, not to mention being on my feet and working on our booth and everything. And that was a lot of fun. But I had a chance to talk to a lot of data practitioners. I had a chance to talk to a lot of data architects.
And today, I wanna focus, Pete, on what a CDO would need to know about data architecture. What are some of the things going on in the space? We’ll probably, of course, have to talk about AI. We’re all talking about AI.
But let’s just kinda start high level.
How would you describe a data architecture role to maybe a nontechnical person?
The way I would frame it is that people put a lot of thought into how they want their applications to work because they understand that applications produce outcomes for people. Right?
What folks often don’t think about as much is how the data all plays together: the data that I produce when I do my work, the data that I look at to help make decisions as part of my work, and the data that we use behind the scenes in our applications.
People have been doing data architecture formally for decades. But for a long time, especially with the way the technology was, we thought of data architecture as something that was for developers, to help them structure what they needed to do. Then somewhere in the last fifteen years or so, a switch flipped. Right?
CDOs started coming out because your chief data officer knows that data is an asset for the business.
Data can be a moneymaker. Right? How many of the big organizations like Google and Amazon have shown you can make money with data? So data architecture fundamentally means how do I organize and use my data most effectively?
And there are two flavors of that that are complementary. One is how do you structure that data? So is there a way of organizing my data itself, in the context of islands of data in my world or in a big picture in an enterprise?
And then the other way is what kind of technology do I use to enable maximal use of my data?
To give a good example, fifteen years ago, the data warehouse was a revolutionary way of gathering data from a whole lot of different systems.
But it was beginning to feel a little stale because there was a lot of effort being put into building these data warehouses, and people weren’t necessarily using them, for a variety of reasons.
Then at that point in time, cloud technology started coming up. You started being able to talk about big data, aggregating data in volume and being able to use more data in a bigger variety of ways than you were constrained to in a data warehouse.
And, of course, that had its own limitations too because it lacked the structure and the security and all the good things that came with the data warehouse.
And so you start seeing different models of data lakes evolving. Now you’ve got the data lakehouse, which tries to marry both kinds of data consumption.
So a data architect is ultimately here, in my mind, especially for a regulated organization like Jackson, to help steer the organization in a way that’ll get the maximum use out of their data appropriate to their circumstance. Because, as you could definitely attest having just been at Gartner, it’s not a one-size-fits-all sort of circumstance. Every organization is a little bit different. We may follow patterns, but every organization has a different take on, or a different need for, the use of their data. And so finding the right fit is really why they would hire somebody like me and not just leave it to a committee of folks in technology.
So there’s a lot of things to dive deeper on from your response.
Yeah.
But I was a little struck at the beginning when you were talking about business applications.
Mhmm. Because I think a lot of people would naturally assume, oh, okay. This is analytics.
And, of course, I mean, you talked about lakes. You talked about warehouses, and, quite obviously, that’s analytics, and cool.
That’s what I would assume a data architect is doing, putting those pieces together. But you did touch on the business applications part at the beginning, which gets a little bit into process optimization too, I would think, necessarily. So are you playing a role in helping consult folks maybe on more of the engineering side of the house, building those applications, as to what data they should be capturing or how they should be configuring their applications in order to provide the right data that the organization needs?
Absolutely. In fact, I’m glad you asked this question. One of my favorite things to talk about is the need to have the data that your systems consume and share be consistent with the data being used in analytics. It’s a point of passion for me because I didn’t get my start as a data warehousing guy.
I got my start as a data integration guy. Enterprise data hubs and ESBs and all of that. And so I cared very much about how applications exchange data and about practical use of data day to day. And I learned how to do data modeling because of that kind of thing.
And so I come at it from a different angle than a traditional data warehousing and analytics support kind of person.
And so I think it’s really critical to have a capability like an MDM governing certain key entities of your data so that you can then share that data consistently, whether it’s through an event-based operational data store that has APIs on top of it, or a lakehouse where you have data products published and shared in your data catalog.
I want the data going through those different routes to have the same level of consistency. I need to know that the customer record that I’m using in my API is the same customer record that my customer service team is looking at in their CRM.
And it’s the same one that the sales analytics teams are looking at as they’re planning things. You get the idea.
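As an illustration of the cross-channel consistency described here, the following is a minimal sketch rather than anything from the episode. It assumes a hypothetical "golden" customer record held by an MDM and compares the copies surfaced through an API, a CRM, and a sales analytics dataset. The `CustomerRecord` type, the channel names, and the sample values are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerRecord:
    # Hypothetical shape of a mastered customer entity.
    master_id: str
    name: str
    email: str


def find_inconsistencies(golden, channel_views):
    """Return {channel: [field names that differ from the golden record]}."""
    issues = {}
    for channel, record in channel_views.items():
        diffs = [
            field_name
            for field_name in ("name", "email")
            if getattr(record, field_name) != getattr(golden, field_name)
        ]
        if diffs:
            issues[channel] = diffs
    return issues


# Invented sample data: the analytics copy has drifted from the golden record.
golden = CustomerRecord("C-001", "Ada Lopez", "ada@example.com")
views = {
    "api": CustomerRecord("C-001", "Ada Lopez", "ada@example.com"),
    "crm": CustomerRecord("C-001", "Ada Lopez", "ada@example.com"),
    "sales_analytics": CustomerRecord("C-001", "A. Lopez", "ada@example.com"),
}
issues = find_inconsistencies(golden, views)
print(issues)
```

In this sketch only the analytics copy is flagged, which is the kind of drift a mastered key entity is meant to prevent.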
Well, I would suspect that sometimes you can engineer that by design. If you’re building, like, a custom application, perhaps. Right? Like, from the ground up.
Like, if you actually have an in-house software engineering team and you’re building stuff from the ground up, I would hope that you were at the table and making sure that what you’re building on the application side isn’t drastically different from what you would see on the analytics side. But I think some of these things are necessarily unavoidable given most of what we use, I suspect, is, you know, out of the box. Right? It’s Salesforce. It’s ServiceNow.
It’s all of these things.
And it’s all these business applications that we’re buying from vendors where, of course, because there are no standards around metadata, all these things necessarily have different definitions, or different structures, or different quality metrics, or all sorts of differences. And I guess that’s what you mean by the consistency across channels and the need to resolve those.
Where’s the best place to resolve those differences? I mean, there’s a lot of different ways to peel that onion. You talked about ESBs, enterprise service buses.
We don’t really use that phrase much anymore, do we?
No. No. And there’s a reason for that. We were very well intentioned creating ESBs, but there are some flaws in that.
That’s a whole other... Okay.
Okay. Well, we won’t go down that rabbit hole maybe just yet. Yeah. But you can kinda solve for the consistency issues in an ETL. Yeah. Right? And I’ve done that, where it’s, you know, source, target, do some magic in the middle.
But increasingly, in some of these lakehouse-driven or data lake-driven architectures, a lot of that data management is happening a little bit downstream. Do you agree?
Upstream and downstream. Yeah. Okay. And, again, especially depending on whether you’re talking fabric or mesh. Right? Mesh, they’re assuming it’s happening upstream.
Oh, okay. Okay.
Right? And so to get into the mesh, you’re gonna have to have a common frame. But in a fabric architecture, you get into more of a situation where you can enforce some of those rules.
And I’ve worked at some very large organizations that are basically tailor-made for mesh, where everybody needs to know what their marching orders are, their basic rules to get into the common ground, and then you publish it out to the common ground. Versus my situation now, where it’s more of a fabric approach: we have certain central approaches to things that we want to use, but then we scale them so that the different functions of our business can use those effectively. Right? And I think that’s the key for us: you want that management to happen at the point of exchange. In other words, if you’re using a lakehouse, ask yourself the question, what’s gonna feed the consumption of my data products in my lakehouse?
And if what’s gonna feed those key data products isn’t married with your key APIs or your key data sources, that’s when you want to bring an expert team together and say, how do we make these things marry together? It doesn’t have to be everything. Matter of fact, it’s usually a minority of stuff. But how does the key stuff get tied together in a consistent way so that we can share the right data for the right job and have a common view?
Well, so you you just made a business case for MDM.
Because not all this stuff, but the key stuff, and I use those words all the time, by the way. The key stuff is the stuff that’s used most often, I would suspect, or is at least widely shared. That’s where it’s most important to resolve some of those common discrepancies in data. Do you agree?
It’s definitely one way of doing it. I’ve tried to handle this more brute force in the past. And, again, it worked. It was just a lot more work.
Combining data into a common file structure, for example, in a data lake. It’s a great idea.
Very, very labor intensive.
Hard to get people on board, and inconsistent. So yeah. And that’s where MDM has historically done well. And I have to say, the modern cloud-based MDM is a much better place to deliver value on that front than what you would run into traditionally.
Because I think the big difference with cloud-based MDM is that it leverages the cloud effectively and has the cloud components, like data lake storage, do the heavy lifting, as it were, for some of the components that were really weighty and hard to manage ten, fifteen years ago. So it’s a better option than it was. That’s for sure. I think my favorite thing about MDM, though, is the business ownership.
That’s really the main thing for me. I’m a big fan of business-driven data governance. Jackson has business-driven data governance, with business leaders and their teams owning data, having a sense of ownership, and having some responsibility, not just accountability, but responsibility for some of the management of their data. And at the end of the day, you want those people to have hands on resolving issues, versus some well-intentioned person in technology, you know, sitting there and making their best guess.
A business analyst is no substitute for a business SME, is one way of framing it. And the business SME is gonna have hands on, using a lot of MDM tools, and that’s one real strength in my mind.
So, sorry. I’m a sucker for an MDM use case. Okay.
In many ways, some might argue that I’m an MDM hammer looking for MDM nails, and you just handed me one. So I couldn’t resist. But you had opened your previous statement with something I find slightly paradoxical, and I’d love for you to explain that paradox. We were talking about discrepancies in data, differences between what existed in a business application, yep, or an operational system, and an analytical system. And you said, I’ve tried brute force, and it’s harder.
That to me came across as paradoxical because I would assume, oh, brute force, one table to rule them all.
Technologically, that sounds easier, but you’re suggesting that to execute it is actually harder. Why?
To make a brute force approach work well, you have to be able to scale the software pieces of it. You have to have excellent documentation.
You have to have excellent knowledge transfer.
You have to have effective architecture to govern the whole thing. And I personally feel like, you know, our architecture was quite good. And we did as good a job as we could, with the number of people we had, of maintaining standards.
But when you’re working with development teams, in most organizations I’ve been a part of, and I consulted for a long time before I joined Fortune 500 companies the last ten, fifteen years,
the most important thing you have to remember about developers is developers know what they know. People know what they know, and you’re not gonna be able to change their mindset unless you give them significant incentives: financial, career, some real incentives. Unless you give somebody a real incentive, they’re gonna stick to what they know to deliver what they’re supposed to deliver.
And developers tend to like to build things and to learn things. Yep. So if you give them a framework that’s hard to understand, or a lot of work without apparent reward, and they’re not adequately rewarded in some other way for that, then adoption is just hard to deliver. And that’s really what I see: people with the best intentions and excellent structures and well-thought-out designs could have the message fall flat because they were missing on any one of those areas.
And, you know, at Jackson, we pride ourselves on delivery. We deliver high quality solutions to our customers, and we pride ourselves on innovating for our business within our four walls very, very well. And if I went to somebody and said, you have to stop everything you’re doing, stop delivering value now, and follow this path, I’m gonna run into resistance, and I may not get the adoption that I want.
So it’s interesting. You just touched on something that I’ve always known to be true. When we talk about top-down, heavy-handed approaches, I’ve always said don’t do them, because they probably don’t best fit your business need, for one. Right? Because the way that sales looks at something is necessarily different than the way that finance looks at something. And putting that aside, the top-down, edict-driven things are really hard from a governance perspective, because getting those people on the business side to agree on common definitions is awfully hard. But what you added was a very interesting and, I think, totally appropriate perspective, which is: who wants to be told, from an engineering perspective, whether that’s a data engineer or software engineer, how to do their job?
Right? No. Nobody does. And you actually eloquently wove in the idea of excellence in delivery, and delivery becomes excellent when you are the best software engineer or the best data engineer or the best anything, and you are being creative and you’re solving hard problems.
But if you’re just sitting there taking orders and cramming everything into the one single customer table to rule them all, well, I mean, you know, you don’t need a lot of creativity to do that. You don’t need a lot of engineering acumen to do it either. So that’s an aspect I hadn’t necessarily considered, and it makes complete sense. I just always think it’s hard to do from a governance perspective because you’re forcing everybody to agree. But what you’re saying is that it’s gonna suck the life out of your engineers because you’re telling them how to do their job.
Yeah. And I think that actually a central edict, if it’s not treated as an edict... let me drill on that for a second. A central initiative can work.
Right. Okay. Yep. True.
Number one, you get your creative people excited about it. If you have a bunch of talented people, especially your most talented people, and you get them excited about an option and you get them diving into it, then you can get a lot of value out of it. Because, again, with the successful fabric-style architectures I’ve seen, people really got excited about it. People became invested in it, and they built a framework that worked. Where I’ve seen it fail is where somebody, you know, in mid-management, or the developers, didn’t catch the vision.
And so all of this work felt like a slog. It’s the difference between... I have a bunch of kids, and I have a teenager, of which I’ve had a bunch now.
And I say to my teenager, I want you to rake the lawn.
They’ll go, oh, no. Oh, woe is me. You know? And if I say, I want you to rake the lawn so that we can then have your friends over and have a party,
suddenly, they’re incented.
Right? And so from my standpoint, it’s about motivation. It’s just a little bit of a do-unto-others kind of moment. Yeah.
If I want to do something, I wanna be excited about the direction whenever possible. And we don’t always get that luxury. But especially being in architecture, part of my job, I feel, is to get people invested in a direction and help them understand the benefit of that direction. Because if people understand why, and see how it ties back to their benefit, the benefit of their organization, people will get excited, and I see that time and again.
The other danger, though, in my space especially, is that sometimes these things can take a while, and then fatigue sets in. And maintaining that level of excitement or maintaining consistent direction is the really hard part. That’s where data warehouses tended to founder, because you’d get about sixty percent of the way into populating that uber data warehouse, and everybody just got sick of it. And they wanted to pivot to a data lake because all of a sudden that sounded cooler.
Then all of a sudden, oh, wait. Now we’re gonna stop putting data in the data lake, and we’re gonna do a combination data warehouse and data lake. Oh, but wait. They’re still neglecting the twenty percent of data that they didn’t put into either.
You see what I mean? But that is, by the way, part of the genius of the lakehouse: you can ultimately solve that problem, because if you can get all of those pieces into a spot where at least they can be shared somehow in the lakehouse, you can combine them together and use them. It still doesn’t necessarily solve the consistency problem, but, again, that’s part of the challenge of it.
So a few things.
One, I’d love to dive into the difference between a warehouse and a lake, just briefly. But let’s park that.
Two, and this gets back to my first question of, you know, describe a data architect. What I just heard you say, in my language, is you need to actually be pretty good at persuasion and pretty good at sales.
And, yes, you need to find what incentivizes people, what motivates them. And in the case of your teenager, it was the party.
But you also need to be thinking bigger picture in terms of things like scalability and future-proofing where you need to go. So it can’t necessarily always be just about, you know, persuading people to do something. It’s also persuading them to do something which may not be in their best interest for today, but may be in their future best interest. So there’s certainly some salesmanship involved in all that. Do you agree?
Well, I think it’s not emphasized enough that part of being in leadership in corporate life is salesmanship.
Yeah. I would give credit to my dad. My dad highlighted how important sales was to me at a young age, which is why I ended up taking jobs in college like getting petition signatures in mall parking lots and things like that, and making much better money than I would have at Taco Bell or something like that.
And, ultimately, persuasion is a key aspect of corporate life. And the more responsibility you have, the more important persuasion is.
And I think that, yeah, if we were doing a podcast for architects, we would spend the entire time selling the value of persuasion, because too many good developers become architects and then wonder why they struggle. And it’s because, ultimately, they want to give people instructions.
Yeah. But not give them incentives. And so that’s where the balance comes in. And going back to the analogy of the teenager, part of the reason that I might want my teenager to have the party in my yard is to guarantee that there’s no funny business and that the right friends are coming and that this person’s hanging out with the right friends. Right? Fortunately, my kids tend to have good taste in friends, so I don’t have to worry about that. But at the end of the day, it’s one of those things where you, as an architect, want people to pick things for the right reasons.
And you don’t have to be the bad guy going and saying, oh, you can’t do this. No. No.
You have to use this other thing. That’s not gonna work, versus going and saying, well, the trouble with this is that it’s insecure.
Yeah. And that there’s an alternative way that also, by the way, would be, like, twenty-five percent faster for you to develop with if you use this alternative over here. Which kinda leads us to data warehouse versus data lake, in one sense.
Data warehousing is ultimately about structure and consumption, in my mind. Data lakes natively are bad for structure and consumption because data lakes are for aggregation and data use in volume.
If you’re doing AI, you need a data lake of some kind.
Even a lakehouse is insufficient for good AI pieces unless you’ve got a really good RAG architecture. That’s a topic for another time. And I don’t know that a really good RAG architecture exists yet, by the way. So we’ll talk to you in five years.
But at the end of the day, both are important aspects of corporate life now. And that’s why the lakehouse is a genius idea, because it allows you to, through a virtualization layer, access both components and have the right tools for the job. So the lake ultimately is a must-have for a lot of organizations because you need to aggregate data without too much forethought or control over the structure of it. You wanna have access to more data.
And really, what I’m seeing is that lakes become sources for data warehouses.
And with the data warehouses, you can have a few more than you used to. You don’t necessarily have one big enterprise data warehouse as much as you have fit-for-purpose data warehouses that then point data in a nice structured way to people who need it for more specific things. And even if you have a nice reusable data product, you can still tell people you should use this product for this, this, and this, and not use it for this and this. That’s a concept that didn’t exist fifteen years ago with the data warehouse.
With the data warehouse, we were like, we’re gonna build a data warehouse that everybody’s gonna use. It’s gonna be great. And now we realize, like you said, sales and service and accounting don’t have the same view of the world, and they shouldn’t. So, therefore, we can build views of the same data in a way that they can use optimally, and the data lake becomes a key way that you can aggregate and make that happen.
So you had said something. When you’re talking about a data product, you should use it for this, this, but not this.
To me, that is a data contract. It may be. Terms of use. Right? Yep.
Which could be enforced in some idea of a data contract. Yep. Let me try something on you and see if it resonates. And if the answer is no, I’ll totally accept that.
I’m wrong often. Just ask my wife.
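To make the "use it for this, this, but not this" idea concrete, here is a minimal sketch of terms of use expressed as a data contract. It is an assumption-laden illustration, not a description of any particular governance tool: the `DataContract` class, the product name `customer_360`, and the use labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    # Hypothetical terms of use attached to a data product.
    product: str
    allowed_uses: frozenset
    prohibited_uses: frozenset

    def permits(self, use):
        """A use is permitted only if it is explicitly allowed and not prohibited."""
        if use in self.prohibited_uses:
            return False
        return use in self.allowed_uses


# Invented example contract for an invented data product.
contract = DataContract(
    product="customer_360",
    allowed_uses=frozenset({"service_reporting", "sales_planning"}),
    prohibited_uses=frozenset({"marketing_resale"}),
)

for use in ("sales_planning", "marketing_resale", "ad_hoc_export"):
    print(use, contract.permits(use))
```

The sketch defaults to denying any use that is not listed, which is one possible design choice; a real contract would also carry things like schema, quality, and ownership terms.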
A data warehouse supports a situation where you know the questions you’re gonna ask, and a data lake is where you don’t necessarily know the questions you’re gonna ask. How does that fit? What do you think about that?
So is that is that a useful tool to think about the differences between the two?
Yes and no.
Okay. We’ll take it. That’s a win.
Yeah. Yeah.
Well, I would frame it a little bit differently. I would say data warehouses are where you would go to answer questions.
Yep.
Data lakes are where you go to learn things.
Including the questions, potentially. Including the questions, especially. Okay. Yeah.
Okay. Especially having spent some time in and around AI over the last ten years, I’ll tell you, it is necessary to have a volume of data to actually learn things.
In delivering an enterprise data model, which is something I’ve done in recent years, which is to say a normalized model for an enterprise, you need a data lake because you need a lot of different points to be able to develop something effective, which is counterintuitive to people who’ve done data warehousing, but it’s true. And to learn about the different ways that an organization functions, you can learn a lot about an organization from how they view key entities.
Right? If I’m doing sales, but I never refer to a person, I’m always talking about accounts or I’m talking about SKUs,
where do people actually fit into the equation on that front?
By contrast, if I’m a manufacturer and I’m talking about my product, but I don’t have explicit references to my supply chain partners, where do they fit into the picture? Right? And so looking at data in a data lake context helps you understand where my organization thinks and doesn’t think. And it’s not necessarily good or bad. Right? But self-awareness, in a lot of ways, is something I think you can get from a data lake.
Whereas with a data warehouse, you have to have an understanding of the domain that you’re going into so you can optimize what people can see, so they can read it faster, because there’s nothing worse for a business user than trying to get some data and then having to hit a button and wait fifteen minutes. Right? That is just nightmarish in this day and age, when they can go to their phone and ask ChatGPT and get something that resembles an answer to a question, and then have to wait for their internal systems to process while they go through another pot of coffee. That kind of user-experience-driven world is why we end up with different tools for the job in a lot of circumstances. It’s the same reason I have a socket wrench and a ratchet at home: I can remove a bolt with one or the other, but different circumstances warrant different uses of a different tool.
Got it. You had mentioned data products previously in a couple of other responses.
So where do those fit from the perspective of this idea of a medallion architecture? Take a moment to describe a medallion architecture, where data products fit into that, and why, as a CDO, I should be aware of this phenomenon. Well, I’ll answer the question with a caveat.
Okay. So medallion architecture is essentially the situation where you take your data from various places, in the format that it came in, in all of the various ways that it’s represented, into a bronze layer. You clean it up and perform some quality improvement in the middle layer, which is the silver layer, and then you share it with people in the gold layer. And you can even talk about bronze, silver, gold, platinum type arrangements, where you’ve got one general layer of shareable things, and then you’ve got finer-grained levels of things in the platinum layer.
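The bronze, silver, and gold flow described here can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any specific platform implements it: the field names (`id`, `state`, `premium`), the cleaning rules, and the sample rows are invented, and plain Python lists stand in for tables.

```python
def to_silver(bronze_rows):
    """Silver layer: clean raw rows (trim, normalize, drop rows with no id, dedupe)."""
    seen = set()
    silver = []
    for row in bronze_rows:
        cust_id = (row.get("id") or "").strip()
        if not cust_id or cust_id in seen:
            continue  # skip rows with a missing id and repeated ids
        seen.add(cust_id)
        silver.append({
            "id": cust_id,
            "state": (row.get("state") or "").strip().upper(),
            "premium": float(row.get("premium") or 0),
        })
    return silver


def to_gold(silver_rows):
    """Gold layer: a shareable, consumption-ready aggregate (premium by state)."""
    totals = {}
    for row in silver_rows:
        totals[row["state"]] = totals.get(row["state"], 0.0) + row["premium"]
    return totals


# Bronze layer: data as it arrived, inconsistencies and all (invented sample).
bronze = [
    {"id": " 1 ", "state": "mi", "premium": "100.0"},
    {"id": "1", "state": "MI", "premium": "100.0"},  # duplicate of the first row
    {"id": "2", "state": " tn", "premium": "250.5"},
    {"id": "", "state": "oh", "premium": "75"},      # no id, dropped in silver
]
silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

A platinum layer, in the sense used above, would be a further, finer-grained cut of the gold output aimed at a specific audience.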
I think data products optimally live in the platinum layer, and I think they’re managed through data governance portals, data catalogs, data marketplaces, places like that, where that curated data, that gold data, is made available to people in a way that’s secure and optimized for their consumption.
And that’s where I really like where Snowflake, Databricks, and Microsoft with Fabric are going with data technology now, because they’re basically taking gold layers and making them super easily reusable by people.
That, to me, is where the data product comes in. But the other aspect of the data product, apart from its accessibility and consumability, is its security and governance. I believe very strongly, like you said, in business data stewards. As a data steward, I wanna make sure that this person in this department has access to my data and that person in that department does not.
Or even better, person A gets a complete view of my data, whereas person B has columns hashed out. So they can get use of most of it, the relevant data, using the same dataset. But because of security, the stuff they don’t need to know is masked: Social Security numbers, proprietary information, that kind of thing. So I think of data products this way: it was a great concept for a while. It’s now realizable through data governance tools. Mhmm.
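As a rough sketch of the person A and person B idea, the snippet below serves one dataset through role-dependent views. The role names, the `ssn` column, and the masking scheme are assumptions made up for illustration; a real governance tool would apply its own policies, and an unsalted hash like this one is not a production-grade way to protect Social Security numbers.

```python
# Hypothetical role-based masking over a single dataset; names are assumptions.
import hashlib

SENSITIVE_COLUMNS = {"ssn"}

# Which roles may see sensitive columns in the clear (illustrative only).
ROLE_SEES_SENSITIVE = {"steward": True, "analyst": False}


def mask(value):
    """Replace a value with a short, non-reversible token (illustration only)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    return "masked:" + digest[:8]


def view_for(role, rows):
    """Return the rows as a given role is allowed to see them."""
    if role not in ROLE_SEES_SENSITIVE:
        raise PermissionError(f"role {role!r} has no access to this dataset")
    full_view = ROLE_SEES_SENSITIVE[role]
    return [
        {
            column: value if full_view or column not in SENSITIVE_COLUMNS else mask(value)
            for column, value in row.items()
        }
        for row in rows
    ]


rows = [{"customer": "C-1", "ssn": "000-00-0000", "account_value": 2500.0}]
steward_view = view_for("steward", rows)  # complete view
analyst_view = view_for("analyst", rows)  # same dataset, sensitive column masked
```

Both roles read the same underlying rows; only the view differs, which is the point of masking at the data product level rather than maintaining separate copies.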
Data governance process. And I think that’s the really exciting thing about that space to me: you can take what we’ve always had to brute force.
Let’s say ten years ago, to do data governance like this, we would have had to brute force it unless you happened to have an MDM, which was getting a lot more work ten years ago.
Unless you had something like that, you had to brute force your way and kind of create a process around things. Now there are a lot of vendors who can give you that governance process out of the box and then apply it against whatever kind of data structure you have, whether it’s a medallion data architecture or a series of legacy data warehouses.
It’ll give you control over that story and let you curate data products and share them with people in the same way that people who sell products for a living do it, which is to say you have a package, you have a contract, and you have contents.
Right? That packaging and the ability to get to what you want is really key.
Knowing what you’re accountable to do and not to do with it, that contract, is key. And then the ability to consume it the way we want is key. And that’s where these tools offer a lot more opportunity than we had even five years ago.
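The package, contract, and contents framing maps naturally onto a small data structure. The sketch below is one hypothetical way to model it; the class names, the purpose strings, and the rule that terms must be accepted before consumption are assumptions for illustration, not a description of any vendor’s tool.

```python
# Hypothetical model of a data product as package + contract + contents.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Contract:
    """What a consumer is accountable to do, and not to do, with the data."""
    allowed_uses: frozenset
    prohibited_uses: frozenset


@dataclass
class DataProduct:
    # Package: how a consumer finds and recognizes the product.
    name: str
    description: str
    # Contract: the terms attached to it.
    contract: Contract
    # Contents: the curated data itself.
    contents: list = field(default_factory=list)

    def consume(self, purpose, accepted_terms):
        """Hand over the contents only for an allowed purpose with terms accepted."""
        if not accepted_terms:
            raise PermissionError("terms must be accepted before use")
        if purpose in self.contract.prohibited_uses:
            raise PermissionError(f"purpose {purpose!r} is prohibited")
        if purpose not in self.contract.allowed_uses:
            raise PermissionError(f"purpose {purpose!r} is not covered by the contract")
        return list(self.contents)  # copy, so consumers cannot alter the product


product = DataProduct(
    name="annuity_summary",
    description="Curated annuity totals by policy (illustrative).",
    contract=Contract(
        allowed_uses=frozenset({"reporting"}),
        prohibited_uses=frozenset({"resale"}),
    ),
    contents=[{"policy_id": "A-100", "premium": 1500.5}],
)

report_rows = product.consume("reporting", accepted_terms=True)
```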
Awesome. Let’s transition to the favorite topic of the day, of course, AI. We’ve talked about a lot of things in the last thirty minutes. We talked about data integration and data pipelines, although we didn’t go into depth on data pipelines. We talked about analytical uses of data. We talked a little bit about operational uses of data. We talked a little bit about this persuasion factor of good architecture, warehouses, lakes.
When you think about where we’ve got to now and, let’s just call it pre twenty twenty two, pre-ChatGPT, the world as we knew it from an architecture, data management, and data governance perspective, what do you see as some of the bigger changes, either that have already happened or that maybe need to happen, through the lens of what you do, data architecture, to be better positioned to support AI? Because I don’t know many people who think it’s just gonna go away.
I don’t think it’s going away.
So what do you think is gonna change? What is net different about your role and about data architecture evolving over the next few years in order to best support AI?
Well, first of all, the criticality of having data available, so that you can do something with your favorite LLM of the month or your favorite packaged solution of the month, is key.
So much of what we’re doing with assistive AI right now is: take dataset, point AI at dataset, make dataset easier to use or more consumable. And that’s a great starting point.
And that’s spurred a lot of people to explore medallion architectures so you can expose data for things. It’s where lakehouses have come into play a little bit more, I think. The use of virtualization and AI are sort of on parallel tracks, and I see them converging at some point, for a variety of reasons. I think the most important thing we’re gonna see as AI agents come into play is the need to actually build pipelines to move data quickly from big place to big place.
One thing that AI is going to drive is cross-lake, or cross-lakehouse, or kind of cross-environment sharing, because the premise of RAG architecture at a high level, without getting into any of the gorp behind it (and some of the gorp is in fact very interesting), is that this AI bot up top needs buddies to help it make a decision.
And when you start getting into agents, agents are literally the coordinators of other bots. Those bots have to have data behind them to do what they need to do, but then they need to pass things back and forth effectively.
And we have the means to do that today.
What we don’t have are as many common patterns of how to do so effectively.
And, again, this is where I’m a big proponent of consistency in data, because if those back-end sources have some level of consistency, some shared rules, and even, say, some shared records, then it’s feasible to pass from place to place to place and pass those results along seamlessly within your agents. If you don’t, then you’re gonna run into delays in process.
The agent simply breaks down and can’t finish what it’s doing because you ran into data consistency or data quality problems at one point, and that prevents the result from being sent to another place that has rules or some sort of constraint that prevents it from consuming the input from this other place.
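To make the hand-off problem concrete, here is a small hypothetical sketch in which one step passes a record to a receiving agent that enforces shared rules. The rule set, field names, and the `risk_agent` function are invented for illustration; the point is only that an inconsistent record stops the chain, while a consistent one flows through.

```python
# Hypothetical hand-off between agents guarded by shared rules; names are assumptions.

# Shared rules every back-end source is expected to honor: field name -> type.
SHARED_RULES = {"customer_id": str, "balance": float}


def violations(record, rules=SHARED_RULES):
    """List the ways a record breaks the shared rules (empty list means consistent)."""
    problems = []
    for field_name, expected_type in rules.items():
        if field_name not in record:
            problems.append(f"missing {field_name}")
        elif not isinstance(record[field_name], expected_type):
            problems.append(f"{field_name} is not {expected_type.__name__}")
    return problems


def risk_agent(record):
    """Receiving agent: flags negative balances (illustrative logic only)."""
    return {"customer_id": record["customer_id"], "flag": record["balance"] < 0}


def hand_off(record, receiving_agent):
    """Pass a record along only if it satisfies the shared rules."""
    problems = violations(record)
    if problems:
        # The chain stops here: this is the breakdown described above.
        return {"status": "rejected", "problems": problems}
    return {"status": "ok", "result": receiving_agent(record)}


consistent = hand_off({"customer_id": "C-1", "balance": 100.0}, risk_agent)
inconsistent = hand_off({"customer_id": "C-2", "balance": "100"}, risk_agent)
```

Here the second record carries its balance as text rather than a number, so the receiving side refuses it and the overall task cannot complete until the source is fixed.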
So that’s where I think some of these challenges are gonna come into play.
I think software application developers are gonna try to get around this by building that integration in through their tool. That’s always the first instinct. Right? If you’re the big box, if you’re Salesforce or SAP or somebody, you’re gonna try and drag it all in and have it all in your space.
But that’s a solution for a certain percentage of the population for which that makes sense. The rest of us, who can’t afford that or don’t need that, are gonna start seeing a lot of pushing the envelope on how do I make that data run consistently across these different pieces. Again, I kinda tongue-in-cheek call it multi-lake, but it may not really be multi-lake. It may be these two applications running with Dataverse on the back end, these two applications running with some Amazon back end, and this data lake over here, all being able to communicate with each other effectively.
We have the means. Not as many people have built it yet, but we’re going to see that because of the agents, I think.
I just had an aha moment listening to you.
I fully and completely agree.
But, historically, at least over the past couple of years, I’ve been a little bit of a curmudgeon when it comes to data contracts and data products, because I’m a product management person. I’m, like, a recovering product management person.
And I naturally shift right. Right? I naturally shift towards the end user, the consumer, the person, the human who is consuming the dashboard or interfacing with the API or looking at a spreadsheet, whatever it is. I kinda lean that way, and I still think that from an operating model perspective, having the end consumer in mind is important. Yep. But I just had an aha, Pete.
What if the end consumer is a machine? Mhmm.
And if the end consumer is a machine, well, then a metadata-driven contract makes complete sense. Right? Like, when you and I download an application on our phones and there is the do you agree? Like, oh, give me a break.
Of course, I agree. I just want the app. Right? Like, there’s always those contracts, their terms of service.
Here’s how you’re supposed to use this app. Do you agree, not agree? Yes. Of course.
I agree. Click click click. And I just ignore the contracts when it comes to this kind of stuff.
So I’ve found the concept of data contracts from a human perspective, for lack of a better word, silly. I don’t know a better word. Superfluous, unnecessary. I don’t know.
But machine to machine and in the context that you just shared, which is lake to lake, if you think about that across companies Mhmm.
That gets really interesting, particularly if we evolve to a place where legal cases, court cases, even arguably lawmakers get to the point of finding a way to protect intellectual property that is buried in data. Yep. Where it’s no longer just the wild west, with anybody using anything in their models, and where companies can actually find ways to protect their data, it’ll probably be in their best interest to monetize it, right, somehow, some way. Today, it’s through things like the New York Times licensing content to ChatGPT, and, you know, the entire horse is let out of the barn.
But if you could set up marketplaces, machine-to-machine marketplaces, built on contracts that are traversing lake by lake, like what you described, which is interoperability, maybe, is another way to say it.
Yep.
Right? The commonality across the data. At the very least, understanding what rules were used to create it, the governance policies used to create it. I had an aha moment there. I never thought, before having this conversation, about the importance of machine-to-machine communications and how you’re gonna need contracts. I just thought contracts were not that viable from a person perspective.
Okay. And that’s why, when you talk about person-facing data products, I actually like to draw a distinction between data product and data service. Mhmm. The person will buy a product, and there’s an agreement.
A person will agree to something. That’s an agreement. You call it a contract, people will tend to be put off a little bit. Yeah.
But if you call it an agreement, well, okay. Yeah. I get to agree. See?
And so that’s why I tend to frame it as data agreements for people, again, consumers, and data contracts for machines. Because a machine wants to know: I have a necessary exchange. We actually don’t really even talk about data contracts internally here because, ultimately, we want people to agree to the governing rules and agree to take on a particular role, security-wise, to use the data product that they want, and to frame it that way so that people understand we’re empowering them. It’s a mutually agreed upon relationship where we provide them assurances of the security of, and access to, what they’re doing in return for them not misusing it.
Alright. I’m gonna need to go think for a couple of hours, because now my brain is, all of a sudden, telling me I need to maybe be more open to data contracts. I’ve been a little bit of a naysayer when it comes to contracts and products. I’ll be honest. Like, the shift-left data products.
But when it’s all machines that are doing all this stuff, I mean, I totally get it. That’s the scalability that you would absolutely, positively need, which is the heart of data architecture. So, Pete, this has been a fantastic conversation. Thank you so much for carving an hour out of your day.
Awesome. I’m gonna be thinking for the next two hours because of this conversation.
Good. Glad to spend the time with you. Good to see you.
Awesome. Awesome. Alright. If you are still with us, if you’re still listening, thank you so much. I would be absolutely thrilled if you would take the time to thumbs up or click like or subscribe or do all the things that we’re supposed to do on social media when you like the content, because I would like to get some feedback that we’re doing some of the right things.
We keep doing this. We’re gonna keep creating this content because it’s my mission in life to extend the tenures of CDOs and data leaders across the world. So I hope you got some value from today’s conversations. We’ll be back in another two weeks.
Oh, by the way, Pete did mention data virtualization a few times. There’s an episode dropping today, March sixth, with Alberto Pan. He is the CTO of a company called Denodo. Denodo does data virtualization.
So Pete mentioned it. If you wanna go a little deeper on data virtualization and what that means, check out the episode that is available today, March sixth, with Alberto Pan from Denodo. You may find that interesting. For now, I will let you go.
Thank you for listening. Thank you for subscribing, and we will see you on another episode of CDO Matters sometime very soon. Thanks, all. Bye for now.
ABOUT THE SHOW
