Episode Overview:
Data Engineers play a critical role in enabling the success of any data and analytics function. In this episode of the CDO Matters Podcast, Malcolm interviews Joe Reis, the co-author of the bestselling O’Reilly book, “Fundamentals of Data Engineering”. In the discussion, Joe gives a sneak peek into the minds of data engineers and what makes them tick. Data leaders, particularly those with less technical backgrounds, should find this discussion a helpful tool to improve their relationships with the people critical to the success of their data and analytics operation.
Episode Links & Resources:
There we go. We are now live on stage. Good morning, evening, afternoon, whatever time it is, wherever you are. I’m Malcolm Hawker, head of data strategy with Profisee and host of the CDO Matters podcast. Joe, you and I were just backstage talking about your traveling days. I’m happy to have my guest, Mr. Joe Reis, with us today
to talk about all things data engineering. I’m gonna pick the brain of a data engineer. A recovering data engineer, going by your tagline. Recovering data scientist, I guess, but now probably a recovering data engineer too. Yeah.
We can talk more about that for your audience. Is that twenty-four steps then? Not just twelve. It’s like double.
If you’re a recovering data scientist and a data engineer, that sounds like a lot of work. Probably like forty-eight steps, because you have to go back
and forward. Yeah. So, awesome. We were just chatting backstage about all of our world travels.
Joe, just a little bit about you. I mean, I could give your bio myself. The headline here is co-author of the bestselling O’Reilly book Fundamentals of Data Engineering.
I see it right behind you, with the bird on it.
Truth in advertising: I have not read the book. I’ve read the preface. Is it “preface” or “pre-face”? I never know how to say it. I’ll say preface. Yeah.
Preface. Yeah. It depends. I suppose there’s somebody out there that says preface.
Well, I’m Canadian, and we have a tendency to do those things, like project and produce and so on. We won’t hold it against you. It’s fine. Thank you.
Thank you. I have a US passport now, so I’m American as well. So maybe that means I’m a recovering Canadian. I don’t know.
I still have my Canadian passport. I’m never giving that thing up, in case I ever need a brain transplant.
Yeah, I’m going back to Canada. Good place. I like Canada.
Yeah. There’s a lot to like, for sure. But as with anywhere in the world, there are a lot of things to be concerned about as well. But anyway, sir, your book. Tell us more about why you wrote it. I don’t wanna tell our listeners why you did, so tell us: why write about data engineering?
I mean, I think, to back up: data science and machine learning, for the past,
Yeah.
let’s see, at least since the early twenty-tens, have been increasingly popular, especially with the rise of AI right now and all the hype around that.
There’s a lot of temptation to get into machine learning and AI, I think rightfully so. There are a lot of productivity gains and really cool stuff you can do with it. At the same time, what I noticed from day one, and I’ve been working in machine learning for a long time at this point, is that the desire to get into AI and machine learning and hire a bunch of data scientists came at the expense of setting a foundation for success with data science, machine learning, AI, and so forth. And that foundation is really built upon data engineering. I learned this the hard way. That’s why my tagline on LinkedIn is “recovering data scientist.”
Some friends and I jokingly came up with that term. I think we called ourselves reformed data scientists as far back as twenty fourteen, twenty fifteen. But even before data science was cool.
Yeah. I think it was right on the cusp of becoming cool. But even then, we noticed there was a lot of hype, and we had been through a lot of the school of hard knocks along the way. That became a title that has stuck to this day, and now it’s become somewhat of a meme. There are more of us out there now.
I’ve noticed the recovering data scientist tagline is often used, so it’s cool to see that. But that’s more of a tongue-in-cheek joke. Back to the book: it was really written, I think, as a response to the popularity and the rise of data engineering. Along the way, companies were starting to realize that setting a foundation was critical, and data engineers were there to facilitate it.
And there were some great books. I would say Designing Data-Intensive Applications, from O’Reilly, by Martin Kleppmann, is still a great technical book.
But that was written in twenty seventeen, and I felt like, one, there needed to be a prequel to that book. That book is very much for software engineers, and it’s about distributed systems, and certainly helpful. But since twenty seventeen there has been a lot of abstraction along the way, in tooling and in practices.
And so we really wanted to capture the zeitgeist of the moment in data engineering and popular practices, but also come up with a book that was gonna stand the test of time. By time, I mean five to ten years, which for a tech book is actually quite a bit of longevity. So those were the two motivations. And we wrote a book that I would argue is very technology agnostic, vendor agnostic, platform agnostic, and so forth. We didn’t want to write a book about, say, data engineering on AWS. We really wanted to get at the underlying practices and mental frameworks around data engineering.
And I like to think we accomplished that. So it is generally for a slightly more technical audience though, right? If I’m a CDO, would you be recommending that book to me, or probably to somebody that works for me?
I think both, actually. We wrote the first four chapters of the book to be accessible to anybody. It’s not full of jargon.
So those four chapters, part one of the book, are really geared towards anybody. Like, my wife read it.
She understood it. She’s not technical at all. She works in finance. She got it. Then part two starts jumping into the more technical aspects, but the whole notion is to make it approachable. My problem with a lot of textbooks is they tend to be very unapproachable,
and, I think, opaque almost by design.
Right? And I feel like if you really understand your subject and you really understand what your audience is trying to get out of a book, for the most part they want something that’s approachable, easy to read, easy to understand.
And so, especially for a book like this, which is meant to be accessible to a large number of people, I consider it a win that a nontechnical person can pick it up and understand it. Well, shame on me. That means I have some reading to do. Sure.
Go read it. It’s good. On the bright side, I’m a big Kindle guy. Oh, cool.
Absolutely. We’ll throw it on the list. I’ll send it to your Kindle for you. Okay. Thank you. That would be awesome.
You’re number two in the queue behind Bill Schmarzo. I promised that I would read his book on AI.
But, yeah, I look forward to consuming more. Say I’m a new CDO, and there are a lot of new CDOs out there. I was at a CDO conference a couple of weeks ago in Boston, and I was overwhelmed by how many new CDOs are out there.
Oh, yeah. Yeah. Particularly in the federal space, where there are federal departments and agencies mandating that the role must exist. Really?
Oh, yeah. So you’ve got a lot of people that are new to the role, who may be not late career, maybe early or mid career, who don’t have a ton of experience leading technical teams. Let’s say I’m a relatively new CDO and I came from a business background.
Maybe I came out of sales or maybe I came out of marketing, and I’ve been tasked to go deliver on the digital transformation, which is a very common thing these days. What are the three or four things that I would need to know about managing data engineers? What’s it like in the mind of a data engineer?
I think, so, if I’m in a data engineering position and I see that my company has hired a new CDO, and maybe that CDO is, in some sort of skip-level way, my boss. Yep. Right? I’ll tell you the things that go through my mind: Okay, so what’s gonna change for me? Sort of the “who moved my cheese” moment. What sort of projects are we gonna be working on? Does this person have,
I wouldn’t just say the technical acumen to understand what I do as a data engineer, but also, can they talk to us? Can they relate to us in a way that we like to be related to? Do they understand what we do? Are they gonna take the time to get to know us?
Are they gonna set us up for success, as well as make the projects successful? I think those are the things that immediately come to mind. And then obviously, the big question mark: if you know a thing or two about CDOs, the life span of a CDO is, I would say, remarkably short. What is it?
Eighteen months or something like that? It depends on who you ask. Anywhere from eighteen months to thirty-ish, but it is typically half that of a CIO. Yeah. So, I mean, not long.
Right? I think part of that is the immaturity of the field and the immaturity of expectations around this position. But now that it’s mandated, I don’t expect the immaturity to go away. It’s more like some federal and state agencies saying, by the way, we have to have you.
Therefore, it’s like, great.
And so that’s what I’d be looking at. As a data engineer, I’m like, okay. If I know that the life span of a CDO is typically x, we’ll call it x, it’s a moving target, or t, I guess, for time.
It’s a variable, but, okay. So it’s t.
Are you gonna be around long enough to actually impact anything? Or do I just need to wait you out?
Woof. That’s harsh. Well, it depends. Right? That comes down to: are you gonna be there for us as a team? I’ve seen this happen where leaders come in and the team is sort of like, well, I don’t know. It depends.
If you have our back, we’ll have yours. But the waiting-out part is: do you have the support of the executives that hired you? Or, in the case of the federal agencies you mentioned, were you hired because they have to fill a position? And if you’re hired to fill a position, that does not necessarily mean you have the support of somebody who just had to fill a quota. Right?
That’s not the same as, like, hey, I support you a hundred percent in this role. Those are two different things. And so I’d be looking at those things. Engineers tend to be very skeptical, which is why it goes into the wait-out part.
I would say the mentality of engineers is very much that they’re skeptical. They’re probably somewhat cynical.
Right? And they wanna do engineering things. They wanna build things and do their jobs. And so I would say, get to know the teams that you’re working with, data engineering being among those teams. Not the only team, but understand that data engineering is definitely the glue that holds the data together.
It’s what sets the foundation to make it work. It’s what makes data scientists and analysts successful and so forth. So as a CDO you really need to view it as a very cohesive team.
And, as the old saying goes, respect gets respect. And so that’s how we view it.
Well, you touched on a few things that are, in my experience, having led engineering teams in the past, completely bang on. Right?
Generally fairly cynical.
And that’s not necessarily a bad thing.
I think, for an engineer, that can actually be a superpower.
Right? If you’re questioning the status quo, if you’re doubtful, those can actually be really good things when it comes to coding and engineering processes. So that’s not necessarily a bad thing, but in my experience, engineers are gonna see through fake technical acumen pretty quickly. Do you agree?
In about five milliseconds. Yeah. Right. I mean, it’s weird. Engineers almost have a heat-seeking missile for spotting people that aren’t one of them.
Right? And part of this is that you’ve gotta understand the persona of an engineer. You’re very nerdy. You’re typically very smart, very sharp.
And it’s kind of its own competitive game in some ways. Right? I mean, nerds like to out-nerd each other, and that’s just kinda how it is. You’ve just gotta know that up front.
But they’re sizing you up, whether you like it or not. Expect that to happen. Well said: sizing you up. So in my experience, nontechnical people can be highly successful leading technical people if you let them do their jobs. Right?
Like, when I got put into a technical leadership role, I was forced to do the unfrozen caveman lawyer. That’s what I called it. Right? It’s like, I don’t know anything about your Java, and I don’t know anything about your Python, but I do know that we have these constraints, this amount of budget, and this amount of time. Tell me what the best way is to do things.
And for me, that seemed to resonate, because to me engineers are problem solvers. And if you give them the freedom to solve problems, that’s half of their joy. Do you agree?
Oh, yeah. For sure. I mean, it’s the old trope. Right? Don’t tell me how to do something. Tell me what to do, and I’ll figure out how to get it done.
Right? And so I think you need to be crystal clear on your requirements. In fact, I was just looking at some posts on LinkedIn before we started. It’s funny, especially in data engineering circles, that things kinda go in cycles, but now all of a sudden everyone’s talking about business requirements.
You know, ended, all that fun stuff. But it worked for a while?
I would say it’s mostly ignored. I mean, most data engineering speak was always about tech and tools. Right? Yeah.
I mean, yeah. But it’s just one of these things where, as you mature, you get to realize the tools and technology are all table stakes. Right? But this also means, as a CDO setting the agenda, or as a manager, if you’re a nontechnical manager managing technical people, you need to be incredibly clear on what you want. Again, not how to do it.
Leave that up to the engineers and technical stakeholders to figure out, but you need to be crystal clear on what you want. I would say leave no ambiguity on the table in terms of the what. There’s nothing that will drive engineers crazier, more quickly, than giving them very ambiguous requirements. Or contradictory ones, for sure.
Yeah. That’s a way to get people to rage quit or something. So, definitely, you have to know what you want. You have to be able to articulate it very, very clearly.
I would say both in writing and in conversation. And if people have questions, you shouldn’t backtrack.
Don’t go moving the goalposts on people. I would say also define what success looks like up front, whatever that is, both for the project and for the greater mission in terms of why you were hired as a CDO to begin with. Right?
Like, what’s your success criteria? I would make that clear to people so that they’re not just sitting there like, well, what’s this person’s agenda? Why are they here? I mean, you’ve had bosses before, and sometimes you’re probably curious: what does this person do all day?
Why are they here?
What are they incentivized by? If you understand that, then you have a better picture of things. When you don’t understand it, then I think it starts leading to animosity and confusion and that sort of thing, which you don’t want. I think people don’t need to get along. They need to go along and get the job done. So: clarity of requirements, clarity of the role, transparency in the role, right? Having some idea of what you want to accomplish.
In my experience, when there’s ambiguity or, for lack of a better word, weakness in terms of where we’re going, to me that’s like a credibility foul that can do a lot of damage with engineers. And you expressed this. We were talking about lack of ambiguity, and clarity. When it comes to where we’re going, the role, and what success is, that’s where you need to be really, really clear. And if you’re not, that’s where things can go sideways pretty quickly.
At least in my experience.
I’ve seen engineers who will argue about the hill that they’re climbing.
But as long as there’s clarity on what the hill is, they will climb it. Right. They’ll grumble a little bit because they don’t agree with the path, but what I’ve seen is that if, as a CDO or any leader, you can make a compelling business case, if you can make a compelling why, if you can connect the dots, right, here’s what I believe, here’s where we’re going, here’s how we’re gonna get there, here’s why we’re doing what we’re doing,
that’s the secret sauce to get engineers on board. And if you’re missing any of those steps, then it can become slightly toxic, because I have seen a lot of engineers that are really, really good at, best way to describe it, kind of like networking for evil.
For lack of a better word. But you’ve got to have your engineers behind you. Oh, yeah. You have to.
As far as I’m concerned. And you’re not gonna do that by feigning any sort of technical acumen. If you don’t have any, the best thing you can do is say, I don’t have any.
Yeah.
That’s exactly it. And, well, let me add to this. Go ahead. The other thing I would add is that I wanna make sure that, as a CDO, you’re developing relationships with the stakeholders that the teams will depend on.
Right? So this is, typically, engineering orgs. Software engineering is separate from data as it stands right now. It drives me insane, but that’s how it is.
But often data teams are dependent upon, say, the application or dev teams for the data that they get. And if you can’t bridge that gap... I think a lot of the problems that we have in data actually stem from the dev and data divide that we have. Right? Because, as the old saying goes, crap flows downhill, and you can substitute whatever word you want, but it’s often in one direction.
Right?
The dev teams don’t necessarily need to depend on data as it stands in most organizations, but the opposite is, I think, violently clear: if data doesn’t have a good relationship with dev and doesn’t have reliability in getting data from dev, then data can’t do its job. This is a very critical path, and all too often I see that it’s ignored. Well, it’s funny you went there. That was exactly where I wanted to take the conversation, because when I was leading dev teams, like application development, there was absolutely, positively a hard line between our devs and the data DBAs, which are now, like, data engineers. DBAs: database administrators.
Old-school terminology, but largely for the same thing: people who are building the models, managing the databases, and on and on. But there was absolutely a hard line there. And the last page of the requirements document was the reporting requirements. It was always kind of tacked on to the business requirements document.
And I wanted to pick your brain to see if that’s still the case, and it sounds like there is still that hard line between data from an application perspective and data from a classic data engineering, data and analytics perspective. How do we change that? It’s interesting. This morning, I was doing a podcast on the Monday Morning Data Chat with some friends of mine from Estuary. They’re a streaming provider. So I think there’s a couple of ways that this will happen.
Part of it, I think, is, one, there’s just a pure lack of empathy between different teams. Data doesn’t know dev, dev doesn’t know data. And in fact, last week... okay, let me finish the Estuary discussion from the Monday Morning Data Chat. So I think streaming is actually one way that this is gonna be solved, because it’s a forcing function.
If you live in a batch-oriented world, for example, you don’t have the time dependencies that would facilitate, I would say, quick feedback in a way that would be useful. Right? If I just need to pull data from a dev team’s database,
they typically set up a read replica of the production database and have me read from that. That’s great, but say that I’m ingesting events or incremental data on a continuous basis from an application. Well, that brings me closer to the dev team, because now there’s much more of a synergy to the data that they’re providing and I’m consuming.
And so that’s definitely one way that I see it inevitably will happen. I think the lines between batch and streaming are gonna continue to be blurred over the next several years. I think streaming is just gonna be a first-class citizen. I don’t see any reason why it shouldn’t be.
So that’s definitely one technical forcing function, where the feedback loop will facilitate, I think by default, more cooperation. Especially if, say, an application team makes schema changes or data changes, or there are data quality issues, downstream will need to know about that. Right? So I think that is obviously a forcing function that will do a lot of good in the end.
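To make the schema-change point concrete, here is a minimal sketch in Python of how a downstream data team might flag drift in events arriving from an application. It is an illustration only, not something described in the episode: the `EXPECTED_SCHEMA` contract, its field names, and the `schema_drift` helper are all hypothetical.

```python
from typing import Any

# Hypothetical contract: the fields and types the data team expects
# in each event emitted by the application team.
EXPECTED_SCHEMA: dict[str, type] = {
    "order_id": int,
    "customer_id": int,
    "amount": float,
}


def schema_drift(event: dict[str, Any],
                 expected: dict[str, type] = EXPECTED_SCHEMA) -> list[str]:
    """Return human-readable differences between an event and the contract."""
    problems: list[str] = []
    # Check every expected field for presence and type.
    for field, expected_type in expected.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        elif not isinstance(event[field], expected_type):
            actual = type(event[field]).__name__
            problems.append(
                f"type change: {field} is {actual}, expected {expected_type.__name__}"
            )
    # Report fields the producer added that the contract does not know about.
    for field in sorted(event.keys() - expected.keys()):
        problems.append(f"new field: {field}")
    return problems


if __name__ == "__main__":
    changed = {"order_id": "1", "customer_id": 2, "total": 9.5}
    for problem in schema_drift(changed):
        print(problem)
```

With continuous ingestion, a check like this can run on each event as it arrives, so a schema change surfaces within moments of the application deploying it rather than at the next batch load.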
The other part of it, though: last week was very interesting. I gave a talk at the Utah engineering leadership meetup. There were thirty people in the room, some software engineers, but mostly managers, directors, VPs, and C-level people from engineering orgs.
I talked about the dev and data divide there, and they were in agreement. I think the comments that I got were typical. Yeah, we don’t really care about the data people. Why should we? We have a lot to do.
We’re not getting paid any extra. Our sprints don’t include data people. So should we really care about this?
One guy mentioned that when he gets a request from a data person, he just ignores it. I was like, nice. Why? And he said, well, because they’re not clear on what they want from me. Right?
And if you can’t be clear on what you want from me, then why should I take the time out of my very busy schedule, where there’s no incentive, to help you out? So the default is, again, just throw up the read replica database and have people read from that. So that was a very interesting conversation.
I think it at least got the conversation going with dev people who just weren’t aware of data people, even though they deal with data people all the time. A lot of the companies that they work at, I would say, definitely have data products and so forth, but for the application developers it’s out of sight, out of mind. But then two days later, I go to Atlanta. This is very interesting.
So there’s the Joe Reis and dbt roadshow that I do with dbt.
And so we did this meetup in Atlanta, and it was all data engineers, analytics engineers, and so forth. And that was an interesting one. I also brought up the dev and data divide.
And I would say same sentiments, but on the opposite end of the spectrum, where you’re on the receiving end of data. You’re not making it. You’re relying on dev teams. And there, some people definitely had great relationships with their dev teams.
Others definitely had a kind of sense of existential dread, because they’re wholly dependent upon these dev teams that frankly didn’t care about them. Right? And I think this is part of it: if you look at the symptoms, we talk a lot about data governance, data quality, and all this fun stuff. Right?
I think we’re all too often focused on the data part of it, and we ignore the bigger picture of where it comes from and who’s providing it, and maybe working with the upstream stakeholders and addressing these root causes. Again, all too often I see that we’re in our own little bubble. We’re like, oh, what can we do to patch over the data and make it useful for BI?
And again, I think increasingly, to bring it back to the first point about streaming and tighter coupling with teams, the sooner you’re aware of problems with your data, the sooner you’re able to address them. Right? And I think that’s by necessity gonna change a few things. And you mix that in with the fact that every company wants to do large language models and AI now. So I think that’s another lens, another way of seeing data issues more clearly.
I think we’ve talked about this, Malcolm, but it definitely feels like if there’s ever a time that we need to get data right, with its quality, governance, or just broader data management, it’s probably now. If we can’t get it right now, when there’s a ton of attention on AI, I have no idea when we’re gonna get this right.
Like, this is the time. If we’re ever gonna get it done as an industry, I don’t know what the hell we’re waiting for. There are only so many AI winters and machine learning hype cycles you can go through before people kinda get tired of it. We’ve gone through a few of these. I genuinely feel like this is the time when we should be getting this right and actually taking this seriously.
I agree. And I’ve actually said this publicly a number of times. I think that CDOs are at a bit of an inflection point, but don’t assume that it can only inflect up. It could actually inflect down. It could go either way, up or down, based on us finally maybe figuring out some of this stuff.
And I’ve had a lot of conversations recently online about some of these topics that we’ve been talking about literally for twenty years. Right? Like how to address governance and quality, how to link our data-related efforts to actual business outcomes, and on and on and on. And I couldn’t agree more. I think now is the time. Getting back to your point about the dev and data divide,
that’s something we absolutely need to fix, because I think it strikes at the foundation of the culture of both of these organizations. And I think right now things are a little dysfunctional, but it’s a two-way street. That’s too kind. Well, perhaps, but it’s definitely a two-way street. Right? As much as it stings to hear somebody on the dev side say, I don’t care about data and analytics.
Right?
On our side, on the data side, we do a lot of finger waving. Oh, yeah. Right? Like, we do a lot of finger waving.
All that silly business: they’ll never get the data quality right. Right. And there is no shortage of sound bites about how crappy the data is.
Right? When in reality, I’ve got a working theory that a lot of what we call data quality, or poor data quality, is just a different use case.
Meaning data that is optimized for operational use but not optimized for analytical use. And the divide is all sorts of transformations, all sorts of munging, all sorts of whinging and complaining about low-quality data that seems to be perfectly fine for the business process it’s supporting. Often. Yep. Exactly.
So how do we bring those two worlds together? How do we get the business side, as it were, I would say our customers, to know that what they do has downstream impacts, without being finger wavy, right, while being supportive and enabling and helpful? And on the data side of the house, how do we stop basically blaming the business for all of our woes?
I think as a profession we really need to, for lack of a better way of putting it, pull our heads out of our rear ends. It’s incumbent upon us, I would say, to start helping bridge the divide. I think you’re exactly right. All too often we like to point our fingers at everyone else but ourselves. And I think if we’ve been having the same discussion as an industry for twenty-plus years... Yep.
Which, you know, you’ve been there. I’ve been around that long too, and it gets exhausting after a bit. I’ve been having thoughts about this industry. And part of me is like, what are we doing at the end of the day? You know?
We keep talking about value.
Business value. It drives me insane.
I’ll jump off a bridge the next time somebody talks about business value. I’m just tired of the conversation, because it’s like, I don’t know, being at a crazy house. People just keep talking about the same stuff over and over every single day, and we’re not getting anywhere. I think it’s incumbent upon us to start thinking of different ways to do this and start acting on it. Right?
I think it’s obvious that the ways we’ve been doing it haven’t been working. Right? I mean, you brought up a post the other day about large language models and small language models and introducing those for data governance. At this point, I’m like, screw it.
If this is what it takes. It’s a Hail Mary pass. Because I can tell you the stuff we’ve been trying to do for data governance for the past twenty, thirty years ain’t working. I keep seeing the same arguments over and over again. And if these were people talking, which they are, I guess, you would probably call them crazy, or maybe in need of some help.
So it’s a, you know, part of it I definitely think to bring it back is is to bridge that divide.
At a minimum, I don’t think it requires high-tech stuff. I think it’s simple. It’s like doing a lunch and learn with the dev team and saying, here’s what we’re working on. Here’s ways you can help us out.
You know, this is how it affects our relationship, which we wanna improve. Here’s how it affects the broader business. Here’s how it affects you. You know, the reality is, in a lot of cases, data applications are becoming data products. Right?
And applications, software applications, are becoming data products. And so what that means is, you know, data is becoming more and more of a first class citizen. So the feedback loop between application and dev isn’t a one way street anymore. It’s going, you know, in a harmonious cycle.
So I think the sooner you can start building empathy and a good relationship with the dev team, that goes a long way toward solving things. And show them that, you know, you help us out, hey, there’s a feedback loop now between us and you, and we’ll help you out. We’re on the same team at the end of the day.
There isn’t a dev and a data. So I think that’s how individuals can have an impact. Again, lunch and learns, I found, were some of the best, most powerful things you could do. Just buy a bunch of people lunch, show them what you’re doing, they can show you what they’re doing, and it just goes a long way.
You know, just communication, it doesn’t cost you much. Like, you know, being nice doesn’t cost you anything. You just need to be friendly. Right? Right.
Well, it’s so much harder to, you know, throw stones when you know somebody, right, when you’ve had lunch with them or maybe you’ve gone to happy hour, you know, right? It’s so much harder to be hypercritical or malevolent if they’re friends of yours, right? So I couldn’t agree more. I do see, kind of at a high level, a void of what I think is strong leadership.
And we absolutely, positively need stronger leadership. Right? Like, we need to stop the finger waving. We need to be more customer driven. You’d mentioned data products.
That one to me could go either way, speaking of inflection points. Right? It could actually have a negative impact when all you do as a data leader is slap a new label on something. Absolutely.
Right? Like, because our customers are gonna see right through that. It’s like, oh, here we go again with another data mart or another Hadoop.
Right? Like, I’ve seen this play out before, and this is just another big data moment. You’ve slapped a new label on something, and nothing has fundamentally changed. Okay.
You’ve hired somebody who used to be a data steward or business analyst and now has the title of data product owner. Yeah. But, right? Like, what’s different?
Right? That’s the inflecting down, because our customers are going to question us and they’re gonna think that this is just some sort of, you know, spin cycle and we’re just chasing the hype.
Or to me, it could actually inflect up if we start actually practicing product management, because to me, that’s the goodness. What do you think? Yeah.
I totally agree. Yeah. I mean, at the end of the day, I think you hit on something really important, which is, you know, as Peter Drucker always talks about, right, the purpose of a business is to serve a customer. Right?
All too often that’s not what we do on our teams. You have, you know, other customers, internally and so forth. But the customer that really matters at the end of the day is the customer, the one that’s paying for your business.
So I think you’re absolutely right. If we can get back to basics on that, I think it goes a long way. Because that forces you to be customer centric. And I think there’s, you know, the technique of value stream mapping.
Right? So focusing on the end customer, the external customer, and what they want, how you serve them, right, and the processes and information flows that service that customer’s needs. You know, those sorts of practices, I totally agree, would go a long way, and not just practices, but a culture of thinking about a customer. So I’ll give you a really good example. I was at Chick-fil-A on Friday, not the restaurant, but at HQ.
Oh, okay. Cool. Yeah. I mean, I do go the rest of your life. Yeah. I I I gotta tell you.
I’ve heard things about Chick-fil-A’s culture, right, good things. Like, it was rock solid.
You know, I’ve met people from there before. You go to a restaurant, right? And the staff is just, it’s a different experience. People are just like, oh, it was my pleasure to give you fifty packets of ketchup and all that kind of stuff.
You know, but when I was there, at first, I thought, it’s gotta be a ruse. You should warrant this into the collection here. Nope. Absolutely. They are absolutely bought in.
And it’s a very customer centric viewpoint. It’s like, at the end of the day, what matters is making the customer happy. Right?
I think they nailed it. You know, it’s a culture that, I would say, for what they’re doing, it works. It’s worked for a long time. You know, Truett Cathy, the founder of the company, I think he did a good job at instilling the culture.
But to me, that’s exemplary of a strong culture where everyone, from top to bottom, knows why they’re there and what they’re supposed to do. That’s it. Yeah. I think it ultimately is about customers, and you mentioned value stream mapping and understanding how data kinda weaves itself through all of the business processes and how those dots are connected.
I think that that’s important.
I won’t go off on too much of a tangent here, but I would argue that if we were more business literate, we would be asking our customers less to be more data literate. I think if you have well designed products that are intuitive and easy to use and meet a need, where you’ve shown a connection between what you do and the value provided to internal customers and external customers, then there’s gonna be less of this focus on what could otherwise be called data literacy.
Because, frankly, when you’re focused on data literacy or when you’re focused on being data driven, I think that those are code words for putting data first when we should be putting customers first. Because frankly, that engineer that didn’t care about data, on the business side of the house, that’s a very, very common perspective.
And there’s some problems with that.
But I think if we have well designed products, and we are maniacally focused on customers and building things that are easy to use, I think a lot of that other stuff just kinda naturally falls away. And we can become our own Chick-fil-A’s, as it were. No.
Absolutely. Yeah. I think business literacy and, like, customer literacy, right, are necessary for sure. I think, as a prerequisite, being literate is good.
You know, I think Jordan Morrow is gonna be stopping by my house for a barbecue on Friday. So I was supposed to connect with him on Friday. Something came up. We had a meeting set up on Friday afternoon to connect, because I’m critical of data literacy.
I really am, but he’s such a wonderful human being that, like, I reached out and I said, hey, listen, I’m critical here, and I want you to know where I’m coming from. And I want you to see my perspective here, because the exact words that I used, I said it brings me no joy to be so critical of something that I know you’re so passionate about. Yeah.
Yeah. And I think everything has its place. Right? I mean, you know, data literacy, I think it’s good.
From my perspective, I would say the basic thing is just know why you’re there in the first place. Right? So that goes back to the customer, then building the literacy around the customer, then business literacy, understanding how, you know, different functions of the business work to serve that. But data literacy has its place, I think, and where it has its place is as you become more data driven, you know, another phrase that triggers the urge to jump off a bridge. But I think it does have a place, but it’s only, I think, when you’ve built the foundation of serving a customer.
Right? And then from there, you can leverage that experience better with data. But I don’t think the opposite is true, where if you start with data, everything magically falls into place, because you don’t have a true north and you don’t have your why.
You know what? I mean, some of the best-performing organizations, I mean, for god’s sake, John D. Rockefeller back in the day. How much money did that guy have at his peak? I think you could still argue that, adjusted for inflation, he was probably one of the richest people in the world.
I don’t recall him having a data driven data science team.
You know, I don’t think that he had giant cloud data centers and stuff. I don’t think he had a CTO. Right?
He also was very monopolistic and, you know, I think there are a lot of antitrust laws that came about as a result of him, but the whole point is he had a sense of what people wanted, and they wanted oil. Right? And he had a sense of what he wanted, which is he wanted to control that oil. So that’s one example, but you couldn’t say that he started out from a data driven perspective saying, oh, well, we have to be data driven to do this.
It’s not quite the same. I do think it has its place, but it’s only when, I think, you’ve built the foundation and sort of the muscle memory of serving a customer, you know. Oh, Toyota is a really good example of this too. Right?
It’s like lean, for example. Right? It came about as a way of, you know, streamlining operations at Toyota. Right?
But lean is very much a low tech thing. It just relies on people on the shop floor to understand what’s going on. That’s about it. Like, and to remove defects as they happen. But it wasn’t this, you know, grand data munging exercise. Far from it. You had, I don’t know, a kanban board visible to everybody.
You have a cord, the andon cord, that’ll stop a production line if something happens. Right? Like, you know, you can apply data after that, and I think that’s where, you know, things like Six Sigma and other quality control things came into effect. But at the end of the day, that was just super low fidelity. Right?
You don’t need to be fancy.
So, yeah. The way I kind of look at it, the things that you were describing before around this idea of customer driven versus data driven, the way I look at it is kinda bottom up versus top down. Right? And if you had to pick, you should be top down.
The top is the customer. And where are we going? Your exact words were your true north. That’s your north star.
What does the customer want? And you can be maniacally focused on that. But what I see is so many of us in the data world go from the bottom up.
Right? Like, we will go catalog everything, we’ll glossary everything, we’ll do lineage for everything, we’ll understand our pipelines intimately at a minute level, like the lowest kind of atomic level.
And then from there, we’ll try to take the next level up and we’ll try to build a product on top of that. Right. And that could be problematic because there’s so many possible SKUs of this product. Is it a field?
Is it a record? Or what is it? And that bottom-up approach can get really complicated really, really quickly because you don’t have a north star. All governance becomes equally important, all data becomes equally important.
All systems become equally important. But if you take the top down, you can cut away all of that other stuff and just focus on the one thing. Like, what is that one thing? And maybe it’s just one report or one master data domain or something else.
So couldn’t agree more. Alright. In the last little time that we have together, you’re a data guy. You’re a recovering data scientist.
Obviously, AI is at the front of everybody’s mind.
I’m a new CTO.
And maybe I actually don’t have a data science team or a data scientist. I’ve got a few data engineers, and I’ve got a reasonably sized team, maybe fifteen people or so, but I don’t have a dedicated data science function. Do I need one?
The trick answer is yes.
But I don’t think that’s the correct answer. It’s the answer, I think, people will gravitate towards, because it gives you the appearance of doing something.
And you might have a budget that you need to spend too. So there’s that. But, you know, if I’m a new CTO, and I’ve never been a CTO, so I’m only speaking from hypotheticals here, I would definitely take the time to read the room. To understand, again, where’s the company at?
You know, in terms of, I would say, the capability of doing data science, for example. Right? I mean, I can’t tell you how many executives I’ve seen who have, you know, walked in, said they’re gonna do really cool stuff with data, you know, AI and yada yada yada. And they’re gone.
They’re gone soon. You know, because it’s hard to deliver on it. Right? Or if they embark on this, maybe it’s the wrong thing.
And so I would say, like, you know, read the room, understand, like, what were you hired to do?
You know, and not just by the immediate boss, but, like, you know, talk to, I’m assuming, the CEO, if they hired you, maybe the board made a decision. But really, even before I’m hired, I would have candid chats with them to understand, okay, so what are you trying to do with this? Why is this gonna move the needle for you? I don’t think every company has to do AI.
In fact, I would say it’s maybe the furthest thing from what other companies should be doing. But, yeah, what were you gonna ask? Well, so I kinda struggle with this question myself. Right? Like, because I’m not entirely sure where things are going from kind of a meta perspective, right? Like, half of me thinks that models will become commoditized.
Certainly large language models will increasingly become commoditized, and you’ve got Bing and Bard and OpenAI, and maybe you’ll have a handful of others.
But will one really, really be that different from another?
I’m not I’m not sure.
So that’s half of me that would say, okay, off the shelf solutions, maybe I can use that. Right? Maybe I don’t need to go hire a custom data science team or build a data science team. And maybe I just need a few really good engineers and some people who kind of understand this stuff, but I don’t need to go pay four hundred grand for a data scientist. That’s happening. The other half of me says, well, where we could be going is, you know, dedicated models, new models, smaller models, that don’t cost zillions of dollars to build, that are trained off highly bespoke data sets.
And one of those may be mine. Like my corporate data set, and maybe I do need a data scientist to do that. What do you think? Well, so it goes back to, you know, one part of our book. We’re giving a talk on Saturday with a group of people doing a book club on it.
But we talk about, in the context of data engineering, what we call type A and type B data engineers. So type A stands for abstraction, type B stands for build. Build is where you start customizing and building to your core competencies, you know, that sort of business function. And I would say you need to, again, go back and read the room and kinda where you are as a company first.
Like, is it required that I need to build, or am I fine doing type A, which is just buying good services and using those?
You know, and then you just need to understand the use case. I would first understand what AI is and what machine learning is and, like, all the basics. Like, you know, to me, the table stakes, if you’re gonna be a CDO, is you should at least have a baseline of competency and understanding of, like, the difference between these things. If you don’t have that, then god help you.
But, you know, that being said, evaluate. What are the trade offs? If you decide to go down the bespoke route and customized models, like, what do you get at the end of the day? Right?
Like, how does that reflect against the mandate that you were probably given? I would go from there. I mean, there’s not a right answer. I think, you know, data scientists are expensive, as you say.
I would do everything in my power to avoid hiring people.
You know, but when I do, I’m gonna hire the right ones. Hopefully. But I think you really do need to read the room and understand, like, where are you at as a company?
So I think that’s kind of the bottom line. I don’t think there is a standard answer for this. I think, like I said, the trick answer would be, you know, you definitely need to hire a full data team. But I think, especially through the early twenty twenties, we saw how that went when money was falling from the sky. And data teams, you could spin those up real quick, at exorbitant salaries.
And now a lot of those data teams are being let go. Right. They weren’t able to provide the value and deliver on the mandate that they were probably expected to, you know. Anyway, I think the dirty secret is some data teams are hired because it provides the optics that you’re doing data stuff. And that might help you raise another round of funding.
It might help you get a particular customer, you know. I’ve seen this happen, personally. So there are other reasons why you might wanna have a data team. So I think you also need to, again, read the room and understand: is that the reason you need a data team?
It may not be to do data stuff at all.
Well, and we come right back to something that we talk about a lot, and I talk about a lot on LinkedIn and other places, which is having some idea of a data strategy. Yep. Right. And when you say read the room, right? Like, you’ve gotta know your operating model, you’ve gotta know your capabilities, where are you strong, where are you weak, where are you trying to go? What’s that north star? Now this doesn’t require, you know, a nine month long engagement to go and build the data strategy.
Right? Which is what I actually see often. I used to see this at Gartner all the time. Six to nine month, you know, long drawn out engagements, these kind of maturity assessments and these strategy engagements that took forever and ever. But still, having an idea of what success looks like, what’s the north star, and kind of some of your operating model, like where are you strong and where are you weak, doesn’t necessarily mean you need to spend six months on a maturity assessment, because spoiler alert, you’re probably two point five out of five across the board.
Right? Like, it’s just always the same. Right? It’s like, yeah, we suck at governance, and we suck at data quality, and we’re marginally okay at BI, and we’re marginally okay at a few other things.
And, yeah, nonetheless, no, it can be a lot of insights. So alright. Great advice as always, enjoyed our chat. You’d mentioned the Monday morning data chat.
I have been a guest in the past. I absolutely positively recommend tuning into that. It’s ten o’clock mountain.
That’s nine AM mountain time. Nine mountain. Okay. So eleven eastern, the Monday morning data chat on LinkedIn. You’ve got your book. You’re on the road. You’re off to where next?
So this weekend, I go to Australia for a couple weeks.
Going on the grand tour. DataEngBytes is a data engineering conference in Australia. Nice. So hitting Perth, Melbourne, Brisbane, and Sydney. And after that, yeah, lots of trips. I think this entire fall is booked.
You’re at Big Data London, right? Oh, yeah. Keynoting Big Data London too. Yep. Doing that.
Cool. Okay. Yeah. Yeah. I think they did a panel last year, didn’t they, with Zhamak and a few others?
That was the one. You know, some not keynote. And I I think as far as I know, I’m doing whatever she did. So, Oh, alright.
Yeah. Big shoes to fill there, sir. Big shoes to fill. She’s actually got smaller shoes than me.
I’m just kidding. So she’s she’s a really good friend of mine. So, yeah, we’re actually catching up, pretty soon here. But, Yeah.
It’s gonna be fun. I think, you know, it’s interesting, the interest in data engineering and speaking requests. You know, I’m working on a new book on data modeling too. So that’s a thing that’s getting a lot of attention right now.
Is it back? Is it not? It is. Yeah. It’s back. Right? Back with a vengeance.
You know, and so, stay tuned on that. I mean, the book should have been done by now, but I had a couple of courses that popped up that I needed to get out of the way, or am still working on. So can you automate that? I thought we automated data modeling.
Didn’t we figure that out already? No? Oh, yeah. Yeah. We yeah. In fact, yeah. We don’t need it anymore at any point.
I’m being glib, of course. No. I mean, all your all your data problems will be solved now. Just by, robots.
I mean, that would be kinda cool. Right? If that could happen, I, you know, maybe at some point it will. I think it’s but it it’s interesting.
You know, data modeling is one of these... the call to arms for me was after I finished Fundamentals of Data Engineering.
You know, as I started talking to data engineers and software engineers and so forth, data modeling, I think, across the board is just one of these practices that has a lot of necessity and has mysteriously gone from the lexicon and practices and the quiver of people who touch data, I would say unanimously. You know, if you talk to software engineers and you’re asking about data modeling,
it falls on deaf ears. They’re like, I don’t know, why would I do that? You know, even stuff like understanding the basic normal forms.
That’s, you know, largely disappeared as far as I can tell.
You know, and then from a data modeling perspective for analytics, and I’m just touching on things that have already existed for a long time, relational data models and then, you know, dimensional data models and, to a lesser extent, Data Vault. But practices even like Kimball, this is being rediscovered, like people just rediscovered fire or something. It’s like, oh, star schemas. Have you heard of these before? It’s like, yeah, they’ve been around for a long time.
Right? Relational databases just kinda got pushed to the side for a while. But they’re still widely used. I mean, you know, Amazon Aurora, they say it’s their biggest database ever and most widely used. But I think part of it is, you know, there’s a pendulum between formal and fast. Right?
And so I think the pendulum swung to fast there, definitely during the twenty tens. You, you know, had NoSQL databases, which aren’t relational databases. I think that sort of blew open the door to say, well, why do we need to model things?
And then, you know, the use of object relational mappers in software engineering, which is an abstraction layer where, basically, your object oriented code can write SQL for you. I think that’s great. And it makes you fast, but I don’t know if you’ve ever looked at the types of stuff that ORMs generate, but it’s interesting SQL. Most of it’s pretty good.
But it also allows you the flexibility, I would say, to come up with a data model that probably isn’t that coherent at all. It’s barely first normal form. And so you start getting into these situations where I think, you know, we have all the tools in the world that can make us faster. But I would say the practice of knowing how to use these tools is, you know, somewhat falling by the wayside, especially when it comes to databases.
And it’s not just databases though. When you go up to the other, you know, considerations of conceptual and logical data modeling, which eventually should go to physical data modeling. I mean, conceptual and logical, these are terms that people don’t hear about anymore. And then, you know, we just jump straight to physical.
So I think, you know, across the board, really, it’s more symptomatic that data modeling just needs a bit of a revisit. But I think the approach I’m taking with this book is looking at data modeling across the lifecycle of, you know, data, whether it’s where it’s created, where it’s used in applications, where it’s used in analytics, but also machine learning, streams, and different, you know, ways along there, as well as, I think, offering new techniques that haven’t been covered. You know, I mean, if all we’re doing is referring back to Kimball at this point and relational modeling, I think it says a lot about the state of the industry that we haven’t evolved that much.
Mhmm. Right? Practices have certainly evolved, but I would say our our our framework for thinking about data and the way we model it haven’t caught up at all. We’ve actually regressed.
I think you talked about downward inflection points. Yep. That’s one of them.
Well, yeah. Thanks to the pendulum. Right? I mean, which exists in everything we do in business, and we love to swing really, really wide to one side and then to the other side.
And in the process, we forget things. So anyway, I shall let you get to your world travels, so you can finish off your book. Joe, thank you so much for your time. Anytime, Malcolm.
Good chatting. We really, really appreciate it. To our audience, thank you for tuning in to this, the thirty-third episode of the CDO Matters podcast.
Yeah. Thirty three. It’s kinda crazy. If you like what you heard today, please take a moment to subscribe and check us out in the next episode.
Many, many thanks to all of our listeners out there. Thanks again, Joe, and we’ll see you again sometime soon.